Open graytaylor0 opened 3 months ago
@graytaylor0, are you planning on working on this?
@dlvenable I am not planning on working on this right now.
I encountered what I think is this issue. Would there be any logs in CloudWatch that I could use to verify whether I'm hitting this situation?
Is your feature request related to a problem? Please describe.
As a user of S3 scan, I have a bucket with 100 million objects. The current S3 scan source cannot handle this many objects because the supplier returns every object as a single in-memory list of partitions, which can lead to out-of-memory errors. Additionally, if the S3 scan supplier fails partway through, no partitions are created at all, because the supplier must return all partitions before any of them are created in the coordination store.
Describe the solution you'd like
I would like the PartitionSupplier functions to be able to pass partitions back to the source coordinator for creation as they are discovered. Instead of holding every object in memory until the scan finishes, the supplier would make the call to create a partition right after the scan finds each object.
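To make the idea concrete, here is a rough sketch of the shape this could take. Everything in it is hypothetical and for illustration only: `StreamingPartitionSupplier`, `InMemoryCoordinator`, `scanSupplier`, and the `bucket|key` partition key format are made-up names, not the existing source coordinator API, and a plain `Iterator<String>` stands in for a paginated S3 listing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch only: none of these types are the project's actual API.
final class StreamingScanSketch {

    /** Assumed contract: hand each partition key to the callback as soon as it is found. */
    interface StreamingPartitionSupplier {
        void supplyPartitions(Consumer<String> partitionCreator);
    }

    /** Stand-in for the source coordinator writing to the coordination store. */
    static final class InMemoryCoordinator {
        private final List<String> createdPartitions = new ArrayList<>();

        void createPartition(String partitionKey) {
            createdPartitions.add(partitionKey);
        }

        List<String> getCreatedPartitions() {
            return createdPartitions;
        }
    }

    /** Walks the object keys lazily and requests a partition for each one immediately. */
    static StreamingPartitionSupplier scanSupplier(String bucket, Iterator<String> objectKeys) {
        return partitionCreator -> {
            while (objectKeys.hasNext()) {
                // No intermediate list: each key is forwarded and then forgotten.
                partitionCreator.accept(bucket + "|" + objectKeys.next());
            }
        };
    }

    public static void main(String[] args) {
        // Simulated listing that fails after returning two keys.
        Iterator<String> keys = new Iterator<String>() {
            private int returned = 0;

            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public String next() {
                if (returned == 2) {
                    throw new IllegalStateException("simulated listing failure");
                }
                return "logs/object-" + (returned++) + ".json";
            }
        };

        InMemoryCoordinator coordinator = new InMemoryCoordinator();
        try {
            scanSupplier("my-bucket", keys).supplyPartitions(coordinator::createPartition);
        } catch (IllegalStateException e) {
            System.out.println("Scan failed: " + e.getMessage());
        }
        // Partitions created before the failure are still in the coordinator.
        System.out.println("Partitions created: " + coordinator.getCreatedPartitions());
    }
}
```

With this shape, memory use in the supplier no longer grows with the number of objects, and a failure partway through the scan only loses the objects that had not been reached yet.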