Currently, for a large number of small files, Skyplane is bottlenecked on chunk dispatch because file listing is much slower than Skyplane's ability to transfer the data.
Proposed solution:
Use multiple processes/threads to list objects in parallel (this can be done by listing random prefixes)
Use async HTTP requests to send chunk requests, rather than blocking on each response
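
A minimal sketch of both ideas, not Skyplane's actual code. `parallel_list`, `dispatch_chunks`, `list_fn`, and `send_fn` are hypothetical names: `list_fn` stands in for whatever call lists one prefix of the bucket, and `send_fn` for whatever coroutine sends one chunk request. It also assumes the prefixes are disjoint and together cover the whole key space, otherwise objects would be missed or listed twice.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, Callable, Iterable, List


def parallel_list(
    list_fn: Callable[[str], List[str]],
    prefixes: Iterable[str],
    max_workers: int = 8,
) -> List[str]:
    """List every prefix in its own thread and merge the results.

    list_fn is a hypothetical callable that returns all keys under one
    prefix. Threads suit this because listing is I/O bound.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(list_fn, prefixes)  # results come back in prefix order
        return [key for page in pages for key in page]


async def dispatch_chunks(
    send_fn: Callable[[str], Awaitable[str]],
    chunks: Iterable[str],
    max_in_flight: int = 32,
) -> List[str]:
    """Send chunk requests concurrently instead of one at a time.

    send_fn is a hypothetical coroutine that issues one chunk request.
    The semaphore caps how many requests are in flight at once.
    """
    sem = asyncio.Semaphore(max_in_flight)

    async def send_one(chunk: str) -> str:
        async with sem:
            return await send_fn(chunk)

    # gather returns results in the same order as the input chunks
    return await asyncio.gather(*(send_one(c) for c in chunks))
```

The listing side uses threads so that an existing blocking client can be reused unchanged, while the dispatch side uses `asyncio` so that many requests can be outstanding without one thread per request. Both limits (`max_workers`, `max_in_flight`) are placeholder values that would need tuning against provider rate limits.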
Have a look at S3P if you haven't already; it seems to have implemented a partial strategy for this that could be good to take inspiration from. This is a HUGE use case for us.