petervandivier / PsAdxArchiver

Generic exporter from Azure Data Explorer to an external table in Azure blob storage.
MIT License

Better queue throughput #5

Open petervandivier opened 1 year ago

petervandivier commented 1 year ago

With the current parallelization implementation, each serial batch (assuming no errors) takes as long as its longest parallel sub-step. For example, if our $Step is 1 hour and our $Parallelism is 24, each serial batch covers 1 day. If most hourly sub-steps are quite small but there's regularly a spike from 12PM-1PM, then the long midday sub-step forces every other parallel thread to sit idle until it completes, and only then can the next serial batch begin processing.
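To put rough numbers on the cost, here is a small back-of-the-envelope sketch. It is written in Python purely for illustration (the module itself is PowerShell), and everything in it is an assumption: the function names are hypothetical, and the durations are made up (1 minute per hourly sub-step, 30 minutes for the 12PM-1PM spike, two days of data).

```python
import heapq

def barrier_makespan(durations, parallelism):
    """Total time when each batch waits for its slowest sub-step."""
    return sum(
        max(durations[i:i + parallelism])
        for i in range(0, len(durations), parallelism)
    )

def queue_makespan(durations, parallelism):
    """Total time when a free thread immediately takes the next sub-step."""
    # Each heap entry is the time at which one thread becomes free.
    free_at = [0] * parallelism
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available thread
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical workload: hour 12 (12PM-1PM) is the daily spike.
one_day = [1] * 12 + [30] + [1] * 11
durations = one_day * 2                  # two days of hourly sub-steps

barrier_total = barrier_makespan(durations, parallelism=24)
queue_total = queue_makespan(durations, parallelism=24)
print(barrier_total, queue_total)
```

With these made-up numbers the batch-by-batch approach comes to 60 minutes and the queue approach to 31, because the second day's small sub-steps run while the first day's spike is still exporting.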

Implement a materialized queue system such that a thread finishing a sub-step can immediately pick up the next one without waiting on the rest of its serial batch.
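A minimal sketch of the queue idea, again in Python for illustration only and not a proposal for the actual PowerShell implementation. The names `run_queue` and `export_step` are hypothetical, the queue here lives in memory rather than being materialized anywhere durable, and error handling and retries are left out.

```python
import queue
import threading

def run_queue(steps, export_step, parallelism):
    """Run export_step over all steps using a shared work queue.

    Each worker pulls the next pending step as soon as it is free,
    so one slow step never holds up the other workers.
    """
    work = queue.Queue()
    for step in steps:
        work.put(step)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                step = work.get_nowait()
            except queue.Empty:
                return                    # nothing left to do
            outcome = export_step(step)   # assumed not to raise
            with lock:
                results[step] = outcome

    threads = [threading.Thread(target=worker) for _ in range(parallelism)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Hypothetical usage: 48 hourly steps, a stand-in export function.
results = run_queue(range(48), lambda hour: f"exported {hour}", parallelism=4)
```

A persisted queue (for example a table or file recording pending, in-progress, and completed steps) would follow the same pull-when-free pattern while also allowing a run to resume after interruption; that part is not shown here.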

petervandivier commented 11 months ago

Relates to #10