The change in #1350 to use record IDs to index proof batches caused a modest performance regression.
Comparing the shorter runs, the difference appears to be in the share conversion step. I suspect the cause is the way that OrderingSender sequences communications (record n+1 is not accepted into the channel until after record n), which somehow reduces the amount of validation work that can be done in parallel, but I haven't completely worked out how.
Originally posted by @andyleiserson in https://github.com/private-attribution/ipa/issues/1350#issuecomment-2427135447
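
To make the suspected ordering constraint concrete, here is a minimal sketch of an in-order channel. It is a toy model only: `InOrderChannel`, `offer`, `stalled`, and `accepted` are hypothetical names invented for illustration, not the actual `OrderingSender` API, and it buffers early records where the real sender would presumably make the caller wait. It shows only the property described above, that record n+1 is not accepted until record n has been.

```rust
use std::collections::BTreeMap;

/// Toy model of an in-order channel (hypothetical; NOT the real `OrderingSender`).
/// Record `n + 1` is held back until record `n` has been accepted.
pub struct InOrderChannel {
    /// Index of the next record the channel is willing to accept.
    next: usize,
    /// Records that were offered early and are stalled behind a missing one.
    waiting: BTreeMap<usize, u64>,
    /// Records accepted so far, always in index order.
    accepted: Vec<(usize, u64)>,
}

impl InOrderChannel {
    pub fn new() -> Self {
        Self {
            next: 0,
            waiting: BTreeMap::new(),
            accepted: Vec::new(),
        }
    }

    /// Offer a record. Returns how many records were accepted as a result,
    /// which is zero whenever an earlier record is still missing.
    pub fn offer(&mut self, record: usize, value: u64) -> usize {
        self.waiting.insert(record, value);
        let before = self.accepted.len();
        // Drain every record that is now contiguous with what was accepted.
        while let Some(v) = self.waiting.remove(&self.next) {
            self.accepted.push((self.next, v));
            self.next += 1;
        }
        self.accepted.len() - before
    }

    /// Number of records currently stalled behind a missing earlier record.
    pub fn stalled(&self) -> usize {
        self.waiting.len()
    }

    /// Records accepted so far, in index order.
    pub fn accepted(&self) -> &[(usize, u64)] {
        &self.accepted
    }
}
```

If records 2 and 1 are offered before record 0, both stall and nothing is accepted until record 0 arrives, at which point all three are released together. Any work gated on acceptance would be serialized in the same way, which is one possible reading of the reduced parallelism, though it is not a confirmed diagnosis.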