As implemented, concurrent package validation chunks the input list of packages to validate. This generally makes package validation a whole lot faster, but it can also leave one executor stuck with a group of beefy packages. The net result is a long tail at the end of validation, where one executor is still working through a bunch of these slow-to-validate packages. I think that shuffling the order (with `random.shuffle`) before chunking will distribute these beefy packages more evenly across all executors. Definitely a much smaller optimization than the implementation of concurrent package validation itself.
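For illustration, here is a minimal sketch of the idea. It assumes the chunking is a plain contiguous split of the list; `chunk` and `shuffled_chunks` are hypothetical names for this example, not the actual functions in the codebase. Shuffling does not guarantee an even spread, it only makes it unlikely that slow packages sitting next to each other in the input all land in the same chunk.

```python
import random


def chunk(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal length.

    Hypothetical stand-in for the existing chunking step.
    """
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def shuffled_chunks(packages, n_chunks, rng=None):
    """Shuffle a copy of the package list, then chunk it.

    `rng` is an optional random.Random instance so tests can pass a seed.
    """
    rng = rng or random.Random()
    shuffled = list(packages)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)      # break up runs of neighbouring slow packages
    return chunk(shuffled, n_chunks)
```

Copying before shuffling keeps the caller's ordering intact, which matters if anything downstream reports results in input order.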