vericast / conda-mirror

Mirror upstream conda channels
BSD 3-Clause "New" or "Revised" License

[Optimization] Shuffle package validation order before validating #49

Open ericdill opened 7 years ago

ericdill commented 7 years ago

As implemented, the concurrent package validation splits the input list of packages into chunks and hands one chunk to each executor. This generally makes package validation go a whole lot faster, but it can also leave one executor stuck with a group of beefy packages to validate. The net result is a long tail at the end of the run, where one executor is still working through a bunch of these slow-to-validate packages after the others have finished. I think that shuffling the order (with random.shuffle) before chunking will distribute these beefy packages more evenly across all executors. Definitely a much smaller optimization than the implementation of concurrent package validation itself.
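
A minimal sketch of the idea, not conda-mirror's actual code: the helper names (`chunk`, `shuffled_chunks`, `chunk_sizes`) and the package list with its sizes are hypothetical, and package size is only a stand-in for validation cost. It assumes the chunks are contiguous slices of the input list, which is what lets neighbouring large packages land on the same executor.

```python
import random


def chunk(items, num_chunks):
    """Split items into num_chunks contiguous slices of near-equal length."""
    size, extra = divmod(len(items), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def shuffled_chunks(items, num_chunks, seed=None):
    """Shuffle a copy of items with random.shuffle semantics, then chunk it."""
    shuffled = list(items)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    return chunk(shuffled, num_chunks)


def chunk_sizes(chunks):
    """Total size per chunk, used here as a rough proxy for validation time."""
    return [sum(size for _, size in c) for c in chunks]


# Hypothetical input: (name, size in MB), with the large packages adjacent.
packages = [("big-%d" % i, 500) for i in range(4)]
packages += [("small-%d" % i, 1) for i in range(12)]

unshuffled = chunk(packages, 4)
shuffled = shuffled_chunks(packages, 4, seed=0)

print("without shuffle:", chunk_sizes(unshuffled))
print("with shuffle:   ", chunk_sizes(shuffled))
```

Without the shuffle, all four large packages fall into the first chunk. Shuffling does not guarantee an even split; it only makes it unlikely that the large packages stay clustered together.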