`grid` should have shuffle option that randomly orders settings

koaning / memo

Decorators that log stats.

https://koaning.github.io/memo/getting-started.html

MIT License


`grid` should have shuffle option that randomly orders settings #21

Closed koaning closed 3 years ago

koaning commented 3 years ago

That way, the progress bar might yield something more reliable.

koaning commented 3 years ago

The progress bar should no longer have a progress parameter; this needs to be replaced with a shuffle parameter. The output should also be a list from now on, which makes it easier to work with tqdm/progress bars.

koaning commented 3 years ago

Note to self: the calmcode lessons should receive an update once this issue is addressed.

bradday4 commented 3 years ago

Do you see utility in parallelizing grid in conjunction with Runner? As it stands, grid operates sequentially and could therefore be a bottleneck when running operations in parallel. Also, if grid starts returning a list instead of a generator, then eager evaluation could cause a further delay before Runner starts executing.

In either case, if the range of values fed to grid is large, each worker process will need to wait for grid, whereas if grid were parallelized with Runner, execution could be sped up. I made a crude graphic to illustrate what I’m talking about. [Image: Grid]

koaning commented 3 years ago

My main concern (feel free to tell me if I'm wrong) is that when you construct grids, the compute time tends to be related to one of the attributes you pass along. That might cause the first 50% of the settings to be relatively fast to evaluate while the latter 50% is slow. That's why it seems preferable to me to randomly order all the values beforehand. This prevents one worker from getting all the fast tasks while another gets all the slow ones. A straightforward way to do that is to turn it into a list first and then shuffle it with the random.shuffle method.
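A minimal sketch of the "turn it into a list, then shuffle" idea. The function name `shuffled_grid` and its `shuffle` and `seed` parameters are hypothetical stand-ins for illustration, not memo's actual `grid` signature.

```python
import itertools
import random


def shuffled_grid(shuffle=True, seed=None, **kwargs):
    """Hypothetical stand-in for `grid`: returns a list of setting dicts."""
    keys = list(kwargs)
    # Materialise the full cartesian product as a list of dicts.
    settings = [dict(zip(keys, combo)) for combo in itertools.product(*kwargs.values())]
    if shuffle:
        # random.shuffle works in place, which is why a list is needed first.
        # A private Random instance keeps the global random state untouched.
        random.Random(seed).shuffle(settings)
    return settings


settings = shuffled_grid(n_samples=[10, 100, 1000], trial=range(3), seed=42)
print(len(settings))  # 9
```

Because the result is a list, `len(settings)` is known up front, which is what a progress bar needs to show a total.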

I wonder. Is eager evaluation really a problem? I wonder if the settings lists are relatively small if this is something an end-user would ever really notice on modern hardware. I'm somewhat less familiar with how joblib handles this internally though so I'll gladly be convinced otherwise.
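One way to check whether eager evaluation is noticeable is simply to time it. The sketch below materialises a grid of 100,000 settings with `itertools.product`; the axis names and sizes are made up for illustration, and the elapsed time will depend on the machine.

```python
import itertools
import time

# Hypothetical axes: 100 * 100 * 10 = 100,000 settings.
axes = {"a": range(100), "b": range(100), "c": range(10)}

start = time.perf_counter()
# Eagerly build every setting dict before any work would be dispatched.
settings = [dict(zip(axes, combo)) for combo in itertools.product(*axes.values())]
elapsed = time.perf_counter() - start

print(f"materialised {len(settings)} settings in {elapsed:.3f}s")
```

Comparing that number against the runtime of a single call to the function being evaluated gives a rough sense of whether the up-front cost matters.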

koaning commented 3 years ago

@bradday4 any concerns before I start working on the shuffled list?

bradday4 commented 3 years ago

@koaning No concerns on my end.

> I wonder. Is eager evaluation really a problem? I wonder if the settings lists are relatively small if this is something an end-user would ever really notice on modern hardware.

Just an FYI: I did some testing by parallelizing grid in conjunction with func (the right-hand side of the picture I posted earlier), and all I managed to do was make execution take longer. So ....

I thought I was being clever by splitting the work of itertools.product across multiple cores. I had the longest value in settings split evenly across cores so that product would work on small subsets. E.g., using 2 cores, `[{"key1":range(1,20)}, {"key2":range(1,10)}]` would become 2 lists:

`[{"key1":range(1,10)}, {"key2":range(1,10)}]` and `[{"key1":range(10,20)}, {"key2":range(1,10)}]`

Then each list would go off to a separate process.
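A rough sketch of that splitting step, run sequentially here so it stays self-contained. It assumes the settings are held in a single dict rather than a list of single-key dicts, and the helper names `split_longest` and `product_dicts` are hypothetical. As noted above, actually dispatching the chunks to separate processes made things slower in testing.

```python
import itertools


def split_longest(settings, n_chunks):
    """Split the longest axis into `n_chunks` contiguous, near-equal pieces."""
    longest = max(settings, key=lambda k: len(settings[k]))
    values = list(settings[longest])
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The last `extra` chunks each take one additional value.
        stop = start + size + (1 if i >= n_chunks - extra else 0)
        chunks.append({**settings, longest: values[start:stop]})
        start = stop
    return chunks


def product_dicts(settings):
    """Cartesian product of the axes as a list of dicts."""
    keys = list(settings)
    return [dict(zip(keys, combo)) for combo in itertools.product(*settings.values())]


settings = {"key1": range(1, 20), "key2": range(1, 10)}
chunks = split_longest(settings, n_chunks=2)

# Each chunk could be handed to its own worker; here they are combined in order.
combined = [row for chunk in chunks for row in product_dicts(chunk)]
print(len(combined))  # 171
```

The two chunks cover `key1` values 1 to 9 and 10 to 19, matching the example split, and together they reproduce the full product.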