yeatmanlab / pyAFQ

Automated Fiber Quantification ... in Python
http://yeatmanlab.github.io/pyAFQ/
BSD 2-Clause "Simplified" License
56 stars 34 forks source link

Streamlines task in parallel using ray #1136

Closed asagilmore closed 4 months ago

asagilmore commented 4 months ago

This adds a num_chunks argument to tracking_params which causes the streamlines task to generate multiple trx files in parallel using ray, and then concatenate them at the end.

pep8speaks commented 4 months ago

Hello @asagilmore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 467:53: W292 no newline at end of file

Comment last updated at 2024-05-24 20:00:56 UTC
asagilmore commented 4 months ago

To handle non-random seeding, where there may be less that 1 seed per chunk after splitting up the seeds for ray workers I am running gen_seeds outside of track function and then passing the seeds as an array here:

def _gen_seeds(n_seeds, params_file, seed_mask=None, seed_threshold=0,
               thresholds_as_percentages=False,
               random_seeds=False, rng_seed=None):
    if isinstance(params_file, str):
        params_img = nib.load(params_file)
    else:
        params_img = params_file

    affine = params_img.affine

    return gen_seeds(seed_mask, seed_threshold, n_seeds,
                     thresholds_as_percentages,
                     random_seeds, rng_seed, affine)

And then using that function here:

seeds = _gen_seeds(this_tracking_params['n_seeds'],
                   params_file, **this_tracking_params)
seed_chunks = np.array_split(seeds, num_chunks)
tracking_params_list = [this_tracking_params.copy() for _
                        in range(num_chunks)]
for i in range(num_chunks):
    tracking_params_list[i]['n_seeds'] = seed_chunks[i]

This works well, but this bypasses the default arguments from track(), which I have copied to the helper function. If those are ever changed the default arugments here would not be changed and it would exhibit unexpected behavior. Is there any way to make this always match the track() default arugments?

36000 commented 4 months ago

From my perspective this looks good now. You can add ray in setup.cfg (https://github.com/yeatmanlab/pyAFQ/blob/2889d7afc48ffc9d700cce73a6a7416c55721588/setup.cfg#L97) by adding this to the end:

ray =
    ray

all =
    %(dev)s
    %(fury)s
    %(fsl)s
    %(afqbrowser)s
    %(plot)s
    %(trx)s
    %(ray)s

Then you can also try making it so the plot_001_afq_api.py example (which already uses trx) does num_chunks, so that this implementation will also be automatically tested. then, I think we can merge.

arokem commented 4 months ago

+1

Have you had a chance to run some more experiments with this? Are you still consistently seeing pretty substantial speedup with this?

asagilmore commented 4 months ago

+1

Have you had a chance to run some more experiments with this? Are you still consistently seeing pretty substantial speedup with this?

yes, it shows over a 25 times speedup on a 72 core machine.

image

image

arokem commented 4 months ago

Nice! ⚡

I think that we can mark this ready to review and after you add it to the dependencies and documentation, we can go ahead and merge it. For now, we can keep this as an optional feature, but we might consider setting this as default on some future release (probably after we let users kick the tires on this some more).

asagilmore commented 4 months ago

Nice! ⚡

I think that we can mark this ready to review and after you add it to the dependencies and documentation, we can go ahead and merge it. For now, we can keep this as an optional feature, but we might consider setting this as default on some future release (probably after we let users kick the tires on this some more).

I think I have correctly added it to the documentation and dependencies, but please let me know if I have done it incorrectly I am not sure exactly where everything is supposed to be.

arokem commented 4 months ago

I've set this to "Ready for review" and removed the "WIP" from the title of the PR, because I think we're close.

36000 commented 4 months ago

this looks good, if the docs pass, we can merge!