mcvine / acc

Accelerated mcvine engine

Investigate speedup of tapered guide #77

Open · mtbc opened this issue 2 years ago

mtbc commented 2 years ago

The tapered guide currently has each thread follow a neutron through its whole journey, so neutrons that do not make it far through the instrument leave that computational resource idle for the rest of the run.

Could the computation be split by component, with later components consuming fewer threads (or other resources) in proportion to the smaller number of neutrons they have to handle? Would that improve efficiency? And how would the neutrons lost along the way be filtered out?
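
To make the idea concrete, here is a minimal sketch (not existing mcvine.acc code) of per-component launches with host-side compaction between them, assuming neutrons are held in an (N, 10) array with the statistical weight in the last column and that a component kernel marks absorbed neutrons with a non-positive weight:

```python
import numpy as np
from numba import cuda


def propagate_by_component(component_kernels, neutrons, threads_per_block=256):
    # Launch one kernel per component; between launches, compact the neutron
    # array so later components only get threads for neutrons still alive.
    for kernel in component_kernels:
        if len(neutrons) == 0:        # everything absorbed already
            break
        nblocks = (len(neutrons) + threads_per_block - 1) // threads_per_block
        d_neutrons = cuda.to_device(neutrons)
        kernel[nblocks, threads_per_block](d_neutrons)
        neutrons = d_neutrons.copy_to_host()
        neutrons = neutrons[neutrons[:, -1] > 0.0]   # drop absorbed neutrons
    return neutrons
```

The host round trip is only for simplicity; a device-side stream compaction (e.g. via a prefix sum) would avoid the transfers. Either way, each component only gets threads for neutrons that are still alive.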

@ckendrick points out that https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html#group__CUDART__EXECUTION_1g504b94170f83285c71031be6d5d15f73 may be helpful.

Also, what other approaches to improving efficiency might there be? For instance, could a multi-GPU deployment be considered, where different GPUs handle different parts of the instrument?
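
On the multi-GPU question, a very rough sketch of the handoff mechanics using numba.cuda's select_device, where `segments` is a hypothetical partition of the instrument's component kernels across devices. This naive version runs the segments serially; the real gain would come from streaming batches of neutrons so the GPUs work concurrently:

```python
from numba import cuda


def run_segments_on_gpus(segments, neutrons, threads_per_block=256):
    # segments[i] is the list of component kernels assigned to GPU i
    # (a hypothetical partition of the instrument, not existing mcvine.acc API).
    for gpu_id, kernels in enumerate(segments):
        cuda.select_device(gpu_id)                  # switch CUDA context
        d_neutrons = cuda.to_device(neutrons)
        nblocks = (len(neutrons) + threads_per_block - 1) // threads_per_block
        for kernel in kernels:
            kernel[nblocks, threads_per_block](d_neutrons)
        neutrons = d_neutrons.copy_to_host()        # hand the buffer to the next GPU
    return neutrons
```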

mtbc commented 2 years ago

Rather than speculatively refactoring our instrument construction and testing, it may be possible to construct a more ad hoc proof-of-principle test in which we manually create a component that is actually composite and splits threads unevenly, so that parts nearer the exit get fewer.
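
Such a composite component would not need real guide physics; a toy per-segment kernel that either flies a neutron to the segment exit or zeroes its weight would be enough to exercise the uneven thread allocation. A sketch, assuming the usual (N, 10) neutron layout (position, velocity, spin, time, probability); the geometry here is just a box-aperture check, not the actual tapered-guide reflection code:

```python
from numba import cuda


@cuda.jit
def toy_segment(neutrons, z_exit, half_width, half_height):
    # One segment of a "composite" guide: fly the neutron to z_exit and absorb
    # it (weight <- 0) if it falls outside the segment's exit aperture.
    i = cuda.grid(1)
    if i >= neutrons.shape[0]:
        return
    n = neutrons[i]
    if n[9] <= 0.0:                    # already absorbed earlier
        return
    vz = n[5]
    if vz <= 0.0:                      # not moving downstream: absorb
        n[9] = 0.0
        return
    dt = (z_exit - n[2]) / vz
    x = n[0] + n[3] * dt
    y = n[1] + n[4] * dt
    if abs(x) > half_width or abs(y) > half_height:
        n[9] = 0.0                     # missed the aperture
    else:
        n[0] = x
        n[1] = y
        n[2] = z_exit
        n[8] += dt
```

Each successive segment would then be launched over the compacted survivor array with a proportionally smaller grid, and the total time compared against the current single-kernel tapered guide.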

yxqd commented 2 years ago

@mtbc @ckendrick can we run a profiler to check whether memory is the bottleneck? For the tapered guide, every thread needs to read the array holding the guide profile (width and height vs. z along the beam). Could that be a problem?
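
A profiler such as Nsight Systems or Nsight Compute run against the Python process should show whether the guide kernel is memory-bound. If the repeated per-thread reads of the profile arrays do turn out to be the problem, one possible mitigation (a sketch only, with an assumed compile-time cap on the profile length and illustrative names) is to stage the profile in shared memory once per block:

```python
import numba
from numba import cuda

MAX_PROFILE = 256   # assumed compile-time cap on the number of profile points


@cuda.jit
def tapered_guide_kernel(neutrons, z_profile, w_profile, h_profile, npts):
    # Stage the (z, width, height) guide profile in shared memory once per
    # block so repeated per-neutron lookups hit on-chip memory rather than
    # global memory. Propagation itself is omitted; only the staging is shown.
    s_z = cuda.shared.array(MAX_PROFILE, dtype=numba.float64)
    s_w = cuda.shared.array(MAX_PROFILE, dtype=numba.float64)
    s_h = cuda.shared.array(MAX_PROFILE, dtype=numba.float64)
    for j in range(cuda.threadIdx.x, npts, cuda.blockDim.x):
        s_z[j] = z_profile[j]
        s_w[j] = w_profile[j]
        s_h[j] = h_profile[j]
    cuda.syncthreads()
    i = cuda.grid(1)
    if i >= neutrons.shape[0]:
        return
    # ... propagate neutrons[i] using s_z/s_w/s_h instead of the global arrays ...
```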