Further optimizations for the package

Saransh-cpp commented 3 months ago

cupy.histogram has better scaling properties than cupy.bincount and cupy.searchsorted (go through the CuPy source code to confirm). They also differ in how underflow and overflow are treated.
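To make the under/overflow difference concrete, here is a small sketch comparing the two approaches. It uses CuPy when installed and otherwise falls back to NumPy, whose histogram, searchsorted and bincount functions CuPy mirrors. The function names and the flow-bin layout (underflow at index 0, overflow at the last index) are assumptions for illustration, not cuda-histogram's actual design.

```python
import numpy as np

try:
    import cupy as xp  # GPU arrays if CuPy is installed
except ImportError:
    xp = np  # CPU fallback; CuPy mirrors this part of the NumPy API


def fill_histogram(data, edges):
    """xp.histogram: values outside [edges[0], edges[-1]] are silently dropped,
    and the last bin is closed on the right."""
    counts, _ = xp.histogram(xp.asarray(data), bins=xp.asarray(edges))
    return counts


def fill_with_flow(data, edges):
    """searchsorted + bincount with half-open bins [edges[i], edges[i+1]),
    plus an underflow bin (index 0) and an overflow bin (index len(edges))."""
    edges = xp.asarray(edges)
    # side="right": a value equal to an edge goes into the bin starting at it
    idx = xp.searchsorted(edges, xp.asarray(data), side="right")
    # idx == 0 means below edges[0]; idx == len(edges) means >= edges[-1]
    return xp.bincount(idx, minlength=len(edges) + 1)
```

With data [-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0] and edges [0.0, 1.0, 2.0], fill_histogram returns [2, 3]: the value 2.0 lands in the closed last bin, while -1.0 and 3.0 are lost. fill_with_flow returns [1, 2, 2, 2]: one underflow entry, two entries per regular bin, and 2.0 and 3.0 both counted as overflow.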

Need UHI support plus optimisation (look into CuPy's histogram implementation). The interface should be the same as hist. Projections might (or might not) be needed on GPUs, but indexing does not make sense there. Filling on GPUs is required; the other operations show no meaningful speed difference, so UHI operations do not need to run on GPUs. Plan: fill on the GPU and do UHI operations on the CPU (the normal hist interface).

When does it actually become worthwhile to switch to GPUs? The binning should be big and dense before switching to the GPU. Some parts of the constructor should run on the CPU and others on the GPU. For optimisation, we should check when and where memory is being allocated.
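The rule of thumb above (switch to the GPU only when the binning is big and dense) could be encoded as a small dispatch function. This is a hypothetical sketch: choose_fill_device and both thresholds are placeholders, not values taken from cuda-histogram, and real cut-offs would have to come from benchmarking on the target hardware.

```python
def choose_fill_device(n_entries, n_bins, gpu_available,
                       min_entries=1_000_000, min_bins=1_000):
    """Hypothetical heuristic for where to fill a histogram.

    min_entries and min_bins are placeholder thresholds, not measured values.
    """
    if not gpu_available:
        return "cpu"
    # Small inputs are unlikely to amortize kernel launches and
    # host/device copies.
    if n_entries < min_entries:
        return "cpu"
    # Following the note above: only big, dense binnings go to the GPU.
    if n_bins < min_bins:
        return "cpu"
    return "gpu"
```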

jpivarski commented 3 months ago

Probably just the fill operation should be on the GPU. That's partly motivated by speed of filling, and partly by the fact that the data to fill the histogram is already there.

All of the UHI operations, like slicing and projecting, should probably be left to the CPU (i.e. make it an ordinary hist object). If a big dataset has already been reduced to a histogram, copying that small object off the GPU won't be a big transfer, and anyway, UHI operations are often interleaved with plotting, which happens on the CPU, too.
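A minimal sketch of this split: the fill runs where the data lives (the GPU when CuPy is installed, NumPy otherwise), only the bin contents are copied to host memory, and a projection is then done on the CPU. It assumes CuPy's histogram2d and asnumpy, which mirror NumPy; the function names are hypothetical. The plain NumPy sum stands in for the UHI projection; in the package, the copied counts would instead become an ordinary hist object.

```python
import numpy as np

try:
    import cupy as xp  # fill on the GPU if CuPy is installed
except ImportError:
    xp = np  # CPU fallback so the sketch also runs without a GPU


def fill_on_device(x, y, x_edges, y_edges):
    """Fill step: runs on the device that already holds the data."""
    counts, _, _ = xp.histogram2d(
        xp.asarray(x), xp.asarray(y),
        bins=[xp.asarray(x_edges), xp.asarray(y_edges)],
    )
    return counts


def to_host(counts):
    """Copy only the (small) bin contents off the GPU; no-op for NumPy."""
    return counts if xp is np else xp.asnumpy(counts)


def project(counts_host, axis):
    """CPU-side projection onto `axis` of a 2D histogram: sum out the other axis."""
    return counts_host.sum(axis=1 - axis)
```

The transfer in to_host scales with the number of bins, not with the number of entries, which is why moving the filled histogram to the CPU is cheap compared with moving the input data.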

This still leaves plenty of work to do in CUDA, since there are multiple ways to fill a histogram, each appropriate for different sizes and shapes of histograms. Just filling is an interesting problem space.
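The CUDA kernels themselves are beyond a short example, but the main filling strategies can be sketched on the CPU with NumPy to show that they compute the same thing by different routes. The function names are hypothetical and this is not cuda-histogram's code. Scatter-add corresponds to one atomic increment per entry into a shared histogram, privatization to per-block private histograms that are merged at the end, and the sort-based fill avoids concurrent increments entirely at the cost of a sort.

```python
import numpy as np


def bin_indices(data, edges):
    """Map values to half-open bins [edges[i], edges[i+1]); drop out-of-range values."""
    data = np.asarray(data, dtype=float)
    edges = np.asarray(edges, dtype=float)
    idx = np.searchsorted(edges, data, side="right") - 1
    keep = (idx >= 0) & (idx < len(edges) - 1)
    return idx[keep], len(edges) - 1


def fill_scatter_add(data, edges):
    """One increment per entry into a single shared array (like atomicAdd)."""
    idx, n_bins = bin_indices(data, edges)
    counts = np.zeros(n_bins, dtype=np.int64)
    np.add.at(counts, idx, 1)  # unbuffered: repeated indices are all counted
    return counts


def fill_privatized(data, edges, n_chunks=4):
    """Each chunk fills a private histogram; the partial results are summed."""
    idx, n_bins = bin_indices(data, edges)
    partials = [np.bincount(chunk, minlength=n_bins)
                for chunk in np.array_split(idx, n_chunks)]
    return np.sum(partials, axis=0)


def fill_sort_based(data, edges):
    """Sort the bin indices, then read each bin's count from run boundaries."""
    idx, n_bins = bin_indices(data, edges)
    idx = np.sort(idx)
    # bounds[b] is the position of the first entry with bin index >= b
    bounds = np.searchsorted(idx, np.arange(n_bins + 1), side="left")
    return np.diff(bounds)
```

Which one wins on a GPU plausibly depends on the histogram's size and shape: with few bins, a single shared histogram suffers heavy atomic contention and privatization helps, while very large binnings may not fit in per-block memory, making global atomics or sorting more attractive. Those trade-offs would need to be measured rather than assumed.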

Saransh-cpp commented 3 months ago

Thanks for the details!

scikit-hep / cuda-histogram

Further optimizations for the package #9