Closed mphoward closed 2 years ago
I confirmed that this analyzer produces the same results as one written in Python on Stampede2. @astatt could you please review when you get a chance so I can merge?
If we ever need the GPU component, I can add it and test it.
Sounds good! Arash's student could also implement this if they want to use this analyzer on their GPUs. I wrote it because of how slow the Python analyzer was on Stampede2, but they see similar behavior due to the host-device copy bottleneck.
This PR implements a CPU compute to add up the velocity of a group of particles. This is useful in MPI simulations because it bypasses needing to pull a snapshot onto the root rank in Python. It could be implemented on the GPU too using `thrust::transform_reduce`, but I don't have a GPU for testing and don't need it right now.
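For anyone reading along, here is a rough sketch of the kind of per-rank sum described above. It is not the code in this PR: `Vec3` and `sum_group_velocity` are hypothetical names, and the plain `std::vector` inputs stand in for whatever particle data and group index arrays the plugin actually uses. The loop is a transform (member index to velocity) followed by a reduce (vector sum), which is the same pattern a GPU version would hand to `thrust::transform_reduce`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3-vector used only for this sketch.
struct Vec3
    {
    double x, y, z;
    };

// Sum the velocities of the particles listed in `group`.
// `group` holds local particle indices into `velocities` (an assumption of
// this sketch, not necessarily how the plugin stores group membership).
Vec3 sum_group_velocity(const std::vector<Vec3>& velocities,
                        const std::vector<std::size_t>& group)
    {
    Vec3 total{0.0, 0.0, 0.0};
    for (std::size_t idx : group)
        {
        assert(idx < velocities.size());
        // transform: look up the member's velocity; reduce: accumulate it
        const Vec3& v = velocities[idx];
        total.x += v.x;
        total.y += v.y;
        total.z += v.z;
        }
    return total;
    }
```

In an MPI run, each rank would presumably compute this partial sum over its local group members and then combine the three components across ranks, for example with `MPI_Allreduce` and `MPI_SUM`, so no snapshot has to be gathered on the root rank.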