Right now, interpolation to a 3D grid builds a stack of 2D interpolated grids. The work within each 2D layer is done in parallel, but the layers themselves are computed serially, one after another. This may be acceptable on the CPU, depending on the grid size and number of particles, but in the CUDA implementation it means that data is transferred to and from the GPU once for every layer.
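To illustrate the cost, here is a minimal sketch that simulates host/device copies with a counter instead of real CUDA calls. It assumes a simple nearest-grid-point deposit of particles in the unit cube as a stand-in for the real interpolation kernel; `TransferLog`, `deposit_layer`, `interpolate_layered` and `interpolate_batched` are hypothetical names, not part of the existing code. The layered version does one round trip per layer (and rescans every particle per layer), while the batched version fills the whole 3D grid with a single copy in and a single copy out.

```python
from dataclasses import dataclass


@dataclass
class TransferLog:
    """Counts simulated host<->device copies (hypothetical stand-in for cudaMemcpy)."""
    to_device: int = 0
    from_device: int = 0


def deposit_layer(particles, z_index, nx, ny, nz, log):
    """Hypothetical 2D 'kernel': nearest-grid-point deposit into one layer.

    Particles are (x, y, z, mass) tuples assumed to lie in [0, 1)^3.
    """
    log.to_device += 1  # particle data copied to the GPU for this layer
    layer = [[0.0] * nx for _ in range(ny)]
    for x, y, z, m in particles:
        # every particle is checked again for every layer
        if int(z * nz) == z_index:
            layer[int(y * ny)][int(x * nx)] += m
    log.from_device += 1  # finished layer copied back to the host
    return layer


def interpolate_layered(particles, nx, ny, nz, log):
    """Current approach: one 2D launch, and one round trip, per layer."""
    return [deposit_layer(particles, k, nx, ny, nz, log) for k in range(nz)]


def interpolate_batched(particles, nx, ny, nz, log):
    """Batched approach: copy once, fill the whole 3D grid, copy back once."""
    log.to_device += 1
    grid = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for x, y, z, m in particles:
        grid[int(z * nz)][int(y * ny)][int(x * nx)] += m
    log.from_device += 1
    return grid


if __name__ == "__main__":
    particles = [(0.1, 0.2, 0.3, 1.0), (0.7, 0.7, 0.9, 2.0), (0.1, 0.2, 0.35, 0.5)]
    a, b = TransferLog(), TransferLog()
    same = interpolate_layered(particles, 4, 4, 8, a) == interpolate_batched(particles, 4, 4, 8, b)
    print(same, a, b)
    # True TransferLog(to_device=8, from_device=8) TransferLog(to_device=1, from_device=1)
```

Both versions produce the same grid; only the number of transfers differs, growing linearly with the number of layers in the layered version. In a real CUDA kernel, the batched form would typically map the third grid axis onto an extra thread/block dimension rather than a host-side loop.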