swincas / cookies-n-code

A repo for code review sessions at CAS
http://astronomy.swin.edu.au/

Fast 2D Binning to Produce 3D Datacube #9

Open robbassett opened 6 years ago

robbassett commented 6 years ago

I have been working on some code to create IFU-like observations from hydro sim data. My code takes in the positions, velocities, and masses of particles, applies some rotations, then creates 2D maps of mass, velocity, and velocity dispersion. The final 2D binning step is performed by scipy.stats.binned_statistic_2d, which can only provide a single value for each pixel.
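(For concreteness, a minimal sketch of that final binning step, assuming hypothetical particle arrays x, y, vz and masses m, none of which are from the repo; the statistic="mean" call is exactly where the one-value-per-pixel limit comes from:)

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Hypothetical stand-in particle data (names assumed, not from the repo):
# projected positions x, y; line-of-sight velocities vz; masses m.
rng = np.random.default_rng(0)
n = 100_000
x, y = rng.uniform(-10.0, 10.0, (2, n))
vz = rng.normal(0.0, 100.0, n)
m = rng.uniform(0.5, 1.5, n)

# Mass map: sum of particle masses in each of 50x50 pixels.
mmap, xedges, yedges, _ = binned_statistic_2d(x, y, m, statistic="sum", bins=50)

# Velocity map: mean vz per pixel -- a single number per bin, which is
# the limitation described above.
vmap, _, _, _ = binned_statistic_2d(x, y, vz, statistic="mean", bins=50)
```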

The problem is that this loses a lot of kinematic information, since the distributions are rarely (if ever) perfectly Gaussian. E.g. the velocity value for a pixel is just the mean velocity of all the particles in that bin, so if the distribution is a double Gaussian I would never know. In the end I would like to produce cubes, instead of 2D images, where the third dimension is a histogram of values. These could then be combined with emission-line and/or stellar spectra to create IFU-like datacubes. My own attempts to do this have been ridiculously slow.

-rob
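(Not from the thread, but one vectorised route to exactly this kind of cube is numpy.histogramdd, which bins all three dimensions at once with no Python loops. A minimal sketch, reusing the same hypothetical arrays as above:)

```python
import numpy as np

# Same hypothetical particle arrays as in the sketch above.
rng = np.random.default_rng(0)
n = 100_000
x, y = rng.uniform(-10.0, 10.0, (2, n))
vz = rng.normal(0.0, 100.0, n)
m = rng.uniform(0.5, 1.5, n)

# 50x50 spatial pixels and 40 velocity channels.
edges = [np.linspace(-10, 10, 51),
         np.linspace(-10, 10, 51),
         np.linspace(-300, 300, 41)]

# cube[i, j, k] = total mass in spatial pixel (i, j), velocity channel k,
# i.e. the third axis is a mass-weighted velocity histogram per pixel.
cube, _ = np.histogramdd(np.column_stack([x, y, vz]), bins=edges, weights=m)
print(cube.shape)  # (50, 50, 40)
```

Note that any particle with vz outside the outermost velocity edges is silently dropped, so the channel range has to cover the full velocity distribution.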

manodeep commented 6 years ago

Do you have a link to the code?

Better yet, can you create a self-contained ipython notebook that demonstrates the problem?

robbassett commented 6 years ago

The code is here: https://github.com/swincas/Hydrobs

I've never made an ipython notebook, but I'll give it a go.

robbassett commented 6 years ago

Here is an IPython notebook illustrating the issue:

https://github.com/swincas/Hydrobs/blob/vector-rotation/3D_to_2D.ipynb

I've been thinking about this today, and it may really be an issue of convolution in 3D. The binning into datacubes is not inherently that slow, but it slows down significantly when you want to convolve with some PSF plus the spectral resolution (a convolution in 3D).
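(Not from the thread, but two standard ways to keep a 3D convolution fast: for a Gaussian PSF and Gaussian spectral resolution, scipy.ndimage.gaussian_filter exploits separability and smooths each axis independently; for an arbitrary kernel, scipy.signal.fftconvolve scales far better than direct convolution once the kernel is more than a few pixels wide. A sketch, assuming a (ny, nx, nv) cube:)

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import fftconvolve

cube = np.random.rand(50, 50, 40)  # stand-in (ny, nx, nv) datacube

# Gaussian PSF + Gaussian line-spread function: separable, so each axis is
# smoothed independently. sigma is (y, x, v) in pixel/channel units.
smoothed = gaussian_filter(cube, sigma=(2.0, 2.0, 1.5))

# Arbitrary 3D kernel: FFT-based convolution costs O(N log N) instead of
# O(N * kernel size). Here a toy 9x9x7 kernel, normalised to conserve flux.
kernel = (np.outer(np.hanning(9), np.hanning(9))[:, :, None]
          * np.hanning(7)[None, None, :])
kernel /= kernel.sum()
smoothed2 = fftconvolve(cube, kernel, mode="same")
```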

manodeep commented 6 years ago

Can you profile the code with kernprof?

robbassett commented 6 years ago

This doesn't seem to be telling me much, because the current code has no loops. All the binning is done by scipy.stats.binned_statistic_2d, and I guess the profiler isn't picking up on any loops inside there? Anyway, I have a test file called "cr_test.py". I ran kernprof on it and then did:

```
python -m line_profiler cr_test.py.lprof
Timer unit: 1e-06 s
```

And that is all I can see in the output.

manodeep commented 6 years ago

I thought you wanted to speed up the 2D binning of the code. First add @profile decorators to every function that you suspect takes up a lot of time, and then run your script like so: kernprof -l cr_test.py <parameters for cr_test.py>. That should produce line-by-line CPU time usage.
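(A minimal sketch of that workflow, with a hypothetical bin_particles function standing in for the real code; kernprof injects the profile name at runtime, so the bare script won't run without it unless the decorator is removed or guarded:)

```python
# cr_test.py -- toy example; bin_particles is a hypothetical stand-in.
import numpy as np

@profile  # injected by kernprof -l; remove (or guard) to run without it
def bin_particles(n=200_000):
    x = np.random.uniform(-10, 10, n)
    y = np.random.uniform(-10, 10, n)
    hist, _, _ = np.histogram2d(x, y, bins=50)
    return hist

if __name__ == "__main__":
    bin_particles()
```

Running kernprof -l cr_test.py and then python -m line_profiler cr_test.py.lprof should print per-line hits and timings for every decorated function.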

robbassett commented 6 years ago

The code I have posted is not slow, but it doesn't have the full functionality I want, which is creating 3D datacubes. I haven't found a fast module (e.g. in scipy) that can accomplish what I want, and my attempts in the past (which are not this code) were extremely slow. Unfortunately, I don't have those old codes on hand.

manodeep commented 5 years ago

@robbassett Do you have any updates on this? And/or are you happy to share your progress/attempts?

robbassett commented 5 years ago

@manodeep I've done some work on producing more "realistic" velocity dispersions, but if anything it runs more slowly... I'm not confident there is really a way to speed it up much, but it's honestly not so slow that it's going to affect my ability to do science. I'm happy to talk about trying to get "realistic" velocity dispersions from particle data, though, just not this week.

manodeep commented 5 years ago

@robbassett That would be great. Please do share your experience with the code. And if you can run the profiler, then at least we can take a look at whether the code can be sped up...