matthewholman / assist

ASSIST is a software package for ephemeris-quality integrations of test particles.
https://assist.readthedocs.io/
GNU General Public License v3.0

Using some struct instead of GM, x, y, z, vx, vy, vz, ax, ay, az #39

Closed · hannorein closed this 1 year ago

hannorein commented 1 year ago

A lot of function arguments currently include GM, x, y, z, vx, vy, vz, ax, ay, az. This makes some operations cumbersome, for example copying all of those variables requires 10 statements. If we instead use a struct, copying could be done in one statement. I also think it would make the code more readable.

Don't merge this in. It's just a quick test to see if this improves the speed. I found that it does speed things up by about 5%. There is probably more one can do. This is just a minimal example to get it to compile. Many lines of code have to be changed because the all_ephem function is used so much.

In this test I just use assist_cache_item because it has the right variables. But it might make sense to use reb_particle here. There are a few extra variables in that struct that are not used (r, lastcollision, ...). But I doubt that would affect performance much. Maybe we could even get rid of assist_cache_item and use reb_particle for that as well. Logically it would make sense since we're really doing operations on particles.

matthewholman commented 1 year ago

I completely agree with you on this. In fact, I think I might have created such a struct in an earlier stage. I recall having difficulty getting Python to deal properly with pointers to structs, so I opted for arrays. I don't think there is a fundamental issue with structs in ctypes; I just had trouble.

hannorein commented 1 year ago

Ok. Then I'll play around with it for a bit to see if I can find a good solution.

hannorein commented 1 year ago

Still working on this... One thing I continue to run into: the results depend heavily on which computer I run these tests on. On my laptop with an M1 chip, I can get a speedup of 2x (!) by simply changing the order of members in a struct. On x86, I get no speedup whatsoever. Clearly this has something to do with how the CPU handles caching/memory access. But it's so hard to predict.

I'll focus on the speed on an x86 machine but it's fascinating...

dmhernan commented 1 year ago

So x86-optimal code does not imply M1-optimal code. Is your strategy here also to write optimal M1 code? I assume optimizing on M1 never hurts x86 speeds in these tests.

hannorein commented 1 year ago

@dmhernan I don't have a strategy!

@matthewholman No rush, but could you have a close look at this PR? I think it's more or less complete, but there are so many changes that I'm a bit uncomfortable. So I'd really like some feedback! To summarize:

There is definitely more cleanup to do. But I don't think anything "big" is left. So if we get this one out of the way, then we can focus on the smaller stuff, unit tests, python interface, examples...

matthewholman commented 1 year ago

The last_state_x, last_state_v, and last_state_a pointers don't seem to be a part of the assist_extras definition in assist.h. I'm not sure if I'm looking at the right file/version.

matthewholman commented 1 year ago

@hannorein, this all looks very good to me. Using a struct certainly reduces the number of lines of code in a few places, and it's clearer.

Changing the name of cache_item to assist_particle makes sense to me. Maybe we could use assist_particle rather than reb_particle to save a bit of overhead, but using reb_particle would probably be clearer to users who are already very familiar with REBOUND.

I definitely like the assist_jpl_calc and assist_spk_calc changes. We can expect future JPL DE versions (Development Ephemeris, i.e. DE440/441, etc.), so keeping those routines flexible makes a lot of sense.

I am happy to merge this, if you still think it's ready.

hannorein commented 1 year ago

Closing this for now. Too much changed in one go. We can always revisit it at a later time.