Closed hannorein closed 1 year ago
I completely agree with you on this. In fact, I think I might have created such a struct in an earlier stage. I recall having difficulty getting python to deal properly with pointers to structs, so I opted for arrays. I don't think there is a fundamental issue with structs in ctypes; I just had trouble.
Ok. Then I'll play around with it for a bit to see if I can find a good solution.
Still working on this... One thing I continue to run into: It matters so much on what computer I run these tests. On my laptop with an M1 chip, I can get a speedup of 2x (!) by simply changing the order of members in a struct. On x86, I get no speedup whatsoever. Clearly this has something to do with how the CPU handles caching/memory access. But it's so hard to predict.
I'll focus on the speed on an x86 machine but it's fascinating...
So x86 chip optimal code does not imply M1 optimal code. Is your strategy here to write also optimal M1 code? I assume optimizing on M1 never has negative effects on x86 chip speeds in these tests.
@dmhernan I don't have a strategy!
@matthewholman No rush, but could you have a close look at this PR? I think it's more or less complete, but there are so many changes which makes me a bit uncomfortable. So I'd really like some feedback! To summarize:
assist_cache_item
struct to pass around ephemeris data. Maybe we should rename that to assist_particle
. Just using reb_particle
would also work, but it has a few more members and leads to a little bit of overhead.planet_ephem
and last_ephem
function with their assist_jpl_calc
and assist_spk_calc
counterparts. Both planet_ephem
and last_ephem
didn't really add much functionality, they just acted as a bridge. I think this makes it easier to maintain. Especially since the GM constants and the various unit conversions are now in the assist_jpl_calc
and assist_spk_calc
functions where I think they belong. Beforehand, some unit conversions were on one side, but others on the other.There is definitely more cleanup to do. But I don't think anything "big" is left. So if we get this one out of the way, then we can focus on the smaller stuff, unit tests, python interface, examples...
The last_state_x, last_state_v, and last_state_a pointers don't seem to be a part of the assist_extras definition in assist.h. I'm not sure if I'm looking at the right file/version.
@hannorein, this all looks very good to me. Using a struct certainly reduces the number of lines of code in a few places, and it's clearer.
Changing the name of cache_item to assist_particle makes sense to me. Maybe we could use assist_particle rather than reb_particle to save a bit of overhead, but using reb_particle would probably be clearer to users who are already very familiar with REBOUND.
I definitely like the assist_jpl_calc and assist_spk_calc changes. We can expect future JPL DE version (development ephemeris, ie DE440/441, etc), so keeping those routines flexible makes a lot of sense.
I am happy to merge this, if you still think it's ready.
Closing this for now. Too much changed in one go. We can always revisit it at a later time.
A lot of function arguments currently include GM, x, y, z, vx, vy, vz, ax, ay, az. This makes some operations cumbersome, for example copying all of those variables requires 10 statements. If we instead use a struct, copying could be done in one statement. I also think it would make the code more readable.
Don't merge this in. It's just a quick test to see if this improves the speed. I found that it does speed things up by about 5%. There is probably more one can do. This is just a minimal example to get it to compile. Many lines of code have to be changed because the all_ephem function is used so much.
In this test I just use
assist_cache_item
because it has the right variables. But it might make sense to usereb_particle
here. There are a few extra variables in that struct that are not used (r
,lastcollision
, ...). But I doubt that would affect performance much. Maybe we could even get rid ofassist_cache_item
and usereb_particle
for that as well. Logically it would make sense since we're really doing operations on particles.