hannorein / rebound

💫 An open-source multi-purpose N-body code.
https://rebound.readthedocs.io/
GNU General Public License v3.0

[TRACE+] Binary file size from simulation archive dependent on integrator #794

Closed gabrieltxg closed 2 weeks ago

gabrieltxg commented 2 months ago

Environment: Which version of REBOUND are you using and on what operating system?

Describe the bug: I'm not sure whether it is actually a bug, but as the title says, the final file size of the saved binary archives depends on the chosen integrator.

I was benchmarking, trying to reproduce some comparisons of IAS15, WHFast, MERCURIUS and TRACE (with different pericenter settings).

For IAS15 and TRACE+FULL_IAS15 the final file size is almost 10 times larger, while for WHFast it is around 30% larger.

Same sampling and same system (with slightly different initial conditions).

Does it have something to do with the number of internal timesteps?

To Reproduce: I'm trying to reproduce the Lu et al. (2024) paper, specifically the "violent systems" section, since I'm interested in compact planetary systems.

Some initial conditions are different but the problem persisted for different setups.

Here's my code:

import rebound
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random
import time

def hill_radii_dist(ai, ms, mi, mj, delta):
    # Semimajor axis of the next planet, placed delta mutual Hill radii outside ai
    eta = ((mi + mj)/(3*ms))**(1/3)
    aj = ai*(2+delta*eta)/(2-delta*eta)
    return aj

def gaussian(mu, sigma):
    # Box-Muller transform: returns a normally distributed value with mean mu and standard deviation sigma
    u1 = np.random.uniform(size=1)
    u2 = np.random.uniform(size=1)
    R_sq = -2 * np.log(u1) * sigma ** 2
    theta = 2 * np.pi * u2
    z1 = np.sqrt(abs(R_sq)) * np.cos(theta) + mu
    z2 = np.sqrt(abs(R_sq)) * np.sin(theta) + mu  # second Box-Muller deviate, unused
    return z1

def simulation_delta(delta, n, e = 0, i = 0, sig = 0):
    # Planet semimajor axes are drawn from a Gaussian distribution, with kmean the mean
    # orbital separation parameter and kstd the spread of this separation.
    kmean = delta # Initial orbital separation between two adjacent planets, in mutual Hill radii
    kstd = sig # Standard deviation

    sim = rebound.Simulation()
    msun = 1 # Msun
    m_earth = 1e-4 # 1e-4 Msun, roughly 10% of Jupiter's mass
    sim.add(m = msun)
    ai = 5 # Semimajor axis of the innermost planet
    m1 = m_earth*np.random.uniform(3,9) # Mass is drawn from a uniform distribution
    ecc = 0 # coplanar, circular planets
    inc = 0
    #### Initial angles are randomly distributed
    sim.add(m = m1, a = ai, e = ecc, 
            inc = np.deg2rad(inc), 
            omega = np.deg2rad(random.uniform(0, 360)),
            f = np.deg2rad(random.uniform(0, 360)),
            primary = sim.particles[0]
           )
    var_k = []
    ks = []
    mratio = []

    #### Below we add new planets following the configuration above,
    #### with semimajor axes drawn as defined above
    for i in range(1, n):
        #### Below we make sure adjacent planets' angles are not aligned (Chambers et al., 1996)
        omega = random.uniform(0, 360)
        ff = random.uniform(0, 360)
        while abs(omega - np.rad2deg(sim.particles[i].omega))<20:
            omega = random.uniform(0, 360)            
        while abs(ff - np.rad2deg(sim.particles[i].f))<20:
            ff = random.uniform(0, 360)            

        delta15 = gaussian(kmean, kstd) # Draw the "real" initial orbital separation between planets i and i+1

        var_k.append(delta15)      

        mp = m_earth*np.random.uniform(3,9) #Draw new planets mass
        ai = sim.particles[i].a
        aj = hill_radii_dist(ai, msun, m1, mp, delta15) # Compute its semimajor axis from the mutual Hill radius

        ecc = 0 # coplanar, circular planets
        inc = 0
        sim.add(m = mp, a = aj, e = ecc, 
                inc = np.deg2rad(inc), omega = np.deg2rad(omega), f = np.deg2rad(ff),
            primary = sim.particles[0])
    sim.move_to_com()    
    return sim

# n, delta, T, e, i, sig, pct, num = pars#[0]

# Function to integrate in parallel

def benchmark(pars):
    ### Receives kmean, kstd, the integrator and pericenter parameters, as well as a counter.
    delta, sig, integrator, peri, speri, cont = pars
    n = 3 # Number of planets
    e = 0
    i = 0
    sim = simulation_delta(delta, n, e, i, sig) # Initialize simulation
    particulas = sim.particles
    m1 = particulas[1]
    P1 = m1.P # Orbital period of the innermost planet

    sim.integrator = integrator

    sim.dt = 0.08*P1 #8% of innermost planet orbital period
    if integrator == 'trace':
        sim.ri_trace.peri_mode = peri
        sim.ri_trace.S_peri = speri
    else:
        peri = "none"
        speri = "none"

    # Save snapshots to this folder (in the same directory as the notebook)
    sim.save_to_file("test_integrators_n7.v5/archive_"+sim.integrator+"_"+peri+speri+"_"+str(cont)+"v1.bin", interval=10., delete_file=True)

    start = time.time()

    E0 = sim.energy()
    sim.integrate(1e4*2*np.pi, exact_finish_time=0) # Integrate for only 1e4 years due to computational limitations

    efinal = sim.energy()
    print("Integration finished with t = ", time.time()-start, "for integrator ", sim.integrator, peri, speri)
    tcomp = time.time()-start
    return [abs((E0-efinal)/E0), tcomp, sim.t, delta, sig, cont, sim.integrator, peri, speri]

### Creating array of initial conditions and parameters

deltas = [4, 6, 8] # Mean initial orbital separations, in mutual Hill radii
sigmas = [0, 2, 4] # Standard deviation used when drawing the semimajor axis of each new planet

integrators = ['trace', 'mercurius', 'trace', 'whfast', 'ias15']
peris = ['PARTIAL_BS', 'FULL_BS', 'FULL_IAS15']
speris = ["none", "default"]

specs = []

for ii in integrators:
    if ii == 'trace':
        for pps in peris:
            for ssps in speris:
                for dd in deltas:
                    for ss in sigmas:
                        specs.append([dd, ss, ii, pps, ssps])
    else:
        for dd in deltas:
            for ss in sigmas:
                specs.append([dd, ss, ii, None, None])

pars2 = []
specs = specs*10
for i, pp in enumerate(specs):
    a = pp.copy()
    a.append(i)
    pars2.append(a)

len(specs)
# specs

from multiprocess import Pool
start_time = time.time()

#integracao(pars2[0])

with Pool(10) as pool:
    results = pool.map(benchmark, pars2)
print('total integration time: ', time.time()-start_time)

(Another point I would like to raise is that I never manage to achieve the speedup mentioned in the paper introducing TRACE. While I do get an impressive boost compared to IAS15, the overall error and computation time are comparable to MERCURIUS, WHFast and FULL_BS. I don't know why; any light you could shed would help.)

This code took about 1000 seconds to run for all 1350 parameter sets, around 3 seconds per integration with TRACE.

The described behavior shows up when checking the resulting simulation archive binary files: the runs are essentially the same, but when using WHFast or the IAS15/FULL_IAS15 settings the files are much bigger!

What could be happening here? The final size of the files is important to me because I'm integrating a huge number of initial conditions across the parameter space (and, as I already mentioned, the speedup from TRACE would be very, very welcome as well).

Is it a bug?

Thank you very much!

Kind regards, Gabriel.

Additional context: Adding more planets didn't change the main conclusions.


hannorein commented 2 months ago

This is expected. Different integrators need different internal variables to restart a simulation and as a result the file size is different.

@tigerchenlu98 I think there is probably a way to optimize the file size for trace. We could call the ias15_reset() and bs_reset() functions at the end of each step in addition to the beginning. This way, the IAS15/BS arrays would not be stored. It shouldn't affect reproducibility since we don't use these arrays in the next step.
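
One way to see this effect directly (a minimal sketch only; the toy system, timestep and snapshot interval below are arbitrary, and it assumes a recent REBOUND version with Simulation.save_to_file) is to write a short archive with each integrator and compare file sizes:

import os
import rebound

def archive_size(integrator, filename="size_test.bin"):
    # Small toy two-planet problem; all numbers are arbitrary.
    sim = rebound.Simulation()
    sim.add(m=1.0)
    sim.add(m=1e-4, a=1.0)
    sim.add(m=1e-4, a=1.5)
    sim.move_to_com()
    sim.integrator = integrator
    sim.dt = 0.05
    sim.save_to_file(filename, interval=10., delete_file=True)
    sim.integrate(1000.)
    return os.path.getsize(filename)

for integrator in ["whfast", "mercurius", "trace", "ias15"]:
    print(integrator, archive_size(integrator), "bytes")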

gabrieltxg commented 2 months ago

Hi, Hanno! Thanks for the answer.

I was not sure whether I should classify it as a bug or not, since it was very systematic and the larger files were related to the IAS15 and MERCURIUS runs.

Regarding this optimization: I'm doing some longer runs and my archive files (using TRACE + PARTIAL_BS) are getting substantially bigger, even though the snapshot settings are the same as above (~50,000 snapshots).

For 1e4 years the size was around 5 MB, but for 1e7 years the final files are on the order of 70-80 MB. Is this expected as well? Maybe the optimization mentioned above would come in handy.

hannorein commented 2 months ago

filesize = (some constant) × (number of particles) × (number of snapshots)
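
For a rough back-of-the-envelope estimate (a sketch only; the per-particle size and per-snapshot overhead used here are assumed placeholder values that depend on the REBOUND version and integrator):

def estimate_archive_size(n_particles, n_snapshots,
                          bytes_per_particle=128, overhead_per_snapshot=100):
    # bytes_per_particle and overhead_per_snapshot are illustrative assumptions only.
    return n_snapshots * (n_particles * bytes_per_particle + overhead_per_snapshot)

# e.g. a star plus 7 planets, 50,000 snapshots:
print(estimate_archive_size(8, 50_000) / 1e6, "MB")   # ~56 MB with these assumed numbers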

gabrieltxg commented 2 months ago

I imagined so. I will try to make some sense of the constant factor, then. Thanks, Hanno! I will close (and relabel) the issue.

dmhernan commented 2 months ago

Hi @gabrieltxg, regarding your speed comment: for the violent systems in the paper, you'll note there's an expected (small) speedup with respect to BS. For a given problem with close encounters occurring at some frequency, TRACE should always be significantly faster than MERCURIUS; the only reason it wouldn't be significantly faster is if TRACE and MERCURIUS are computing different orbits (the more likely scenario for violent systems is that MERCURIUS is getting it wrong). Compared to WHFast, TRACE should be roughly similar in speed, assuming, again, both codes are getting the orbits right (ignoring safe-mode settings and so forth). There is no problem for which TRACE is expected to be faster than WHFast.

dmhernan commented 2 months ago

Sorry, I missed your comment about error. If you're getting significantly better error with MERCURIUS compared to TRACE, that's never expected, at least if you're using it as intended. For a violent system, the error in WHFast should be poorer than TRACE's, but if there's some hierarchy in masses, that error may not show up in the energy error.

gabrieltxg commented 2 months ago

Hi, David! Thank you for your answer.

Sorry, I should have opened a new issue to deal with this instead of putting a tiny, bold line about it.

That's what I thought. WHFast errors are not substantially worse than TRACE's, usually on the same order of magnitude.

The masses are generally on the same order of magnitude as well.

Usually I'm taking 7 Earth-mass planets separated by between 6 and 14 mutual Hill radii. The innermost planet has a1 = 0.1 AU. For orbital separations larger than ~8 R_Hill, systems are usually stable, while for smaller separations they are exponentially more unstable.

The very first time I tried TRACE I managed to get an amazing speed-up, but after I started to make the code a bit more complex it disappeared... What do you think could be wrong?

Apart from the snippet I showed above, I evaluated the integration at Nout output steps, and at each of them I checked a collision trigger (which, if True, would end the integration).

A typical integration of this kind of system for 10^7 years (10^8.5 innermost-planet orbital periods) takes around 10^4 seconds, which is around 3 h... I thought it might be too much.

Is there something that caught your eye?

Kind regards, Gabriel.

dmhernan commented 2 months ago

- I don't think it affects your tests too much, but one immediate comment is to ensure you have the latest TRACE version.
- I wouldn't be surprised if there is a small energy error difference between TRACE and WHFast in some iterations of violent systems, and yet WHFast is getting the wrong ultimate answers, such as surviving planet fractions, despite this apparently "good" energy error in some cases. Now, if you tell me WHFast and TRACE are getting similar statistics and speed for the violent system, that would be a big deal.
- It's hard to compare the speed of TRACE and MERCURIUS for a violent system, since MERCURIUS probably leads to artificial ejections quickly, which saves compute time.
- Thus, nothing you've said actually rings alarm bells so far... your answers may still be correct. But @tigerchenlu98 may have more comments.

gabrieltxg commented 2 months ago

Hey, @dmhernan, I'm really sorry, I skipped your first comment... I just woke up.

Thank you very much for the analysis and explanations! Since the angles are randomized for each run, the initial conditions are not exactly the same for each test, so there should probably be some difference in the orbits.

I will update my TRACE version, run some more tests, and see if anything changes!

Either way, thank you very much for the enlightenment. I will try to make some numerical sense of my code.

Kind regards, Gabriel.

tigerchenlu98 commented 2 months ago

Hi all,

Sorry for the slow response -- I've been away and just now getting back into things.

@gabrieltxg, I took a quick look through your simulation setup. Things look good to me for the most part, but one thing I might recommend is selecting a smaller timestep -- I think 8% of the original orbit may not give optimal results. I'd imagine in many of your systems you are ejecting some planets and ending up with some closer in than the original innermost orbit, and hence this timestep may not be appropriate anymore post-instability. TRACE should technically be able to handle these fine, but it's not optimal, because we'll be switching back and forth from BS far more often, which will come with both a computational cost and a speed penalty. The way I did it in Lu+ 2024 was to calculate the smallest possible dynamical timescale from conservation of energy (e.g., eject all the other planets and keep one on a tight orbit). You'll end up with a far smaller timestep this way, but it's well worth it: we want to stay in the Wisdom-Holman regime as much as possible, and slowing down the WHFAST part is more than compensated for by fewer switches to BS/IAS15. If you're curious how I set up the system, the C code should be linked in the data availability section of the paper!
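
For illustration, a rough sketch of that timestep estimate (this is not the exact code from the paper; it assumes marginal ejections so that a single surviving planet absorbs the entire, negative, total energy, and the 5% fraction is just an example value):

import numpy as np
import rebound

def conservative_timestep(sim, fraction=0.05):
    # Worst case allowed by energy conservation: all planets but one are marginally
    # ejected (zero orbital energy) and the remaining planet absorbs the entire
    # (negative) total energy, ending up on the tightest possible orbit.
    E = sim.energy()                              # total energy, negative for a bound system
    mstar = sim.particles[0].m
    mp = min(p.m for p in sim.particles[1:])      # the lightest planet gives the smallest a_min
    a_min = -sim.G * mstar * mp / (2.0 * E)       # from E = -G*M*m/(2a)
    P_min = 2.0 * np.pi * np.sqrt(a_min**3 / (sim.G * mstar))
    return fraction * P_min

# Example usage with a toy two-planet system:
sim = rebound.Simulation()
sim.add(m=1.0)
sim.add(m=1e-4, a=5.0)
sim.add(m=1e-4, a=5.8)
sim.move_to_com()
sim.integrator = "trace"
sim.dt = conservative_timestep(sim)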

I agree with @dmhernan on both the MERCURIUS and WHFAST points for the most part. I guess I'm a little surprised that WHFAST isn't completely failing in energy conservation? It seems like many of your systems end up not actually going unstable over the course of the integration, in which case WHFAST should be totally fine. But for the systems where there is an instability I'm surprised WHFAST doesn't show catastrophic error -- MERCURIUS certainly did, as seen in Lu+ 2024.

A quick note -- I would definitely recommend FULL_BS as the default for this system setup. PARTIAL_BS seems unable to handle the extremely eccentric cases.

We're looking into the large simulation archive issue right now, hopefully we'll have a more optimized version soon.

hannorein commented 2 months ago

@gabrieltxg If possible, can you confirm that the latest version is giving you smaller binary files?

gabrieltxg commented 2 months ago

@tigerchenlu98 Sorry for taking so long, the week has been a little wild here. Thank you very much for the comments and answers! I will try to update REBOUND and to run the simulation with smaller timesteps!

I didn't apply collision conditions or ejections in this simulation specifically; I don't know whether that has any impact, but I will try to include those conditions! And I will certainly look at the C code that is available.

For the actual problem and code in which I'm applying TRACE, I didn't worry much about ejections and mergers: as soon as a close encounter/collision happens, I halt the integration. So I'm not too worried about the error growing arbitrarily. In this setup, simulations only run as long as they are stable, so I'm not sure how much I might be losing by using PARTIAL_BS instead of FULL_BS.
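
For reference, the halting mechanism I mean is essentially REBOUND's standard exceptions (a minimal sketch with made-up masses, radii and thresholds, assuming the rebound.Encounter and rebound.Collision exceptions from the documentation):

import rebound

sim = rebound.Simulation()
sim.add(m=1.0)
sim.add(m=3e-6, a=0.10, r=4e-5)   # example masses and radii only
sim.add(m=3e-6, a=0.12, r=4e-5)
sim.move_to_com()
sim.integrator = "trace"
sim.dt = 0.05 * sim.particles[1].P

sim.exit_min_distance = 0.005     # example threshold in code units; triggers rebound.Encounter
sim.collision = "direct"
sim.collision_resolve = "halt"    # overlapping finite-radius particles trigger rebound.Collision

try:
    sim.integrate(1000.)
except rebound.Encounter:
    print("Close encounter recorded at t =", sim.t)
except rebound.Collision:
    print("Collision recorded at t =", sim.t)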

I'm appending some integration information below, where tau_comp is the total computation/integration time (in seconds) and tau_inst is the time at which a close encounter was recorded (hence, the integration was halted).

[image: table of tau_comp and tau_inst for each integration]

Things seem to be quite consistent and well behaved, but I'm wondering whether the integrations are taking too long. The total integration time (at least for stable systems) is 10^7 years, which is ~10^8.5 innermost-planet orbital periods.

@hannorein Sure! I will run some tests and see if it works. Will update you soon.

gabrieltxg commented 2 months ago

@gabrieltxg If possible, can you confirm that the latest version is giving you smaller binary files?

Hi, @hannorein ,

Just to update you: after updating REBOUND to 4.4.3, I'm running the same script I ran before the update and I'm getting exactly the same results (using TRACE with PARTIAL_BS):

For 7 planets integrated for 10^7 years (the innermost planet has a1 = 0.1 AU and P1 = 12 days, so the integration covers 10^8.5 P1), with 50,000 snapshots in the archive, the binary files have exactly the same file size of 62.6 MB.

Nothing changed.

I will run some benchmarks like the one in my script to see what changes. What leaves me confused is that, for shorter integration times, the final file size is substantially smaller, on the order of 10x, regardless of the number of snapshots. What could it be?

Marginally related to that, I'm running some post-integration analysis by opening the archive binaries and applying my tools of interest: opening the files is really fast, on the order of 0.1 s or less. But when I try to collect the orbital elements of each planet (e.g. to calculate the AMD), it takes on the order of 20-30 s.

For 50,000 snapshots and 7 planets, that would be 350,000 iterations. Are those timescales reasonable?
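
For reference, the kind of loop I mean is roughly the sketch below (assuming rebound 4's Simulationarchive interface and the orbits() helper; "archive.bin" is a placeholder filename):

import numpy as np
import rebound

sa = rebound.Simulationarchive("archive.bin")   # placeholder filename
n_snap = len(sa)
n_planets = 7

a = np.zeros((n_snap, n_planets))   # semimajor axes
e = np.zeros((n_snap, n_planets))   # eccentricities

for k, sim in enumerate(sa):
    for j, o in enumerate(sim.orbits()):        # orbital elements of all planets
        a[k, j] = o.a
        e[k, j] = o.e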

Kind regards, Gabriel.

hannorein commented 2 months ago

I've checked myself and the changes do make a significant difference when using FULL_IAS15. There is no change with PARTIAL_BS. However, the output only consists of the particle data and some small metadata (the time of the simulation, etc.), so there is not much to optimize there.

One particle is 128 bytes. If you have 7 particles and 50,000 snapshots, that gives you 45 MB, which is pretty close to your 62 MB.