sys-bio / roadrunner

libRoadRunner: A high-performance SBML simulator
http://libroadrunner.org/

Not possible to set seed for deterministic distrib model #1173

Open matthiaskoenig opened 6 months ago

matthiaskoenig commented 6 months ago

Hi all,

I have a distrib model I want to simulate. The initial assignment of one parameter is sampled from a uniform distribution. I have to set the seed for the model to be reproducible. Unfortunately this is not possible, i.e. the integrator settings for the deterministic simulators do not have a seed setting. There is only

(
    'relative_tolerance',
    'absolute_tolerance',
    'stiff',
    'maximum_bdf_order',
    'maximum_adams_order',
    'maximum_num_steps',
    'maximum_time_step',
    'minimum_time_step',
    'initial_time_step',
    'multiple_steps',
    'variable_step_size',
    'max_output_rows'
)

The seed can only be set for Gillespie? But a seed is required for a deterministic model with distrib information. How would I set the seed for roadrunner in this case?

Best Matthias

matthiaskoenig commented 6 months ago

I found the following in the documentation

from roadrunner import Config
Config.setValue(Config.RANDOM_SEED, 42)

But:

  1. this does not work for distrib, which is clearly a bug, i.e. running a model script twice gives different values of the random variable (see example below).
  2. this only sets the seed globally and does not allow per-model control. We have 10,000 roadrunner instances which all need an individual seed, because we cannot control the order in which the instances are instantiated on high-performance clusters. So there must be a way to set the seed per roadrunner.ExecutableModel instance.

Minimal model attached:

import roadrunner
import pandas as pd

model_path = "spt_random.xml"
r: roadrunner.RoadRunner = roadrunner.RoadRunner(str(model_path))

# define subset of variables as selections
selections = [
    # time
    "time",  # [min] model time
    "protein_random",
]
r.selections = selections

from roadrunner import Config
Config.setValue(Config.RANDOM_SEED, 42)

s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

Here is the model: spt_random.zip (https://github.com/sys-bio/roadrunner/files/13742744/spt_random.zip)

Running the script twice gives:

   time  protein_random
0   0.0        0.004562
1  10.0        0.004562
   time  protein_random
0   0.0        0.962983
1  10.0        0.962983

This is a major issue: distrib models are currently not reproducible in roadrunner.

Please let me know if there is any workaround/hack in C++. We need this urgently for a publication which is about to be submitted.

hsauro commented 6 months ago

That’s a good point, it hadn’t occurred to me. This shouldn’t be difficult to implement.

Herbert

(Sent by email in reply to Matthias König's message of Thu, Dec 21, 2023, 6:18 AM.)

matthiaskoenig commented 5 months ago

Hi all, we are using a lot of random components when coupling our SBML models to FEM models. Currently we cannot use distrib and roadrunner for handling the random initializations, because we cannot reproduce the random part of the model.

[image: https://github.com/sys-bio/roadrunner/assets/900538/21deb53d-da69-4533-8526-db55b6bb0bcf]

See an example attached where the roadrunner microsimulations of the FEM geometry are randomly initialized (right pattern) or other patterns set (5 examples to the left). For debugging it would be very important to reproduce the microsimulations (via a seed) and also to reproduce the random patterns. E.g. we want to run parameter scans for the same random initializations of the roadrunner instances. Without being able to set a seed this is not possible.

It would be great if one could set a seed for a roadrunner instance/deterministic simulation which is used for sampling from distrib distributions.

Best Matthias

hsauro commented 5 months ago

I've a feeling being able to set the seed for distrib was on the list of things to do. Adel or Lucian, is that right?

H

(Sent by email in reply to Matthias König's message of Wed, Jan 31, 2024, 2:43 PM.)

-- Herbert Sauro, Professor Director: NIH Center for model reproducibility University of Washington, Bioengineering 206-685-2119, www.sys-bio.org, http://reproduciblebiomodels.org/ Mobile: 206-880-8093 @.*** Books: http://books.analogmachine.org/

adelhpour commented 5 months ago

@hsauro I could find this seed-related issue in the tellurium repo (https://github.com/sys-bio/tellurium/issues/543), is that the one you have in mind?

hsauro commented 5 months ago

No, that one is already implemented. This one has to do with the distrib package, which allows users to draw from probability distributions during a deterministic simulation.

(Sent by email in reply to Adel Heydarabadipour's message of Wed, Jan 31, 2024, 3:24 PM.)

matthiaskoenig commented 5 months ago

@adelhpour and @luciansmith I just went through my open issues. This is the only one which I need a solution for urgently (and cannot think of any workaround). It does not seem to be complicated to implement. Just requires the option for a seed which is set and used for the deterministic simulation. It would be great if this could be in the 2.6.0 release. If not then not.

luciansmith commented 5 months ago

Thank you for checking your issues! And indeed, @adelhpour is working on this, and we hope to incorporate it into the 2.6 release.

luciansmith commented 3 months ago

This is now implemented in 2.6.0, released today! Do let us know if you have any issues.

matthiaskoenig commented 3 months ago

Hi Lucian, thanks. How does this work? I tried the following, but it does not give the same random value for the distrib model? How do I set the seed, so I get the same value for the random protein in 2 simulations?

import roadrunner
import pandas as pd

model_path = "spt_random.xml"
r: roadrunner.RoadRunner = roadrunner.RoadRunner(str(model_path))
m: roadrunner.ExecutableModel = r.getModel()

# define subset of variables as selections
selections = [
    # time
    "time",  # [min] model time
    "protein_random",
]
r.selections = selections
m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

r.resetToOrigin()

m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

Gives 2 different random values:

   time  protein_random
0   0.0         0.15245
1  10.0         0.15245
   time  protein_random
0   0.0        0.611653
1  10.0        0.611653

luciansmith commented 3 months ago

Ah--it's not the ExecutableModel, it's the roadrunner object itself:

import roadrunner
import tellurium as te
import pandas as pd

model_path = "spt_random.xml"
r: roadrunner.RoadRunner = te.loada("""
    species a
    a = normal(4, 4)
""")

r.setSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

r.resetToOrigin()

r.setSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

However, I don't think we noticed the m.setRandomSeed function (sigh) so didn't change it like the others; I would think that it should behave the same way.

adelhpour commented 3 months ago

We actually did notice the 'setRandomSeed' function; it is used when the user calls the 'setSeed' function of the roadrunner object with a false value for the 'resetModel' flag. But I did not expect the user to have direct access to the setRandomSeed function. I will check how it led to this issue.

matthiaskoenig commented 3 months ago

Hi all, it works with the r.setSeed function. Unfortunately, it has a buggy side effect: calling r.setSeed resets the selections of the model!

import roadrunner
import pandas as pd

model_path = "spt_random.xml"
r: roadrunner.RoadRunner = roadrunner.RoadRunner(str(model_path))
m: roadrunner.ExecutableModel = r.getModel()

# define subset of variables as selections
selections = [
    # time
    "time",  # [min] model time
    "protein_random",
]
r.selections = selections

r.setSeed(42)
# r.selections = selections
# m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

r.resetToOrigin()

r.setSeed(42)
# r.selections = selections
# m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

prints

   time
0   0.0
1  10.0
   time
0   0.0
1  10.0

but should print

   time  protein_random
0   0.0        0.013265
1  10.0        0.013265
   time  protein_random
0   0.0        0.013265
1  10.0        0.013265

I.e. one has to manually reset the selections after the seed has been set !?
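
The manual workaround asked about here could be wrapped in a small helper. This is only a sketch under assumptions: `set_seed_keep_selections` is a hypothetical name, and `StubRunner` is a stand-in that imitates the behaviour reported above (a `selections` attribute that `setSeed` resets) so the snippet runs without a model file. With a real model one would pass the `roadrunner.RoadRunner` instance instead.

```python
def set_seed_keep_selections(r, seed: int) -> None:
    """Call r.setSeed(seed), then restore the selections that were set before."""
    saved = list(r.selections)  # copy before the seed call resets them
    r.setSeed(seed)
    r.selections = saved


class StubRunner:
    """Stand-in for roadrunner.RoadRunner, for illustration only."""

    def __init__(self):
        self.selections = ["time"]
        self.seed = None

    def setSeed(self, seed, resetModel=True):
        self.seed = seed
        if resetModel:
            # Imitates the reported side effect: selections fall back to a default.
            self.selections = ["time"]


r = StubRunner()
r.selections = ["time", "protein_random"]
set_seed_keep_selections(r, 42)
print(r.selections)  # ['time', 'protein_random']
```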

luciansmith commented 3 months ago

Happily, that was anticipated! We assumed that normally, when you set the seed you'd want to reset the model. But if not, there's a second argument to 'setSeed' that says whether or not the model should be reset, so all you have to do is use that flag:

import roadrunner
import pandas as pd

model_path = "spt_random.xml"
r: roadrunner.RoadRunner = roadrunner.RoadRunner(str(model_path))
m: roadrunner.ExecutableModel = r.getModel()

# define subset of variables as selections
selections = [
    # time
    "time",  # [min] model time
    "protein_random",
]
r.selections = selections

r.setSeed(42, resetModel=False)
# r.selections = selections
# m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

r.resetToOrigin()

r.setSeed(42, resetModel=False)
# r.selections = selections
# m.setRandomSeed(42)
s = r.simulate(start=0, end=10, steps=1)
df = pd.DataFrame(s, columns=s.colnames)
print(df)

(or just 'r.setSeed(42, False)')
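
For the cluster use case raised earlier in the thread (many roadrunner instances created in an order that cannot be controlled), one option is to derive each seed from a stable identifier rather than from the instantiation order. The sketch below is only an illustration: `seed_for_instance` and the instance IDs are hypothetical, the 31-bit range is an assumption about what `setSeed` accepts, and only the `r.setSeed(seed, False)` call mentioned in the final comment comes from this thread.

```python
import hashlib


def seed_for_instance(instance_id: str, base_seed: int = 42) -> int:
    """Derive a reproducible seed from a stable instance identifier.

    Hypothetical helper, not part of roadrunner. The result depends only
    on (base_seed, instance_id), not on the order in which instances are
    created, so every run assigns the same seed to the same instance.
    """
    digest = hashlib.sha256(f"{base_seed}:{instance_id}".encode("utf-8")).digest()
    # Keep 31 bits so the value fits a signed 32-bit integer (assumed safe range).
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF


# Hypothetical IDs, e.g. one per FEM element.
instance_ids = [f"element_{i}" for i in range(5)]
seeds = {iid: seed_for_instance(iid) for iid in instance_ids}

# With a loaded model, each instance would then be seeded as shown in the
# thread, e.g.: r.setSeed(seeds["element_3"], False)
```

Changing `base_seed` gives a different but equally reproducible set of seeds, which may be convenient for rerunning a parameter scan on the same random initializations.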