multiprocessing parallelization and deep copy gives unexpected results

augeorge commented 1 year ago

Hello,

I am trying to run tellurium / roadrunner simulations in parallel to do parameter sweeps/estimation but get unexpected results. I can sort of get around this by creating a wrapper function that loads a new roadrunner model before simulation, but this is slow if done for a large number of iterations. I noticed similar problems when doing copy.deepcopy(rr_model). Is there a best practice or other work-around for this issue?

Below is a simple example - the parallel pool gives incorrect results.

import numpy as np
import tellurium as te
import roadrunner
import copy
from sklearn.metrics import mean_squared_error as mse
import multiprocessing as mp
import sys
import os

print(f'tellurium version: {te.__version__}')
print(f'roadrunner version: {roadrunner.__version__}')
print(f'python version: {sys.version}')
print(f'OS version: {os.uname()}')

def f(k, y_true, rr_model):
    rr_model.resetToOrigin()
    rr_model.k1 = k[0]
    rr_model.km = k[1]
    y_pred = rr_model.simulate(0, 30, 200, ['time', 'Xo', 'S'])['S']
    return mse(y_true,y_pred)

def f_wrapper(k, y_true, rr_string):
    rr_model = te.loada(rr_string)
    return f(k, y_true, rr_model)

rr_string = '''
  $Xo -> S;   Xo/(km + S^h);
  S -> $w;  k1*S;

    # initialize
    h = 1;   # Hill coefficient
    k1 = 1;  km = 0.1;
    S = 1.5; Xo = 2

    at (time > 10): Xo = 5;
    at (time > 20): Xo = 2;
'''

rr_1 = te.loada(rr_string)
y_true = rr_1.simulate(0, 30, 200, ['time', 'Xo', 'S'])['S']
ks = [[0.1,0.01], [1, 0.1], [10,1]]

mse_serial = [f_wrapper(k, y_true, rr_string) for k in ks]
mse_serial2 = [f(k, y_true, rr_1) for k in ks]

p = mp.Pool(processes=2) 
mse_parallel = np.array(p.starmap(f_wrapper, [(k, y_true, rr_string) for k in ks]))
p.close()

p = mp.Pool(processes=2) 
mse_parallel2 = np.array(p.starmap(f, [(k, y_true, rr_1) for k in ks]))
p.close()

print(f'MSE - serial w/ wrapper: {mse_serial}')
print(f'MSE - serial w/o wrapper: {mse_serial2}')
print(f'MSE - parallel w/ wrapper: {mse_parallel}')
print(f'MSE - parallel w/o wrapper: {mse_parallel2}')

output:

roadrunner version: 2.2.1
python version: 3.7.14 (default, Sep  8 2022, 00:06:44) 
[GCC 7.5.0]
OS version: posix.uname_result(sysname='Linux', nodename='db9181e5dc3c', release='5.10.133+', version='#1 SMP Fri Aug 26 08:44:51 UTC 2022', machine='x86_64')
MSE - serial w/ wrapper: [12.267022835506769, 0.0, 2.0494559584927714]
MSE - serial w/o wrapper: [12.267022835506769, 0.0, 2.0494559584927714]
MSE - parallel w/ wrapper: [12.26702284  0.          2.04945596]
MSE - parallel w/o wrapper: [0. 0. 0.]

augeorge commented 1 year ago

A deep copy of the roadrunner object gives incorrect results even with serial calculations.

An example using deep copy w/ serial calculations.

import numpy as np
import tellurium as te
import roadrunner
import copy
from sklearn.metrics import mean_squared_error as mse
import multiprocessing as mp
import sys
import os

print(f'OS version: {os.uname()}')
print(f'tellurium version: {te.__version__}')
print(f'roadrunner version: {roadrunner.__version__}')

def f(k, y_true, rr_model):
    rr_model.resetToOrigin()
    rr_model.k1 = k[0]
    rr_model.km = k[1]
    y_pred = rr_model.simulate(0, 30, 200, ['time', 'Xo', 'S'])['S']
    return mse(y_true,y_pred)

def f_wrapper(k, y_true, rr_string):
    rr_model = te.loada(rr_string)
    return f(k, y_true, rr_model)

rr_string = '''
  $Xo -> S;   Xo/(km + S^h);
  S -> $w;  k1*S;

    # initialize
    h = 1;   # Hill coefficient
    k1 = 1;  km = 0.1;
    S = 1.5; Xo = 2

    at (time > 10): Xo = 5;
    at (time > 20): Xo = 2;
'''
rr_1 = te.loada(rr_string)

y_true = rr_1.simulate(0, 30, 200, ['time', 'Xo', 'S'])['S']
ks = [[0.1,0.01], [1, 0.1], [10,1]]

mse_serial = [f_wrapper(k, y_true, rr_string) for k in ks]
mse_serial2 = [f(k, y_true, rr_1) for k in ks]
mse_serial3 = [f(k, y_true, copy.deepcopy(rr_1)) for k in ks]

print(f'MSE - serial w/ wrapper: {mse_serial}')
print(f'MSE - serial w/o wrapper: {mse_serial2}')
print(f'MSE - serial w/o wrapper deep copy: {mse_serial3}')

output:

OS version: posix.uname_result(sysname='Linux', nodename='db9181e5dc3c', release='5.10.133+', version='#1 SMP Fri Aug 26 08:44:51 UTC 2022', machine='x86_64')
tellurium version: 2.2.3
roadrunner version: 2.2.1
MSE - serial w/ wrapper: [12.267022835506769, 0.0, 2.0494559584927714]
MSE - serial w/o wrapper: [12.267022835506769, 0.0, 2.0494559584927714]
MSE - serial w/o wrapper deep copy: [0.0, 0.0, 0.0]

luciansmith commented 1 year ago

Sorry for not getting to this sooner!

I'm not sure how deepcopy deals with Python objects that are wrappers for C objects from a shared library, but it looks like it's not working correctly here, at least. At any rate, I would recommend creating rr instances separately, and either just loading in the SBML file directly, or, to do this slightly faster, use the 'saveState' and 'loadState' (or saveStateS and loadStateS) functions to load in the state from the old roadrunner object to the new one.

We actually just published an article about libroadrunner, that includes bit about parallelization, including a new 'RoadRunnerMap' object that might be useful to you:

https://academic.oup.com/bioinformatics/article/39/1/btac770/6883908

You can see the Python scripts we used to create the figures (which use a couple different parallelization tricks) at

https://github.com/saurolab-papers/rr2.0_paper

So that will hopefully get you something that works for now.

However, I'm keeping this issue open because we should figure out how to make 'deepcopy' work, and I think your 'roadrunner without wrapper' should probably work, too. We'll investigate; thanks for posting! And apologies again for not following up sooner.

sys-bio / tellurium

multiprocessing parallelization and deep copy gives unexpected results #563