Closed TorkelE closed 11 months ago
This looks like a good first proposal, and I think this integration would be great for allowing smooth parameter estimation in Julia. First. some general comments first (see comments on the struct):
# I think this a good structure (might be worth to call it CatalystPEtabModel at technically it is not a full
# PEtab model as it does not follow the full PEtab format). Would also allow me to then apply an extra
# parsing step before creating ODE-problem, so I could have
# createPEtabODEProblem(model::CatalystPEtabModel, ...)
struct CatalystPEtabModel
system::ReactionSystem
experimental_conditions::Vector{PEtabExperimentalCondition}
observables::Vector{PEtabObservable}
meassurments::Vector{PEtabMeassurment}
parameters::Vector{PEtabParameter}
end
#=
Looks good - but each experimental condition should have a string-id (or similar). Similar for observable
id
=#
struct PEtabExperimentalCondition
parameter_values::Dict{Num, Float64}
end
#=
Each observable should be given by a string ID
=#
struct PEtabObservable
obs::Num
end
#=
I think a DataFrame is the best option here (but we can keep this struct to allow anyone to simulate data).
See my comments on the fields. There are also observable parameters and PreEq ID:s (if we have a steady
state simulation), but I think we can add those later (should be straightforward).
=#
struct PEtabMeassurment
exp_id::String # Should be a text id
obs_id::String # Should be a text id
value::Float64
time_point::Float64
#=
The sigma-parameter (measurement noise) can either be a value or a parameter in the parameters-struct.
Has to be here, as different measurement-series (even of the same observable) can have different
measurement noise.
=#
noise_parameter::Union{Float64, Num}
end
#=
# For all of these, good defaults in PEtab.jl exists. Maybe should make each type Union{Nothing,...},
with Nothing indicating that PEtabl.jl selects its default. Nothing is a good thing here for defaults :),
note that prior should be allowed to be nothing (if no prior is given).
This looks good, but by PEtab standard replace isconstant with estimate
=#
struct PEtabParameter
parameter::Num
isconstant::Bool
value::Union{Nothing,Float64} # if isconstant==false, value=Nothing.
lb::Float64
ub::Float64
prior::Distribution{Univariate, Continuous}
scale::Symbol # :log10, :linear and :log supported.
end
Overall I think the following would work. If catalyst provides a CatalystPEtabModel then PEtab can use the information to basically create the PEtab-files (they do not have to explicitly created). Given the PEtab-files a PEtabODEProblem can easily be created. So I imagine a workflow like
petab_model = CatalystPEtabModel(...)
# I expand createPEtabODEProblem to handle CatalystPEtabModel in PEtab.jl
petab_problem = createPEtabODEProblem(petab_model::CatalystPEtabModel, ...)
So if you could provide a CatalystPEtabModel on the formt above for a simple model, say; k1 : X -> Y k2 : Y -> X Where we aim to estimate k1, k2, assuming normally distributed measurements of Y with sigma ~ N(0, 1) I can expand createPEtabODEProblem to handle CatalystPEtabModel.
Later in order to create a robust test-suite we can recreate the PEtab test-suite using reaction-system.
Overall, what do you think?
This looks good, here is the altered CatalystPEtabModel
format corresponding to your suggestions right now.
struct CatalystPEtabModel
system::ReactionSystem
experimental_conditions::Dict{String,PEtabExperimentalCondition}
observables::Dict{String,PEtabObservable}
meassurments::DataFrame
parameters::Vector{PEtabParameter}
end
struct PEtabExperimentalCondition
parameter_values::Dict{Num, Float64}
id::String
end
struct PEtabObservable
obs::Num
id::String
end
struct PEtabParameter
parameter::Num
estimate::Bool
value::Union{Nothing,Float64}
lb::Union{Nothing,Float64}
ub::Union{Nothing,Float64}
prior::Union{Nothing,Distribution{Univariate, Continuous}}
scale::Union{Nothing,Symbol} # :log10, :linear and :log supported.
end
I will create an example as well that we could use to try and fit data. I agree, let's do steady state after we have a working methodology without.
My only question is regarding having meassurments
as a DataFrame
. When we have it as an explicit struct, don't we guarantee that it has a certain number of fields and that these are filled, while a DataFrame
could contain additional fields (or miss some)? Is there a way to define a DataFrame
type where it has a certain number of fields? If not I guess we could add a quick check to the data frame ensuring it contain the right informtation.
Also, exactly how do you want the u0
defined? For a species X
, should I simply create a parameter X
which denotes its initial condition?
I don't think we can restrict a DataFrame, on the other side, writing a function to check the DataFrames is straightforward.
Good question, preferably as a state_map (which is already how we do it), i.e on the format:
state_map = [x => 207.6*k1, y => 0.0]
That is, for each model state either a numerical value, or an equation (depending on model parameters, which can be an initial value) are okay to supply.
Sounds good.
Should I make the initial conditions part of the PEtabExperimentalCondition
struct, e.g. expand it to:
struct PEtabExperimentalCondition
parameter_values::Dict{Num, Float64}
u0::Vector{Pair{Num.Union{Float64,Num}}}
end
? Here I put the first value to a Num
, but could be a Symbol
or something else as well, was a bit unsure.
I think actually u0 should be something entirely separate (which is then provided into CatalystPEtabModel). I am thinking about doing as is done in ModellingToolkit where when building on ODEProblem the user can if they want to provide a state-map (as here https://docs.sciml.ai/ModelingToolkit/stable/tutorials/ode_modeling/).
And for parameter values the first value being Num looks good.
I am not entirely sure what you mean, You mean something like this where u0
is a separate field?
struct CatalystPEtabModel
system::ReactionSystem
experimental_conditions::Vector{PEtabExperimentalCondition}
observables::Vector{PEtabObservable}
meassurments::Vector{PEtabMeassurment}
parameters::Vector{PEtabParameter}
u0::Vector{Pair{Num.Union{Float64,Num}}}
end
Then if one wants u0
to varry between various experimental conditions one set that specific u0
value to be a parameter, the value which depends on experimental_conditions["ExpX"].parameter_values
?
Also, a second question. If you have e.g. parameter p
depend on experimental_conditions["ExpX"].parameter_values[p]
, what should p
be in the parameters
field? I presume that it must have estimate=true
. However, can it still have a value
(which is used as the default when experimental_conditions["ExpX"].parameter_values
does not contain the parameter)?
Yes, this is how I see the struct. And yes, if we want some u0 to change between conditions it is better if we make that specific u0 value to change a parameter. I think this is a good solution as, at least in my experience, typically only a few u0 values are changed between experimental conditions, so instead of having to provide an u0 per experimental condition, it is probably better to specify the u0:s that change by a parameter value.
Now a complication here, which I think you touch on with the second question, is that if an initial value is a a parameter how do we specify that? If it is a parameter that is constant or we want to estimate is should be specified in PEtabParameter. However, if it is a control-parameter (dictating an experimental condition) it should not be specified in PEtabParameter. In the PEtab standard control parameters are not a part of PEtabParameter (as they are control parameters and not yet allowed to be estimated), rather the user must provide a value for the control parameter for each experimental condition. So I am thinking that if a user provide a new parameter as initial value control parameter, then we should enforce the user provides a value on that parameter for each experimental condition. What do you think?
(I work in a category of systems where the only difference between experimental conditions is that one changes all the non-trivial (non-0) u0 values, however, for the greater good I don't mind going with your solution, seems like it should work out)
So, in essence, we have 3 types of parameter values?
1) Parameters that we would like to estimate (that might have a prior distribution and bounds).
2) Parameters that have a known value.
3) Parameters that have known values, but with these varying between each experimental condition.
Here, the first two would be required to be specified in the PEtab parameters
field/file, while the third would be given in the experimental_conditions
field/file (and if it is not provided for each experimental condition, and error would be thrown). It seems like a weird dissonance that (2) is provided in parameters
but (3) is not (however, I can also see why, as they have to be provided elsewhere).
Still, I think we have a fully working format here, so maybe we go with this to make something that works, and then we can discuss whenever it makes sense to throw things around a bit later on?
I see what you mean with 2 and 3, and typically in the PEtab standard 2 does not have to specified explicitly (as often the constant values for these are already set in the SBML file), so if it could be possible to set constant parameter when specifying the reaction system then given that, it would not be needed to specify all constant parameters in Parameters?
And I agree, we can go with this, get something that works, and then make potential changes-
I see what you mean with 2 and 3, and typically in the PEtab standard 2 does not have to specified explicitly (as often the constant values for these are already set in the SBML file), so if it could be possible to set constant parameter when specifying the reaction system then given that, it would not be needed to specify all constant parameters in Parameters?
And I agree, we can go with this, get something that works, and then make potential changes.
Technically it would be possible to parse the provided ReactionSystem
, and substitute in parameter values before the system is sent over to PEtab. However, ideally, I would like to keep these separate, as this is how they are handled across the entire SciML system. I also think there's a decent number of cases where this is advantageous (e.g. in many cases system input is modelled as a parameter, and a callback is used to change this value to signal activation. This is no longer possible if a parameter is substituted in).
I will keep things as suggested now, and then we can add this to the list of things to consider once we have a methodology sorted out.
Does this work?
### Preparations ###
# Fetch required packages.
using Catalyst, OrdinaryDiffEq, Distributions, DataFrames, Plots
# Prepares the model simulation conditions.
rn = @reaction_network begin
(p,d), 0 <--> X
(1,k), X <--> Y
end
u0 = [1.0, 0.0]
tspan = (0.0, 10.0)
p = [1.0, 0.2, 0.5]
### Declares CatalystPEtabModel Structure ###
# The conditiosn for an experiment.
struct PEtabExperimentalCondition
parameter_values::Dict{Num, Float64}
id::String
end
# An observable value.
struct PEtabObservable
obs::Num
id::String
end
# A parameter.
struct PEtabParameter
parameter::Num
estimate::Bool
value::Union{Nothing,Float64}
lb::Union{Nothing,Float64}
ub::Union{Nothing,Float64}
prior::Union{Nothing,Distribution{Univariate, Continuous}}
scale::Union{Nothing,Symbol} # :log10, :linear and :log supported.
end
# A full PEtab model.
struct CatalystPEtabModel
system::ReactionSystem
experimental_conditions::Dict{String,PEtabExperimentalCondition}
observables::Dict{String,PEtabObservable}
meassurments::DataFrame
parameters::Vector{PEtabParameter}
u0::Vector{Pair{Num,Union{Float64,Num}}}
end
### Performs Experiment ###
# Meassurment settings.
meassurment_times = 2.0:2.0:10.0
meassurment_error_dist = Normal(0.0, 0.4)
# Make simulation.
oprob = ODEProblem(rn, u0, tspan, p)
sol = solve(oprob, Tsit5(); tstops=meassurment_times)
# Get meassurment values.
meassured_values_true = [sol[2, findfirst(sol.t .== mt)] for mt in meassurment_times]
meassured_values = meassured_values_true .+ rand(meassurment_error_dist, length(meassured_values_true))
# Plot the experiment an meassurments.
plot(sol; idxs=2)
plot!(meassurment_times, meassured_values; seriestype=:scatter)
### Prepares the PEtab Structure ###
@unpack X, Y, p, d, k = rn
# The system field.
system = rn
# The experimental conditions field.
exp1 = PEtabExperimentalCondition(Dict([X=>1.0, Y=>0.0, d=>0.2, k=>2.0]), "Obs1")
experimental_conditions = Dict(["Exp1" => exp1])
# The observables field.
obs1 = PEtabObservable(Y, "Obs1")
observables = Dict(["Obs1" => obs1])
# The meassurments field
nM = length(meassured_values)
meassurments = DataFrame(exp_id=fill("Exp1", nM), obs_id=fill("Obs1", nM), value=meassured_values, time_point=meassurment_times, noise_parameter=fill(0.4,nM))
# The parameters field.
par_p = PEtabParameter(p, false, nothing, 1e-2, 1e2, nothing, :log10)
par_d = PEtabParameter(d, true, 0.2, nothing, nothing, nothing, nothing)
par_p = PEtabParameter(k, true, 2.0, nothing, nothing, nothing, nothing)
parameters =[par_X, par_Y, par_p, par_d, par_p]
# The initial conditions field.
u0 = [X => 1.0, Y => 0.0]
# Creates the model
petab_model = CatalystPEtabModel(system, experimental_conditions, observables, meassurments, parameters, u0)
### Saves Model to File ###
using Serialization
serialize("petab_model.jls", petab_model)
Currently, there is only one parameter to estimate, however, if that works we should be able to make an additional parameter unknown, or something that is carried in experiments. I'm also sending the petab_model.jls file (although the CatalystPEtab
struct must still be declared to load it)
Thanks, this looks like it would work. Will try to take a look and come back this or early next week :)
This turned out to be easier than expected.
(Note I made some small alterations to the Structs)
include(joinpath(@__DIR__, "Catalyst_functions.jl"))
# Prepares the model simulation conditions.
rn = @reaction_network begin
(p,d), 0 <--> X
(1,k), X <--> Y
end
### Performs Experiment (simulate data) ###
u0 = [1.0, 0.0]
tspan = (0.0, 10.0)
p = [1.0, 0.2, 0.5]
meassurment_times = 2.0:2.0:10.0
meassurment_error_dist = Normal(0.0, 0.4)
oprob = ODEProblem(rn, u0, tspan, p)
sol = solve(oprob, Tsit5(); tstops=meassurment_times)
meassured_values_true = [sol[2, findfirst(sol.t .== mt)] for mt in meassurment_times]
meassured_values = meassured_values_true .+ rand(meassurment_error_dist, length(meassured_values_true))
### Prepares the PEtab Structure ###
@unpack X, Y, p, d, k = rn
# The system field.
system = rn
# The experimental conditions field.
exp1 = PEtabExperimentalCondition(Dict([d=>0.2, k=>2.0]))
experimental_conditions = Dict(["Exp1" => exp1])
# The observables field.
obs1 = PEtabObservable(Y, :lin, 0.4)
observables = Dict(["Obs1" => obs1])
# The meassurments field
nM = length(meassured_values)
meassurments = DataFrame(exp_id=fill("Exp1", nM), obs_id=fill("Obs1", nM), value=meassured_values, time_point=meassurment_times, noise_parameter=fill(0.4,nM))
# The parameters field (we are going to create a nice constructor here)
par_p = PEtabParameter(p, true, nothing, 1e-2, 1e2, nothing, :log10)
par_d = PEtabParameter(d, false, 0.2, nothing, nothing, nothing, nothing)
par_k = PEtabParameter(k, false, 2.0, nothing, nothing, nothing, nothing)
petab_parameters = [par_p, par_d, par_k]
# The initial conditions field.
state_map = [X => 1.0, Y => 0.0]
# Creates the model
petab_model = readPEtabModel(system, experimental_conditions, observables, meassurments,
petab_parameters, state_map, verbose=true)
# Can now easily be made into PEtabODEProblem
petabProblem = createPEtabODEProblem(petab_model)
p = [log10(20)]
f = petabProblem.computeCost(p)
∇f = petabProblem.computeGradient(p)
Δf = petabProblem.computeHessian(p)
I added functions to parse the PEtab-input structs into the PEtab-file format (as DataFrames). Given that, I could build a PEtabODEProblem from the reaction system. Overall, with this approach it should also be easy to add additional features such as stady-state conditions, noise parameters etc. (basically recreate the PEtab test-suite).
Going forward, in order to create a smooth user experience I think we need to figure out how to specify initial values, and how to set default parameter values (parameters which are constant and are not control-parameters). An example, currently if I want to estimate initial-value I have to do the following when defining the reaction system:
rn = @reaction_network begin
@parameters a0 b0 # Initial values to estimate must be parameters
(k1), A --> B
(k2), B --> A
end
@unpack A, B, k1, k2, a0, b0 = rn
state_map = [A => a0, B => b0]
While I imagine something like this would be smoother
rn = @reaction_network begin
@parameters a0 b0 # Initial values to estimate must be parameters
@species A(t)=a0 B(t)=b0 # Only specify for species having non-zero initial value
(k1), A --> B
(k2), B --> A
end
Likewise, instead of setting constant non-control parameter values via PEtabParameter I think it would be nice to set these directly when defining the reaction system. This is how it is done typically for PEtab-model (default parameter values or the initial-value equation is set in the SBML-file). For this to work, however, I need to be able to extract the default values from the reaction system (to properly setup the functions needed by PEtab.jl), is it something that can be done? Further, any more input on this?
Looks good. Is there somewhere where I can check the updated struct
s?
It is possible to set default values of parameters and initial conditions already: https://docs.sciml.ai/Catalyst/stable/catalyst_functionality/dsl_description/#dsl_description_defaults Not sure if it is possible to make initial conditions parameters though. Whether or not that is easy to do depends on ModelingToolkit, I could check this.
I'm mixed with using default parameter values within the ReactionSystem
as the way to define values that are neither estimated nor varies between experimental settings. From the workflow where you create a ReactionSystem
, create a PEtabModel
, and then calibrate the parameters it makes sense. But if you want to e.g.
It should also be noted that some parameter properties can be defined for parameters via metadata: https://docs.sciml.ai/ModelingToolkit/stable/basics/Variable_metadata/#Bounds https://docs.sciml.ai/ModelingToolkit/stable/basics/Variable_metadata/#Mark-parameter-as-tunable Metadata can be added to parameters in Catalyst already (and we can add more fields if we want to) e.g.
@reaction_network begin
@parameters p1 [mymetadata=true] p1 [someothermetadata=2.0]
end
For now we might want to ignore these fields and make it work without, and then we can add functionality that infers these properties if they exist.
Yes, the updated structs can be found here: https://github.com/sebapersson/PEtab.jl/blob/Catalyst_integration/test/Catalyst_functions.jl. (note I removed PEtabCatalystModel as I figured out how to process these structs directly into a PEtabModel)
Regarding your points on parameter values I see what you mean. So we have that setting default values in non-ideal for workflows, while setting constant parameters via PEtabParameter is to clunky. Would it then be possible to set values for constant parameters via a parameter-map - like lets say we would want to set a constant value for k1 (and set k2 later via PEtabParameter)
rn = @reaction_network begin
@parameters a0 b0 # Initial values to estimate must be parameters
(k1), A --> B
(k2), B --> A
end
@unpack A, B, k1, k2, a0, b0 = rn
state_map = [A => a0, B => b0]
parameter_map [k1 => 3.0]
Then I could accept parameter_map when building the PEtab model and take this constant into consideration when building the ODEProblem.
Regarding parameter dependent initial values, do you think it is better to use a state-map, or specify the relationship in the reaction system?
Setting it via a parameter map like that seems good. It is also similar to how it is done with e.g. ODEProblem
s, so should be quite intuitive for most users.
For the states, I am currently leaning towards defining this in the reaction system. Having initial conditions being defined by parameters is something I can see being useful under other circumstances as well, so seems good to make it a general thing. I'm checking how this should work within the ModelingToolkit framework right now though. If it turns out to be a mess we might have to go with defining it in the state map and it being a feature within PETab only.
Sounds good, I will then add a parameter-map.
I will also code up the PEtab test-cases using Catalyst (too see if there is any useful feature we do not currently cover).
Setting intial conditions as parameters should work well:
using Catalyst
rn = @reaction_network begin
@parameters x0
@species X(t)=x0
(p,d), 0 <--> X
end
oprob = ODEProblem(rn, [], (0.0,10.), [:p=>1.0, :d => 3.0, :x0=>100.0])
sol = solve(oprob)
The value can be retrieved via e.g.
@unpack X = rn
Catalyst.get_defaults(rn)[X]
or found in
X.val.metadata[Symbolics.VariableDefaultValue]
Then I say we go with setting initial-values directly in the reaction system, I also managed to this approach working with PEtab.jl and I think it looks nice (below is the first PEtab test-case, note I added constructor for PEtabParameter):
#=
Recrating PEtab test-suite for Catalyst integration to robustly test that
we support a wide arrange of features for PEtab integration
=#
using Test
include(joinpath(@__DIR__, "..", "Catalyst_functions.jl"))
# Define reaction network model
rn = @reaction_network begin
@parameters a0 b0
@species A(t)=a0 B(t)=b0
(k1, k2), A <--> B
end
# Measurement data
measurements = DataFrame(exp_id=["c0", "c0"],
obs_id=["obs_a", "obs_a"],
time_point=[0, 10.0],
value=[0.7, 0.1],
noise_parameter=0.5)
# Single experimental condition
experimental_conditions = Dict(["c0" => PEtabExperimentalCondition(Dict())])
# PEtab-parameter to "estimate"
petab_parameters = [PEtabParameter(:a0, value=1.0, scale=:lin),
PEtabParameter(:b0, value=0.0, scale=:lin),
PEtabParameter(:k1, value=0.8, scale=:lin),
PEtabParameter(:k2, value=0.6, scale=:lin)]
# Observable equation
@unpack A = rn
observables = Dict(["obs_a" => PEtabObservable(A, :lin, 0.5)])
# Create a PEtabODEProblem
petab_model = readPEtabModel(rn, experimental_conditions, observables, measurements,
petab_parameters, verbose=true)
petab_problem = createPEtabODEProblem(petab_model)
# Compute negative log-likelihood
nll = petab_problem.computeCost(petab_problem.θ_nominalT)
@test nll ≈ 0.84750169713188 atol=1e-3
Overall, integrating PEtab with Catalyst seem to work nicely. Next I could expand the importer and structs to cover additional features (e.g. steady-state simulations) by recreating the PEtab test-suite, and after that I propose we see if we should change some of the structs? (I am currently not entirely happy with how we specify measurement data, feels a bit clunky).
Looks good!
Quick questions:
a0, b0, k1, k2
parameters that we wish to estimate? If so, why are they given values (1.0, 0.0, 0.8, 0.6
)?noise_parameter=0.5
. You also have a single observable with (A, :lin, 0.5)
. Here, what does the two 0.5
values suggest? Does the lin
here suggest that one should use linear scale for the difference between measured and "assumed true` value. Good questions.
(A, :lin, 0.5)
0.5 is the noise-formula, which as we here have to be 0.5 (as we know the measurement-noise in this case). So in this case I do not need to specify noise (per PEtab standard) in measurements
. In case I would have a noise-formula like (A, :lin, A * noiseParameter1)
then in measurements I would have to provide noise_parameter=[a, b]
where a, b can be constant values that replace noiseParameter1, or even parameters specified by PEtabParameter that replace noiseParameter1. All-to-all, I think a good solution is to have the user provide the noise-formula in the observable, and in case it is a non-constant we can force the user to also provide a noise-parameter? And exactly, lin
suggests a linear scale for the difference (allowed are lin, log, log10), if we instead would have used log we would have assumed log-normal measurement error. Sounds good. I am still a bit uncertain about (1) through. If a0, b0, k1, k2
are estimated in this case (they are, right?), does setting this values have any direct impact on what is going on?
With regard to (2). Yes, it seems that when we are done we might want to have a series of checks to ensure nothing is missing. Your suggested approach here seems good (provide the noise formula in the observables, and any quantity here that carries between measurements is given as noise parameters there.
Right now, the number (noiseFormula
) in observable is a formula for the std in the normal distribution, right?
(3) To be honest, linear and log-normally distributed noise will probably cover almost all cases, and I cannot directly say there is another type of noise I'd want right now. I think it is more a question if we want to future-proof us so that the approach can work with other noise distributions when/if they get added. Would it make sense to add a field to PEtabObservable
for this, and for now we just assume that is always going to be a normal distribution?
petab_problem.θ_nominalT
in θ_nominalT
the set values can be found. So it is entirely for the convenience of testing the code against cases where we know the likelihood value.Excellent, yes, putting such a note about other distributions sounds like the way to go. Didn't even realise that there was customized code for normal distributions.
If we do not want to let initial conditions be parameters that are estimated, we can provide their values just using a map when we run readPEtabModel
? And for the parameter values that remain fixed throughout the simulation.
Yes, providing the the constant initial values like a map sounds like a good plan. And in the case the user does not provide anything for initial value I say we assume it is zero.
Assuming 0 if missing sounds good. I think the pieces are falling into place? I will start drafting documentation for Catalyst. Once there is a version of the interface working in PEtab.jl I might be able to help with the coding there to cover corner cases and similar stuff.
I agree the pieces are falling into place. I will work through the PEtab test-cases (to make sure there is nothing we miss in term of support), and add things like steady-state simulations. Have some other things this week, but should be done with this latest by the end of the week.
The Catalyst importer now supports all PEtab features (some features required a bit of work :) ), see the example below (where I tried to include everything):
#=
Showing all different things we can handle with PEtab.jl
=#
using Distributions
include(joinpath(@__DIR__, "..", "Catalyst_functions.jl"))
# Define reaction network model
rn = @reaction_network begin
@parameters se0
@species SE(t)=se0 # s0 = initial value for S
c1, S + E --> SE
c2, SE --> S + E
c3, SE --> P + E
end
#=
Set a constant state (in this case E and P) and parameter (in this case c1)
values via a state- and parameter-map respectively with a vector on the form
Symbol(variable) => value
=#
state_map = [:E => 1.0, :P => 0.0]
parameter_map = [:c1 => 1.0]
#=
For each simulation condtion we can specify values for states (in this case S)
and/or model parameters (in this case :c2) via a Dict. For these controll-
parameters values must be set for each simulation condition (I must add a check
for this). Note - c0_pre corresponds to a pre-equilibration - which is a steady-
state simulation carried out prior to matching the model against data (to mimic
that cells are at steady-state at time zero).
=#
experimental_conditions = Dict("c0_pre" => PEtabExperimentalCondition(Dict(:S => 2.0, :c2 => 2.0)),
"c0" => PEtabExperimentalCondition(Dict(:S => 10.0, :c2 => 2.0)))
#=
A PEtabObservable has fields;
observable formula : (formula for what is observed, e.g S + S)
measurment noise distribution . (either :lin (normal) or log-normal (:log))
noise-formula : formula for measurement noise σ
Parameters on the form noiseParameter... maps to the parameter or values specifed
under noise_parameter in the measurement data, and parameters on the form
observableParameter... maps to the parameters or values specifed in the
observable_parameters column in the measurement data. This allows the user to
specify different, for example, noise parameters per measurement to mimic a setting
where the same observable (e.g S) might been observed in different experiments, or
with different assays.
=#
@unpack S, P = rn
@parameters noiseParameter1_obs_S observableParameter1_obs_P observableParameter2_obs_P
observables = Dict("obs_S" => PEtabObservable(S, :log, noiseParameter1_obs_S * S),
"obs_P" => PEtabObservable(observableParameter1_obs_P*P + observableParameter2_obs_P, :lin, 1.0))
#=
The column pre_eq_id corresponds to the pre-equilibration criteria, that is under which
simulation-condition setting to carry out a steady-state simulation prior to matching the
model against data (e.g to mimic that cells are at steady-state at time zero). The column
observable_parameters holds the values for the observable-parameters specified in the
observable formula, and these can be parameters (specified in PEtabParameters) or constant
values (e.g 1.0). In case of several values these should be separated by ;. Similiar holds
for noise_parameter - these map to the noiseParameter... values in the noise-formula, and
these can be parameters or values. In case there are now noise-parameters, or observable-
parameters these columns can be left out (they are optional).
=#
measurements = DataFrame(exp_id=["c0", "c0", "c0", "c0"],
pre_eq_id=["c0_pre", "c0_pre", "c0_pre", "c0_pre"], # Steady-state pre-eq simulations
obs_id=["obs_S", "obs_S", "obs_P", "obs_P"],
time_point=[0.0, 10.0, 1.0, 20.0],
value=[0.7, 0.1, 1.0, 1.5],
observable_parameters=["", "", "scale_P;offset_P", "scale_P;offset_P"],
noise_parameter=["noise", "noise", "", ""])
#=
For a PEtab parameter to estimate the user must specify id. Then the user can choose scale
to estimate the parameter on (:log10 (default), :lin and :log). For each parameter any
univariate cont. prior can be set to turn the parameter estimation problem into a
maximum a posteriori problem. If prior_on_linear_scale=true (default) the prior applies
on the linear-scale, while if false it applies on the parameter scale (e.g log10 in case
the user choose scale=:log10). In case the parameter is not be estimated the user can
set estimate=false (default true). lb and ub (lower and upper bounds) can also be
set for each parameter.
=#
petab_parameters = [PEtabParameter(:c3, scale=:log10, prior=Normal(0.0, 2.0), prior_on_linear_scale=false),
PEtabParameter(:se0, scale=:lin, prior=LogNormal(1.0, 0.5)),
PEtabParameter(:scale_P, scale=:log10),
PEtabParameter(:offset_P, scale=:log10),
PEtabParameter(:noise, scale=:log10)]
# Given all this it is easy to set a PEtabODEProblem and then compute the cost, gradient
# and Hessian.
petab_model = readPEtabModel(rn, experimental_conditions, observables, measurements,
petab_parameters, stateMap=state_map, parameterMap=parameter_map,
verbose=true)
petab_problem = createPEtabODEProblem(petab_model)
# Compute negative log-likelihood
θ = [1.0, 1.0, 1.0, 1.0, 1.0]
f = petab_problem.computeCost(θ)
∇f = petab_problem.computeGradient(θ)
Δf = petab_problem.computeHessian(θ)
Overall, we can have priors, pre-equillbration steady-state simulations, different measurement noise distributions, and estimate observable and noise parameters.
What is left is to polish it a bit (name things better) and add checks that the user input is correct. For naming and input format I have the following suggestions:
Any other input on how to specify the PEtab-structs and input data? I also plan to add Catalyst as a package-extension for PEtab.jl.
This looks good. I will be travelling this weekend, but starting next week and when the Catalyst extension is ready I will start to create tutorials in Catalyst.jl docs (we can put them here as well if you want).
Looking forward, and especially with regard to the pre-equilibrium functionality. Is there a way to provide a callback to the simulation for the parameter estimation? I think some experimental layouts are already covered without callbacks, but if you have an experiment where the system remains for some number of time units in the steady state, and then the system is activated, this might require callbacks.
Sounds good with tutorials!
PEtab.jl has functionality for handling callbacks but callbacks are not yet support for Catalyst (only for SBML models). I agree it is a useful feature, how would you like callbacks for a Catalyst model to be specified? Is there some symbolic way?
In DiffEq, callbacks are typically provided with the solve(problem,...;callback=)
command. In PEtab.jl the solve
options seems to, generally, be provided in the odeSolverOptions
structure, so that would be one alternative. However, Catalyst ReactionSystem
s should support events being provided in the system structure, just like ModelinGToolkit ODESystems: https://docs.sciml.ai/ModelingToolkit/stable/basics/Events/
I haven't actually played around with this functionality. However, ReactionSystem
s have the continuous_events
and discrete_events
fields, so if you think that would be better, this should work.
For continuous_events
, i.e. scenarios where the trigger depend on the model states like if X < 1 --> Y = 1
, I think the most convenient solution would be for the user to try to encode these in the system structure.
For discrete events where we know the event time (or the event time is a model parameter), it can become very involved for the user to code the event via ModellingToolkit syntax (e.g. code something like at time t1 add substrate T and at time t2 remove substrate T). Moreover, if the event trigger happens to depend on a model parameter to estimate, e.g. t == c1
, then to properly compute gradients I need to convert the time-variable to dual number, and have a function that computes tstops based on the model parameters (we currently do this for the SBML importer); all-to-all handling discrete events like dosages is messy. Adding PEtab support for dosage schemes is currently discussed https://github.com/PEtab-dev/PEtab/issues/564, and to ensure correctness I think the best solution for discrete events would be to add some additional PEtab-struct like PEtabDosage, then under the hood given a PEtabDosage write appropriate discrete callbacks. What do you think?
On another note, I have now added checks for the PEtab input format, and on the Catalyst_integration branch added Catalyst as a package extension. I have also written tests to cover all PEtab features. I only have to add documentation (hopefully done this week) before I merge with main. So all-to-all, so I think you can now start to create tutorials.
Something that can help you is hopefully this example that should cover most features:
#=
Showing all different things we can handle with PEtab.jl
=#
using Catalyst
using CSV
using DataFrames
using Distributions
using PEtab
# Define reaction network model
rn = @reaction_network begin
@parameters se0
@species SE(t)=se0 # se0 = initial value for S
c1, S + E --> SE
c2, SE --> S + E
c3, SE --> P + E
end
#=
Set a constant state (in this case E and P) and parameter (in this case c1)
values via a state- and parameter-map respectively with a vector on the form
Symbol(variable) => value
=#
state_map = [:E => 1.0, :P => 0.0]
parameter_map = [:c1 => 1.0]
#=
For each simulation condtion we can specify values for states (in this case S)
and/or model parameters (in this case :c2) via a Dict. For these controll-
parameters values must be set for each simulation condition. Note - c0_pre corresponds
to a pre-equilibration - which is a steady-state simulation carried out prior to
matching the model against data (to mimic that cells are at steady-state at time zero).
=#
simulation_conditions = Dict("c0_pre" => Dict(:S => 2.0, :c2 => 2.0),
"c0" => Dict(:S => 10.0, :c2 => 2.0))
#=
A PEtabObservable has fields;
observable formula : (formula for what is observed, e.g S + S)
measurment noise distribution . (either :lin (normal) or log-normal (:log))
noise-formula : formula for measurement noise σ
Parameters on the form noiseParameter... maps to the parameter or values specifed
under noise_parameter in the measurement data, and parameters on the form
observableParameter... maps to the parameters or values specifed in the
observable_parameters column in the measurement data. This allows the user to
specify different, for example, noise parameters per measurement to mimic a setting
where the same observable (e.g S) might been observed in different experiments, or
with different assays.
=#
@unpack S, P = rn
@parameters noiseParameter1_obs_S observableParameter1_obs_P observableParameter2_obs_P
observables = Dict("obs_S" => PEtabObservable(S, noiseParameter1_obs_S * S, transformation=:log),
"obs_P" => PEtabObservable(observableParameter1_obs_P*P + observableParameter2_obs_P, 1.0))
#=
The column pre_eq_id corresponds to the pre-equilibration criteria, that is under which
simulation-condition setting to carry out a steady-state simulation prior to matching the
model against data (e.g to mimic that cells are at steady-state at time zero). The column
observable_parameters holds the values for the observable-parameters specified in the
observable formula, and these can be parameters (specified in PEtabParameters) or constant
values (e.g 1.0). In case of several values these should be separated by ;. Similiar holds
for noise_parameter - these map to the noiseParameter... values in the noise-formula, and
these can be parameters or values. In case there are now noise-parameters, or observable-
parameters these columns can be left out (they are optional).
=#
measurements = DataFrame(simulation_id=["c0", "c0", "c0", "c0"],
pre_eq_id=["c0_pre", "c0_pre", "c0_pre", "c0_pre"], # Steady-state pre-eq simulations
obs_id=["obs_S", "obs_S", "obs_P", "obs_P"],
time=[0.0, 10.0, 1.0, 20.0],
measurement=[0.7, 0.1, 1.0, 1.5],
observable_parameters=["", "", "scale_P;offset_P", "scale_P;offset_P"],
noise_parameters=["noise", "noise", "", ""])
#=
For a PEtab parameter to estimate the user must specify id. Then the user can choose scale
to estimate the parameter on (:log10 (default), :lin and :log). For each parameter any
univariate cont. prior can be set to turn the parameter estimation problem into a
maximum a posteriori problem. If prior_on_linear_scale=true (default) the prior applies
on the linear-scale, while if false it applies on the parameter scale (e.g log10 in case
the user choose scale=:log10). In case the parameter is not be estimated the user can
set estimate=false (default true). lb and ub (lower and upper bounds) can also be
set for each parameter.
=#
petab_parameters = [PEtabParameter(:c3, scale=:log10, prior=Normal(0.0, 2.0), prior_on_linear_scale=false),
PEtabParameter(:se0, scale=:lin, prior=LogNormal(1.0, 0.5)),
PEtabParameter(:scale_P, scale=:log10),
PEtabParameter(:offset_P, scale=:log10),
PEtabParameter(:noise, scale=:log10)]
# Given all this it is easy to set a PEtabODEProblem and then compute the cost, gradient
# and Hessian.
petab_model = readPEtabModel(rn, simulation_conditions, observables, measurements,
petab_parameters, stateMap=state_map, parameterMap=parameter_map,
verbose=true)
petab_problem = createPEtabODEProblem(petab_model)
# Compute negative log-likelihood, gradient and Hessian
θ = [1.0, 1.0, 1.0, 1.0, 1.0]
f = petab_problem.computeCost(θ)
∇f = petab_problem.computeGradient(θ)
Δf = petab_problem.computeHessian(θ)
Looks great! I will start re-ordering Catalyst Docs to make place for an expanded section on the inverse problem and using PEtab, and start writing the tutorials. Once PEtab.jl is updated I will start a PR and tag you so you can have a look.
For continuous_events, i.e. scenarios where the trigger depend on the model states like if X < 1 --> Y = 1, I think the most convenient solution would be for the user to try to encode these in the system structure.
For discrete events where we know the event time (or the event time is a model parameter), it can become very involved for the user to code the event via ModellingToolkit syntax (e.g. code something like at time t1 add substrate T and at time t2 remove substrate T). Moreover, if the event trigger happens to depend on a model parameter to estimate, e.g. t == c1, then to properly compute gradients I need to convert the time-variable to dual number, and have a function that computes tstops based on the model parameters (we currently do this for the SBML importer); all-to-all handling discrete events like dosages is messy. Adding PEtab support for dosage schemes is currently discussed https://github.com/PEtab-dev/PEtab/issues/564, and to ensure correctness I think the best solution for discrete events would be to add some additional PEtab-struct like PEtabDosage, then under the hood given a PEtabDosage write appropriate discrete callbacks. What do you think?
Interesting, I had thought that continuous callbacks would be the difficult part, and the discrete ones the easy case. When it comes to, for the user, defining discrete callbacks via ModelingToolkit, that should not be a problem. Especially for the limited number that should cover the base cases, I am more than happy to implement stuff to make this easy for the user. However, creating a PEtabDosage
seems like it might be the better way to go. It will be a little weird that continuous and discrete callbacks are given differently. However, in practice, I think it will be very rare that these will actually be used. My guess is that 95%+ of cases will be:
This could easily be defined in such a PEtabDosage
struct. Then, if there is a demand for more complicated discrete callbacks (or continuous callbacks) we can deal with that later.
If https://github.com/PEtab-dev/PEtab/issues/564 seems to progress slowly, we might want to go ahead and define this structure and add support for it, before main PEtab does. Then, when PEtab eventually supports this, we can align our input to match. Alternatively, us going ahead with it might push PEtab to incorporate the functionality (and use something similar to us).
The timeplan for this would depend on how messy it is for you to implement. If an implementation where dosage times do not depend on parameters is easy (and does not require lots of revamping once we want to support parameter-dependant times), then maybe we should do this first (again waiting with parameter-dependant dosage times until we see that there is a demand for them).
For the structure, maybe something like this?
# Main struct, contains one value for each file in the PEtab format.
struct PEtabModel
...
doses::Vector{PEtabDose}
end
struct PEtabDose
time::Num # Either a formula that depends on parameters only, or a Float64
target::Num # The species (or parameter) that should be updated.
value::Num # The updated value.
end
Here, if value
could either just be 1.0
(if it is a fixed value), or maybe 2p
(if e.g. our dose is to double the value of the parameter p
).
On a side note, for cases where we only represent a single species/parameter, I think that we maybe should use BasicSymbolic{Real}
instead of Num
. Here, Num
would be reserved for equation expressions (like having an observable being X_active+X_inactive
). Sorry, this was me being a bit lazy when I wrote things up.
I agree with using BasicSymbolic{Real}
should be used for single parameters/states.
And I agree, we can implement a PEtabDose
and align it later when the PEtab standard is updated. Maybe to avoid confusion by having the user write some events in the ReactionSystems, and others as dosage, we should maybe instead use a struct like this to capture both Cont. and Discrete. events?
struct PEtabDose
trigger::Num
target::Num # The species (or parameter) that should be updated.
value::Num # The updated value.
duration::Num
end
Where if trigger
is a single value or parameter we assume it refers to a trigger time, but we also allow trigger to be something like S > 3
and then under the hood handles it like a continues event (I have code for this). Value can then either be a concrete value or model parameter. I think it would also be nice to allow duration
- that is for time-triggered events allow the user to set a duration before a value or parameter changes back to the original value. All this (except duration) should be straightforward to implement, while maintaining correctness of model gradients.
That PETabDose
approach seems good.
Great, I will try to implement it then next week :) (first have to catch up a bit on the documentation)
I have had some problems actually fitting a model using the new interface. I have tried various options, but all fail:
using PyCall
using Optim
callibrateModel(petab_problem, Fides(verbose=false))
callibrateModel(petab_problem, parameter_map, Fides(verbose=false))
callibrateModel(petab_problem, state_map, Fides(verbose=false))
callibrateModel(petab_problem, state_map, parameter_map, Fides(verbose=false))
callibrateModel(petab_problem, IPNewton())
callibrateModel(petab_problem, parameter_map, IPNewton())
callibrateModel(petab_problem, state_map, IPNewton())
callibrateModel(petab_problem, state_map, parameter_map, IPNewton())
What is the correct syntax to use here, @sebapersson?
Sorry this part has been a bit of work in progress so the syntax can be a bit unclear. The correct syntax is:
using Optim
#=
p0 = Vector{Float64} which holds the start-guess for the parameter estimation. The parameter ordering can be obtained from petab_problem.θ_estNames
=#
res = calibrateModel(petabProblem, p0, Optim.IPNewton())
Currently p0
is a Vector{Float64}
, but it might be worth to also allow it to be something like a parameter-map (and then under the hood the ordering is fixed)?
You can find more examples here for how to run multi-start parameter estimation. The documentation for this will soon also be up (I will soon merge the Catalyst_integration branch with main)
Generally, for Catalyst (and other ModeingToolkit-based tools) the recommended workflow is to use parameter maps, either
rs = @reaction_network begin
(p,d), 0 <--> X
end
parameter_map [:p => 3.0, :d => 1.0]
or
rs = @reaction_network begin
(p,d), 0 <--> X
end
@unpacl p, d = rs
parameter_map [p => 3.0, d => 1.0]
If useful, I can help with implementation.
I see. Most optimization packages take a Vector though, e.g. Optmisation.jl as here https://docs.sciml.ai/Optimization/stable/examples/rosenbrock/ which I think is the most scalable solution if there are many (>30) parameters to estimate. But for smaller problems I do like the syntax with the parameter map better. So I think the best solution is to allow both - that if a vector is provided then in calibrateModel function the parameter map is parsed into a Vector{Float64}.
Would be grateful if you could help with the implementation
I agree. if you make the version that acts on Vector{Float64}
I will make a PR with overloads in case maps are given instead.
I have now merged the Catalyst integration with main, and following #69 I have also changed the naming to better follow the Julia code-standard, as can be seen in the updated docs https://sebapersson.github.io/PEtab.jl/dev/Define_in_julia/.
Next release will be breaking (as naming changes), so before that if you (@TorkelE ) have any feedback on naming etc please tell.
The only thing left now is PEtabDose
which I hopefully will have time to add next week.
Still traveling, but will be back tomorrow (Sunday) at which point I will make some additional tutorials, should hopefully be done quite quickly.
I'm going through things, and only really have two thoughts:
There are a decent number of cases when there is only really is a single experiment (e.g. modelling epidemics, where each epidemic is distinct and only happens once). Would it make sense to create a dispatch for PEtabModel
when no (or empty) simulation_conditions
is provided? Under the hood, PEtabModel
could then create an empty condition dictionary, give it a default tag ("__condition_0"
) and attach it to all measurements (and do a check so that there are no parameters unaccounted for).
I'd say in the vast majority of cases, the observable is a species of the system. In these cases, it would be pleasant to be able to designate this without having to explicitly @unpack
that species. E.g. for
rn = @reaction_network begin
(p, d), 0 <--> X
end
where X
is observed, something like
obs_X = PEtabObservable(:X, 0.1)
would be useful. When PEtabModel
is run, one would infer that :X
is in fact the variable X(t)
. Even for systems with more complicated observables, e.g.
rn = @reaction_network begin
(k1,k2), X1 <--> X2
end
where we now would run
@unpack X1, X2 = rn
obs_Xtot = PEtabObservable(X1+X2, 0.1)
my goal is that observables should instead be incorporated into the reaction system:
rn = @reaction_network begin
@observed Xtot = X1+X2 # Not actually implemented yet, the .observed field exist though, but cannot be filled this way).
(k1,k2), X1 <--> X2
end
and one could run
obs_Xtot = PEtabObservable(:Xtot, 0.1)
However, I the current approach is more general. For a starter, it does give the flexibility of being able to define stuff like Xtot=X1+X2
within PEtabObservable
, but also (as in your example) to have more explicit formula for measurement errors and noise.
I'm not sure what is best. I can imagine that it is a bit messy to support both cases. If there is an elegant solution I think it would be worth it, however, depends on what you think is best.
Me and @sebapersson talked today about linking PEtab.jl and Catalyst.jl (https://github.com/SciML/Catalyst.jl) together.
The idea is that someone who has defined a Catalyst.jl reaction system, should be able to define the information required by PEtab.jl to create optimization problems for fitting cost functions. Basically, I would want to create an intermediary
struct
,PEtabModel
(made-up name), so that one can runThe hope is that someone who is using Catalyst.jl should be able to circumvent using files to utilise PEtab entirely. I'm really excited about this. Hopefully, we will be able to create really smooth workflows for parameter optimization of CRNs using Julia.
After our discussion, we decided that I should create a proposed structure that holds the information typically stored in the PEtab.jl files. We can then discuss how this one can be improved to better accommodate what PEtab.jl needs. Ultimately, I think having some form of
struct PEtabModel
will be useful. the equation is whenever we should:readPEtabModel
create this structure from the file.Currently, I am thinking of creating something like this:
I think that it should be easy to, within Catalyst, create something like this and send to PEtab. What would be the ideal way for PEtab to receive it in though? If you have a format that you think you could work in, and ensure that that works as input to
createPEtabODEProblem
, then I can make a link to that, and we should be able to put an example together.