paulflang / SBML2Julia

A tool for optimizing parameters of ordinary differential equation (ODE) models. SBML2Julia translates a model from SBML/PEtab format into Julia for Mathematical Programming (JuMP), performs the optimization task and returns the results.
https://sbml2julia.readthedocs.io/en/latest/
MIT License

Different time vectors for different exp. conditions #9

Closed paulflang closed 3 years ago

paulflang commented 4 years ago

The time points in the PLoS Comp Bio measurement data differ slightly between conditions. It would be nice if DisFit supported that. More generally speaking, it may sometimes occur that each experiment and each observable has its own time vector. In an attempt to support that sort of flexibility in time specification, I wrote the following code: 20200822_JuMP indexing issue.txt

Running the code results in the following error at line 153:

ERROR: MethodError: no method matching size(::JuMP.Containers.SparseAxisArray{Float64,2,Tuple{Int64,Int64}})
Closest candidates are:
  size(::AbstractArray{T,N}, ::Any) where {T, N} at abstractarray.jl:38
  size(::BitArray{1}) at bitarray.jl:70
  size(::BitArray{1}, ::Any) at bitarray.jl:74

To break things down a bit, the root cause of the problem is that JuMP uses JuMP.Containers.SparseAxisArray instead of Array once the second index depends on the first, like so:

# Define observables
println("Defining observables...")
# obs_a: the range of k depends on j, so JuMP stores obs_a as a SparseAxisArray
@variable(m, 0.2 <= obs_a[j in 1:1, k in 1:length(t_exp[j])] <= 0.2, start=1.)
@NLconstraint(m, [j in 1:1, k in 1:length(t_exp)], obs_a[j, k] == A[j, t_sim_to_exp[k]])
# obs_b: the range of k does not depend on j, so JuMP stores obs_b as a plain Array
@variable(m, 0.8 <= obs_b[j in 1:1, k in 1:length(t_exp)] <= 0.8, start=1.)
@NLconstraint(m, [j in 1:1, k in 1:length(t_exp)], obs_b[j, k] == B[j, t_sim_to_exp[k]])

julia> typeof(obs_a)
JuMP.Containers.SparseAxisArray{VariableRef,2,Tuple{Int64,Int64}}

julia> typeof(obs_b)
Array{VariableRef,2}

Anyway, in the end I would like to have a Dict of Dicts in which I can put the simulated value of each observable in each condition, like so: results_observable["condition_x"]["observable_y"] = [1, 2, 3, 4, 5]
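
In Julia terms, the target structure would be a nested Dict along these lines (names and values purely illustrative):

# purely illustrative: one vector of simulated observable values per (condition, observable) pair
results_observable = Dict(
    "condition_x" => Dict("observable_y" => [1.0, 2.0, 3.0, 4.0, 5.0]),
)
results_observable["condition_x"]["observable_y"]  # returns [1.0, 2.0, 3.0, 4.0, 5.0]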

@sshin23 : Do you have a suggestion of how this can be achieved? Also, if you look at the code above, do you think this sort of flexible indexing will slow down JuMP substantially? Do you have an idea how to make the Julia code more elegant?

Thanks a lot!

sshin23 commented 4 years ago

Hi Paul,

Replacing Array(JuMP.value.(o)) with Dict(key=>val for (key,val) in o.data) should fix the issue; it will create a nested dictionary where the inner one has type Dict{Tuple{Int,Int},Float64}.
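
Concretely, that could look something like this (untested sketch, using the obs_a variable from your snippet):

# after optimize!(m): JuMP.value.(obs_a) is again a SparseAxisArray whose .data
# field maps the (j, k) index tuples to the corresponding Float64 values
o = JuMP.value.(obs_a)
obs_a_values = Dict(key => val for (key, val) in o.data)  # Dict{Tuple{Int64,Int64},Float64}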

Once a JuMP model is created and passed to Ipopt, Ipopt.jl transcribes the data into a format that can be efficiently handled by Ipopt. So SparseAxisArray format won't significantly slow down the optimization process. The current code looks fine to me.

paulflang commented 4 years ago

Thanks. Seems to work. Will try to implement it and push a new version when it is done.

paulflang commented 4 years ago

I just looked into this a bit deeper, but it is not exactly doing what I want. I need observable_values to be a nested dict with the same structure as m_exp (except that it contains simulated rather than measured observable values). I am copying in updated code (now for petab_test_suite case 0002, to better showcase the problem) below:

using CSV
using DataFrames
using Ipopt
using JuMP

n_steps = 101 # Setting number of ODE discretisation steps

# Data
println("Reading measurement data...")
data_path = "/media/sf_DPhil_Project/Project07_Parameter Fitting/df_software/petab_test_suite/cases/0002/_measurements.tsv"
df = CSV.read(data_path; use_mmap=false)
insert!(df, 1, (1:length(df[:,1])), :id)

df_by_o = groupby(df, :observableId)
df_by_c = groupby(df, :simulationConditionId)

# Setting variables
t_sim = range(0, stop=maximum(df[!, :time]), length=n_steps)
t_exp = Dict()
m_exp = Dict()
t_sim_to_exp = Dict()
deduplicated_time_idx = Dict()

cond2idx = Dict()
for i_cond in 1:length(keys(df_by_c))
    cond_name = values(keys(df_by_c)[i_cond])[1]
    cond2idx[cond_name] = i_cond
end

for i_obs in 1:length(keys(df_by_o))
    obs_name = values(keys(df_by_o)[i_obs])[1]
    t_exp[obs_name] = Dict()
    m_exp[obs_name] = Dict()
    t_sim_to_exp[obs_name] = Dict()
    deduplicated_time_idx[obs_name] = Dict()
    df_by_o_c = groupby(DataFrame(df_by_o[i_obs]), :simulationConditionId)
    for i_cond in 1:length(keys(df_by_o_c))
        cond_name = values(keys(df_by_o_c)[i_cond])[1]
        cond_idx = cond2idx[cond_name]
        t_exp[obs_name][cond_idx] = df_by_o_c[i_cond][!, :time]
        m_exp[obs_name][cond_idx] = df_by_o_c[i_cond][!, :measurement]

        t = df_by_o_c[i_cond][!, :time]
        tmp = []
        for i in 1:length(t)
            idx = argmin(abs.(t[i] .- t_sim))
            append!(tmp, idx)
        end
        t_sim_to_exp[obs_name][cond_idx] = tmp

        deduplicated_time_idx[obs_name][cond_idx] = []
        idx = 0
        prev = "a"
        for t in df_by_o_c[i_cond][!, :time]
            if t != prev
                idx = idx + 1
            end
            push!(deduplicated_time_idx[obs_name][cond_idx], idx)
            prev = t
        end
    end
end

obs2conds = Dict()
for obs_dict in t_exp
    obs_name = obs_dict[1]
    obs2conds[obs_name] = []
    for cond_dict in obs_dict[2]
        cond_idx = cond_dict[1]
        push!(obs2conds[obs_name], cond_idx)
    end
end

results = Dict()
results["objective_value"] = Dict()
results["parameters"] = Dict()
results["species"] = Dict()
results["observables"] = Dict()

j_to_cond_par = [1, 2]
cond_without_preequ = [1, 2]
cond_with_preequ = []
preequ = []

i_start = 1 # for n_starts > 1, the indented block below is wrapped in a for loop over i_start
    m = Model(with_optimizer(Ipopt.Optimizer))

    # Define global parameters
    println("Defining global parameters...")
    @variable(m, 0 <= b0 <= 10, start=0+(10-0)*rand(Float64))
    @variable(m, 0 <= k1 <= 10, start=0+(10-0)*rand(Float64))
    @variable(m, 0 <= k2 <= 10, start=0+(10-0)*rand(Float64))

    # Define condition-specific parameters
    println("Defining condition-specific parameters...")
    @variable(m, a0[1:2])
    @constraint(m, a0[1] == 0.8)
    @constraint(m, a0[2] == 0.9)

    # Define overrides
    # Define observable overrides
    println("Defining observableParameter overrides...")

    # Define noise overrides
    println("Defining noiseParameter overrides...")

    # Model compartments
    println("Defining compartments...")
    @variable(m, compartment == 1.0, start=1.0)

    # Model species
    println("Defining species...")
    @variable(m, 0.0 <= A[j in 1:2, k in 1:(length(t_sim)+1)] <= 8.8)
    @variable(m, 0.0 <= B[j in 1:2, k in 1:(length(t_sim)+1)] <= 8.8)

    # Model initial assignments
    println("Defining initial assignments...")
    @constraint(m, [j in 1:2], A[cond_without_preequ[j],1] == a0[cond_without_preequ[j]])
    @constraint(m, [j in 1:2], B[cond_without_preequ[j],1] == b0)

    # Model ODEs
    println("Defining ODEs...")
    println("A")
    @NLconstraint(m, [j in 1:2, k in 1:length(t_sim)-1],
        A[j, k+1] == A[j, k] + ( -1.0*( compartment * k1 * A[j, k+1] ) +1.0*( compartment * k2 * B[j, k+1] )     ) * ( t_sim[k+1] - t_sim[k] ) )
    println("B")
    @NLconstraint(m, [j in 1:2, k in 1:length(t_sim)-1],
        B[j, k+1] == B[j, k] + ( +1.0*( compartment * k1 * A[j, k+1] ) -1.0*( compartment * k2 * B[j, k+1] )     ) * ( t_sim[k+1] - t_sim[k] ) )

    # Pre-equilibration constraints
    @constraint(m, [j in 1:2], A[j, length(t_sim)+1] == 0) # Dummy preequilibration
    @constraint(m, [j in 1:2], B[j, length(t_sim)+1] == 0) # Dummy preequilibration

    # Define observables
    println("Defining observables...")
    @variable(m, -0.6000000000000001 <= obs_a[j in obs2conds["obs_a"], k in 1:length(t_exp["obs_a"][j])] <= 1.5, start=1.)
    @NLconstraint(m, [j in obs2conds["obs_a"], k in 1:length(t_exp["obs_a"][j])], obs_a[j, k] == A[j, t_sim_to_exp["obs_a"][j][k]])

    # Define objective
    println("Defining objective...")
    @NLobjective(m, Min, sum(0.5 * log(2*pi*(( 1 ))^2) + 0.5*((obs_a[j, deduplicated_time_idx["obs_a"][j][k]]-m_exp["obs_a"][j][k])/(( 1 )))^2 for j in obs2conds["obs_a"] for k in 1:length(t_exp["obs_a"][j]))
        )

    println("Optimizing:")
    optimize!(m)

    println("Transfering results to Python...")
    parameter_names = [b0, k1, k2]
    parameter_values = Dict()
    for p in parameter_names
        if occursin("[", string(p))
            parameter_values[split(string(p[1]), "[")[1]] = JuMP.value.(p)
        else
            parameter_values[string(p)] = JuMP.value.(p)
        end
    end

    species_names = [A, B, ]
    species_values = Dict()
    for s in species_names
        species_values[split(string(s[1]), "[")[1]] = JuMP.value.(s)
    end

    observable_names = [obs_a, ]

    observable_values_v1 = Dict()
    for o in observable_names
        observable_values_v1[split(string(o[1,1]), "[")[1]] = Dict(key=>val for (key,val) in o.data)
    end

    observable_values_v2 = Dict()
    for o in observable_names
        tmp = []
        vals = JuMP.value.(o)  # renamed from `results` to avoid clobbering the results Dict above
        for val in vals
            push!(tmp, val)
        end
        observable_values_v2[split(string(o[1,1]), "[")[1]] = tmp
    end

    objective_val = objective_value(m)

    results["objective_value"][string(i_start)] = objective_val
    results["parameters"][string(i_start)] = parameter_values
    results["species"][string(i_start)] = species_values
    results["observables"][string(i_start)] = observable_values

results

Unfortunately, the structure of the observable_values_v1/2 dictionaries does not match the structure of m_exp.

julia> m_exp
Dict{Any,Any} with 1 entry:
  "obs_a" => Dict{Any,Any}(2=>[0.8, 0.2],1=>[0.7, 0.1])

julia> observable_values_v1
Dict{Any,Any} with 1 entry:
  "obs_a" => Dict((1, 2)=>obs_a[1,2],(2, 2)=>obs_a[2,2],(1, 1)=>obs_a[1,1],(2, 1)=>obs_a[2,1])

julia> observable_values_v2
Dict{Any,Any} with 1 entry:
  "obs_a" => Any[0.143449, 0.16138, 0.8, 0.9]

sshin23 commented 4 years ago

The code below should create the dict in the format you want.

observable_values = Dict(name=>Dict(t1=>[value(eval(Symbol(name))[t1,t2]) for t2=1:length(arr)]
                                    for (t1,arr) in dict) for (name,dict) in m_exp)
paulflang commented 3 years ago

Thanks. This does the trick for the most part, except when I set n_starts > 1. The indented block after i_start is then enclosed in a for loop. This causes trouble, as eval only accesses the global scope. Can you think of a way to do this without eval? One workaround is to promote the variables to the global scope like so:

global obs_a = @variable(m, -0.6000000000000001 <= obs_a[j in obs2conds["obs_a"], k in 1:length(t_exp["obs_a"][j])] <= 1.5, start=1.)

Please let me know if this should be our solution of choice or if you can think of a better way. Thanks!

sshin23 commented 3 years ago

Another workaround is to create a dictionary at the beginning, say

var_dict=Dict()

and let

var["obs_a"] = @variable(m, -0.6000000000000001 <= obs_a[j in obs2conds["obs_a"], k in 1:length(t_exp["obs_a"][j])] <= 1.5, start=1.)

and then replace eval(Symbol(name)) with var_dict[name].
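
Putting the two pieces together, the construction and extraction could then look roughly like this (untested sketch based on the code above):

var_dict = Dict()

# when defining the observables (inside the multi-start loop):
var_dict["obs_a"] = @variable(m, -0.6000000000000001 <= obs_a[j in obs2conds["obs_a"], k in 1:length(t_exp["obs_a"][j])] <= 1.5, start=1.)

# after optimize!(m): same comprehension as before, but with the variable looked
# up in var_dict instead of eval'ed from the global scope
observable_values = Dict(name=>Dict(t1=>[value(var_dict[name][t1,t2]) for t2=1:length(arr)]
                                    for (t1,arr) in dict) for (name,dict) in m_exp)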

paulflang commented 3 years ago

Perfect. That did the job. I just pushed a new commit that includes your suggestion. Thank you very much!