sebapersson / PEtab.jl

Create parameter estimation problems for ODE models
https://sebapersson.github.io/PEtab.jl/stable/
MIT License

Suggestion: standardize parameter retrieval / supply interface #241

Open langestefan opened 1 month ago

langestefan commented 1 month ago

To make the package even better, I want to suggest standardizing the way parameters are retrieved and supplied to constructors. This standardization should be independent of the scale of the parameters, be it problem (linear) or inference space.

To illustrate why this is useful, I am currently trying to achieve the following.

I have an optimisation result and the original problem:

result::PEtabOptimisationResult
problem::PEtabODEProblem

Now I want to save all the fitted parameters on the linear scale (noise parameters included) and reuse them later, for example to remake the problem. How to achieve this?

I go to the API and see what's available. I can get result.xmin and pass it directly to problem.nllh, and it works, great. But xmin is on the inference scale, and I want to save the parameters on the linear scale. There is no method to do that directly, but I can do: xmin_lin = PEtab.get_ps(result, problem). However, this method is missing the noise parameters, and I want to save those too.
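
For reference, a minimal sketch of the two calls described above, assuming result and problem as defined earlier:

# works: result.xmin is on the inference (transformed) scale that nllh expects
nllh_val = problem.nllh(result.xmin)
# linear-scale parameters, but the noise parameters are missing
xmin_lin = PEtab.get_ps(result, problem)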

I can get the noise parameters from xmin however, and combine them with the linear parameters:

# linear-scale parameters from get_ps, keyed by parameter name
param_out = Dict(PEtab.get_ps(result, problem))
# inference-scale values from the optimisation result
xmin = Dict(pairs(NamedTuple(result.xmin)))
# copy over the noise parameters that get_ps does not return
param_out[:sigma_T_i] = xmin[:sigma_T_i]
param_out[:sigma_T_h] = xmin[:sigma_T_h]
return DataFrame(param_out)

But this is really ugly. If I want to remake the problem, I have to supply:

remake(prob::PEtabODEProblem, xchange::Dict)

But where do I get this dict? And why is it a dict; why can't I pass my ComponentArray? Sure, you can hardcode it, but for larger models that is not doable.
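
For illustration, a hedged sketch of what that currently looks like; the parameter names, values, and scales below are only placeholders:

# hand-written dict of new values - manageable here, but not for larger models
xchange = Dict(:sigma_T_i => 0.05, :sigma_T_h => 0.1)
new_prob = remake(problem, xchange)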

Lastly, I can use the very convenient get_x to get my parameters on the linear scale and in the right order. But to get my inference result in this format, I first need to remake the problem with xmin. However, if I do that, the result from remake is missing all of the parameters:

new_prob = remake(problem, xmin)
FIM = problem.FIM(get_x(new_prob))

So this fails with KeyError: key :log10_C_e not found

Instead of passing results, dicts, ComponentArrays, and parameter vectors around, I think it would improve usability a lot if we had one single definition of a parameter object. Each method would take that exact same object, know at what scale it is (or have the user specify it), and transform it to the inference scale internally when needed. It should not have to rely on the order of the parameters, or on whether a parameter is called log10_C_e instead of C_e.

sebapersson commented 1 month ago

Thanks, this is really good input and something I have been thinking about myself!

As a clarification, remake is a bit of black magic and is mostly for model selection, with the aim of not having to rebuild the problem (it should ideally very seldom have to be used), which I should make clearer in the docs.

As for your thoughts:

  1. I agree the aim should be to have a single interface to as large an extent as possible. To this end, I favor ComponentArray, as it will also make it possible to add support for SciML (e.g. UDE models with both neural networks and mechanistic terms) in a manageable way. Therefore, it would also definitely make sense to have get_u0 etc. return a ComponentArray as the default.
  2. As for the order of parameters: ideally, this should not matter. However, this is easier said than done. Internally, PEtab.jl requires parameters to be in a certain order to do all the indexing correctly, and some tools PEtab is a backend for, like pyPESTO, expect a certain order. Further, supporting arbitrary order for computing the nllh and gradient is easy, since it only requires mapping between the PEtab order and the input order. For the Hessian, however, this is tricky, because then we have to map the PEtab Hessian to the input order (and this mapping for an arbitrary order would incur quite a bit of overhead per Hessian evaluation). But if ComponentArray is the standard parameter struct, then order should not really matter that much, right? In a ComponentArray it is easy to change the value of certain parameters, just x.name = val (see the small illustration after this list). Moreover, the ODE-interacting functions (get_ps etc.) will, with ComponentArray, have to return the parameters in a certain order, so I think there is a point in enforcing that ordering matters.
  3. I get your point on parameter scale (and how it impacts the naming). But I think the current approach should be the least confusing, because now the scale is very transparent (and I hope it is clear on which scale to provide custom values). From the above, I think a lot of your problems could be resolved by having a get_xmin that, similar to get_x, can return parameters on linear and transformed scales, and by also having a function that can transform between the linear and the estimation scale?
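
To make the ComponentArray point in item 2 concrete, a minimal illustration of updating values by name (the parameter names here are only examples):

using ComponentArrays

# values can be read and written by name, so the user rarely has to think
# about the underlying ordering
x = ComponentArray(log10_C_e = -1.0, sigma_T_i = 0.05)
x.sigma_T_i = 0.1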

What do you think?

Also, @TorkelE you might be interested in this if you have time.

langestefan commented 1 month ago

Thanks for your reply, it makes more sense now.

But if ComponentArray is the standard parameter struct, then order should not really matter that much, right? In a ComponentArray it is easy to change the value of certain parameters, just x.name = val. Moreover, the ODE-interacting functions (get_ps etc.) will, with ComponentArray, have to return the parameters in a certain order, so I think there is a point in enforcing that ordering matters.

Yes, completely agree. But functions like nllh and FIM will still fail if your ComponentArray doesn't have the right order. What's even more inconvenient (as a user) is that the scale is part of the variable name. Ideally, I would just like every parameter to be a tuple (name, value, scale). That way I don't have to inspect names to keep track of parameter scales, and it should be very clear to PEtab what values I am providing.
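
Something along these lines, a purely hypothetical sketch and not existing PEtab.jl API; the names and values are made up:

# each parameter carries its own scale, so no method has to infer it from the
# name or rely on ordering
ps = [(name = :C_e, value = 0.2, scale = :log10),
      (name = :sigma_T_i, value = 0.05, scale = :lin)]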

3. I get your point on parameter scale (and how it impacts the naming). But I think the current approach should be the least confusing, because now the scale is very transparent (and I hope it is clear on which scale to provide custom values). From the above, I think a lot of your problems could be resolved by having a `get_xmin` that, similar to `get_x`, can return parameters on linear and transformed scales, and by also having a function that can transform between the linear and the estimation scale?

Yes for sure that would help a lot.

sebapersson commented 1 month ago

But functions like nllh and FIM will still fail if your ComponentArray doesn't have the right order.

I think the best solution here would just be to add an order check in these functions. And if the order is wrong, the error message can refer to an ordering function.
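
For instance, something like this hypothetical check (the function name and the use of propertynames are assumptions, not current internals):

# compare the names of the supplied ComponentArray against the order PEtab.jl
# expects, and point the user to a fix if they differ
function _check_x_order(x, prob::PEtabODEProblem)
    expected = propertynames(get_x(prob))
    propertynames(x) == expected ||
        throw(ArgumentError("Parameters are not in the expected order; see get_x(prob)"))
    return nothing
end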

Ideally, I would just like every parameter to be a tuple(name, value, scale).

I think this is a valid point, but I also see this getting very complicated in the future when, for example, adding neural-network support in PEtab (which will definitely happen within this year). Maybe what is needed is a set function, like set_x(x, :name, val), where you as the user can set values on the linear scale (and the function takes care of converting everything to the correct scale)?
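
A rough, purely illustrative sketch of what a mutating variant of such a function could look like; the name-prefix convention for the scale is an assumption:

# the user supplies a value on the linear scale and the function stores it on
# the estimation scale encoded in the parameter name
function set_x!(x, name::Symbol, val)
    log10_name = Symbol(:log10_, name)
    if log10_name in propertynames(x)
        setproperty!(x, log10_name, log10(val))
    else
        setproperty!(x, name, val)
    end
    return x
end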

langestefan commented 1 month ago

I think the best solution here would just be to add an order check in these functions. And if the order is wrong, the error message can refer to an ordering function.

Would it not be an option to handle that internally? The ComponentArray already has the axis names and the values, so technically it should be possible to infer the right order automatically, without the user having to specify it?

If I compare it to using plain MTK, I specify:

using ModelingToolkit

@parameters A B C D
params = [A, B, C, D]
values = [1, 2, 3, 4]
# map from symbolic parameters to values; the order is irrelevant to the user
parammap = Dict(params .=> values)

And the ODEProblem constructor accepts parammap as an argument; whatever MTK does with the order internally, I don't care about.
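
Roughly like this (a sketch assuming sys is an ODESystem built with these parameters, with initial conditions u0 over tspan):

# MTK resolves the mapping from parameter names to its internal order itself
prob = ODEProblem(sys, u0, tspan, parammap)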

sebapersson commented 1 month ago

Would it not be an option to handle that internally?

Fair point, and PEtabODEProblem can internally build an index for efficient mapping (which is only rebuilt if the input changes), so this is doable. And for parameter estimation the input is, behind the scenes, a Vector anyhow, so this will not affect performance.
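
Conceptually something like the following, a hypothetical sketch of the indexing idea rather than actual internals (x_input and petab_order are placeholder names):

# build a permutation from the user's ComponentArray order to the internal
# PEtab order once, and reuse it as long as the input axis stays the same
input_names = collect(propertynames(x_input))
imap = [findfirst(==(name), input_names) for name in petab_order]
x_internal = x_input[imap]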

Then the only thing left is the get_ps functions. If the goal is for them not to depend on order, I guess it makes more sense for them to return a map, as is currently done.

Lastly, do you think a set_val or similar function would be useful for setting values without worrying about the parameter scale?