pinheiroGroup / Kinbiont.jl

Ecosystem of numerical methods for microbial kinetics data analysis, from preprocessing to result interpretation.
https://kinbiont.fuzue.org
MIT License

non linear fit function #4

Closed ang-one closed 8 months ago

ang-one commented 9 months ago

I will add a non-linear fit function, useful for non-linear fitting with a user-defined function, together with multiple scattering correction.

ang-one commented 9 months ago

I am adding a part using non-linear fitting. I will use this package: https://github.com/JuliaNLSolvers/LsqFit.jl

The question is whether it is better to change the log-lin fit to use it or not.

ang-one commented 9 months ago

I created a working prototype of the non-linear functions. The tested functions are taken from "Statistical evaluation of mathematical models for microbial growth"; not all of them work properly yet. It still needs to be integrated with the multiple scattering correction and the fitting procedure.

```julia
using LsqFit, Plots
# data_OD: first row times, second row OD (assumed already loaded)

# exponential
NL_model_exp(x, p) = p[1] .* exp.(.-x .* p[2])

# logistic
NL_model_logistic(x, p) = p[1] ./ (1 .+ exp.(.-p[2] .* (x .- p[3])))

# Gompertz
NL_model_Gompertz(x, p) = p[1] .* exp.(.-exp.(.-p[2] .* (x .- p[3])))

# von Bertalanffy
NL_model_Bertalanffy(x, p) = p[1] .+ (p[2] .- p[1]) .* (1 .- exp.(.-p[3] .* x)) .^ (1 ./ p[4])

# Richards
NL_model_Richards(x, p) = p[1] ./ (1 .+ p[2] .* exp.(.-p[3] .* (x .- p[4]))) .^ (1 ./ p[5])

# Morgan
NL_model_Morgan(x, p) = (p[1] .* p[2] .^ p[3] .+ p[4] .* x .^ p[3]) ./ (p[2] .^ p[3] .+ x .^ p[3])

# Weibull
NL_model_Weibull(x, p) = p[1] .- (p[1] .- p[2]) .* exp.(.-(p[3] .* x) .^ p[4])

# initial guess
p0 = [1.1, 1.1, 1.1, 1.1, 1.1]

# formatting data
xdata = data_OD[1, :]
ydata = data_OD[2, :]

# fitting
fit = LsqFit.curve_fit(NL_model_Richards, xdata, ydata, p0)

# evaluating sigma
sigma_param = stderror(fit)

loss = sum(abs.(fit.resid))

sol = NL_model_Richards(xdata, fit.param)

# plotting
plot(sol)
plot!(ydata)
```

ang-one commented 9 months ago

For the common models, add the comparison between the ODE and NL fits.

ang-one commented 9 months ago

Is it possible to force the initial condition for the NL fit (i.e., fix a parameter to a value) or not?
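
A generic way to do it with LsqFit is to wrap the model in a closure so that the fixed parameter never reaches the optimizer; a minimal sketch (the `fixed_lag` value and `data_OD` are placeholders for the example, not package functionality):

```julia
using LsqFit

# Sketch: fix the lag parameter of the logistic model to a known value so that the
# optimizer only sees the remaining free parameters.
NL_model_logistic(x, p) = p[1] ./ (1 .+ exp.(.-p[2] .* (x .- p[3])))

fixed_lag = 5.0
wrapped_model(x, p) = NL_model_logistic(x, [p[1], p[2], fixed_lag])

p0_free = [1.0, 0.1]
fit = LsqFit.curve_fit(wrapped_model, data_OD[1, :], data_OD[2, :], p0_free)
```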

ang-one commented 9 months ago

Multiple scattering correction with an exponential fit is working and has been added (https://www.sciencedirect.com/science/article/pii/S0141022918302047).

Now I will work on the function to do this fit.
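
For reference, a rough sketch of the general idea (not the implemented code; `od_real` and `od_measured` are hypothetical calibration data): fit the calibration curve with a saturating exponential and invert it to correct the raw OD.

```julia
using LsqFit

# measured OD as a function of real OD (assumed functional form)
calib_model(x, p) = p[1] .* (1 .- exp.(.-p[2] .* x))

cal_fit = LsqFit.curve_fit(calib_model, od_real, od_measured, [1.0, 1.0])
a, b = cal_fit.param

# inverse of the fitted curve, used to correct the raw measurements
correct_od(od) = -log(1 - min(od / a, 0.999)) / b
data_OD[2, :] .= correct_od.(data_OD[2, :])
```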

ang-one commented 9 months ago

I am testing the prototype that uses the same common SciML interface for the ODE and NL fits:

```julia
using Optimization, OptimizationBBO, Plots

function NL_model_exp(p, times)
    model = p[1] .* exp.(times .* p[2])
    return model
end

function NL_model_exp_L2(u, data)
    # define the model
    model = u[1] .* exp.(data[1, :] .* u[2])
    # number of data points
    n_data = length(data[1, :])
    # calculate the residuals
    residuals = (model .- data[2, :]) ./ n_data
    # return the sum of squares of the residuals
    return sum(residuals .^ 2)
end

# synthetic data from the exponential model
times = 1:2:440
data_v = NL_model_exp([0.1, 0.06], times)
data_OD = Matrix(transpose(hcat(times, data_v)))
u0 = [0.5, 0.5]
scatter(data_OD[1, :], data_OD[2, :])

# fit with the SciML Optimization interface
prob = OptimizationProblem(NL_model_exp_L2, u0, data_OD, lb = [0.0, 0.0], ub = [1.0, 1.0])
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited(), PopulationSize = 500, maxiters = 1000000000000)
sol.objective
a = NL_model_exp(sol.u, data_OD[1, :])
plot!(data_OD[1, :], a)
```

ang-one commented 9 months ago

I am testing the NL fit.

I have completed our function and, comparing with the one of the package, I get the same performance, but the problem depends strongly on the initialization; here is an example with different initializations. Currently I am going to test the sensitivity with cross-validation.

Screenshot 2024-02-13 at 14 04 45
ang-one commented 9 months ago

Note that for the moment the initial conditions (CI) are not fixed, so the problem always has one more parameter than the ODE. I am also going to implement the fixed-CI version.
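
A minimal sketch of what the fixed-CI version could look like for the exponential model (an illustration under assumptions, not the implemented code):

```julia
# Anchor the model to the first observation so that N(t_start) is fixed and only the
# growth rate remains a free parameter. `data_OD` is assumed loaded as elsewhere.
function NL_model_exp_fixed_CI(p, times)
    N0 = data_OD[2, 1]                        # fixed initial condition taken from the data
    return N0 .* exp.((times .- times[1]) .* p[1])
end
```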

ang-one commented 9 months ago

New non-linear fit function integrated with our package; now possible:

- single fit

Need to work on:

ang-one commented 9 months ago

For the moment the tests seem to show that the ODE approach is a lot more stable than the NL one with respect to the choice of initialisation. I do not fully understand the reasons behind this; I think it is because the NL fit must also infer the initial conditions, so the comparison is not totally fair, but I will investigate it.

ang-one commented 9 months ago

I think that to evaluate the std from the Monte Carlo (bootstrap) we should use only the solutions that respect some criterion of distance from the best loss.
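
A minimal sketch of the criterion (the names `bootstrap_params` and `bootstrap_losses` are placeholders for the results of the nrep Monte Carlo fits):

```julia
using Statistics

# Keep only the bootstrap solutions whose loss is close enough to the best one before
# computing the std of the parameters.
best_loss = minimum(bootstrap_losses)
keep = bootstrap_losses .<= 1.1 * best_loss          # e.g. within 10% of the best loss
selected = bootstrap_params[keep]                    # vector of parameter vectors
sigma_param = [std(getindex.(selected, i)) for i in eachindex(selected[1])]
```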

ang-one commented 8 months ago

example of usage

```julia
function NL_model_Richards(p, times)
    u = p[1] ./ (1 .+ p[2] .* exp.(.-p[3] .* (times .- p[4]))) .^ (1 ./ p[5])
    return u
end

@time B = fit_NL_model(
    data_OD_smooth2, # dataset, first row times, second row OD
    "test", # name of the well
    "", # label of the experiment
    NL_model_Richards, # model to use
    nl_lb, # lower bound param
    nl_ub; # upper bound param
    u0=[0.01, 0.01, 0.0, 0.1, 0.1], # initial guess param
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, # display plots in Julia or not
    save_plot=false,
    path_to_plot="NA", # where to save plots
    pt_avg=1, # number of points used to generate the initial condition
    pt_smooth_derivative=7,
    smoothing=false, # apply smoothing or not
    type_of_smoothing="rolling_avg",
    type_of_loss="RE", # type of loss used
    blank_array=zeros(100), # data of all blanks
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    PopulationSize=50,
    maxiters=90000000,
    abstol=0.00000001,
    thr_lowess=0.05,
)

@time C = fit_NL_mode_with_sensitivity(
    data_OD_smooth2, "", "", # dataset, well name, experiment label
    NL_model_Richards, nl_lb, nl_ub; # model, lower bound, upper bound
    nrep=10,
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, save_plot=false, path_to_plot="NA",
    pt_avg=1, pt_smooth_derivative=7,
    smoothing=false, type_of_smoothing="rolling_avg", thr_lowess=0.05,
    type_of_loss="RE",
    blank_array=zeros(100),
    multiple_scattering_correction=false,
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA",
    PopulationSize=50, maxiters=90000000, abstol=0.00000001,
)

@time D = fit_NL_model_bootstrap(
    data_OD_smooth2, "", "", # dataset, well name, experiment label
    NL_model_Richards, nl_lb, nl_ub; # model, lower bound, upper bound
    nrep=50,
    u0=[0.1, 0.1, 0.1, 0.1, 0.1], # initial guess param
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, save_plot=false, path_to_plot="NA",
    pt_avg=1, pt_smooth_derivative=7,
    smoothing=false, type_of_smoothing="rolling_avg", thr_lowess=0.05,
    type_of_loss="RE",
    blank_array=zeros(100),
    multiple_scattering_correction=false,
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA",
    PopulationSize=50, maxiters=20000000, abstol=0.000000001,
)
```

ang-one commented 8 months ago

Added the loss selector; 4 losses are implemented:

- `"L2"`
- `"RE"`
- `"L2_log"`
- `"RE_log"`
ang-one commented 8 months ago

Now I will work on the models.

ang-one commented 8 months ago

Defined a model selector for the NL cases; note that in this case the user can pass either a function or a string that selects one of the hardcoded models.

7 options for models:

```julia
NL_Model("NL_exponential", NL_model_exp, ["model", "well", "N0", "growth_rate", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_logistic", NL_model_logistic, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Gompertz", NL_model_Gompertz, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Bertalanffy", NL_model_Bertalanffy, ["model", "well", "N_0", "N_max", "growth_rate", "shape", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Richards", NL_model_Richards, ["model", "well", "N_max", "shape", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Morgan", NL_model_Morgan, ["model", "well", "N_0", "K", "shape", "N_max", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Weibull", NL_model_Weibull, ["model", "well", "N_max", "N_0", "growth_rate", "shape", "th_max_gr", "emp_max_gr", "loss"]),
```
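
A minimal sketch of the dispatch idea (the struct and field names below are assumptions, not the exact internals):

```julia
# a string is looked up in the registry of hardcoded models, a function is used directly
struct NL_Model
    name::String
    func::Function
    params::Vector{String}
end

hardcoded_NL_models = Dict(m.name => m for m in [
    NL_Model("NL_exponential", NL_model_exp, ["model", "well", "N0", "growth_rate", "th_max_gr", "emp_max_gr", "loss"]),
    NL_Model("NL_logistic", NL_model_logistic, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
])

resolve_NL_model(model::Function) = model
resolve_NL_model(model::AbstractString) = hardcoded_NL_models[model].func
```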

ang-one commented 8 months ago

Note that the optimization now supports piecewise models. If there is any request, I will add more.

ang-one commented 8 months ago

Now working on the "fit one file" version of the function; some considerations:

ang-one commented 8 months ago

Working on the NL fit for segmentation. Problem: for segments that are not the first one, I need to maintain the continuity of the fitted solution. Not all models have a direct relationship for N(t = t_start), so I tried forcing convergence on the first two points with a specific loss:

```julia
function NL_L2_fixed_CI(data, model_function, u, p)
    model = model_function(u, data[1, :])
    penality_ci = 10^9
    n_data = length(data[1, :])
    # residuals of the first two points, weighted by the continuity penalty
    residuals_ci = penality_ci .* (model[1:2] .- data[2, 1:2]) ./ 2
    residuals = (model[3:end] .- data[2, 3:end]) ./ (n_data - 2)
    residuals_tot = sum(residuals_ci .^ 2) + sum(residuals .^ 2)
    return residuals_tot
end
```

The problem with such an approach is that we need to find a good value of `penality_ci`;

otherwise the effect will be:

Screenshot 2024-02-15 at 10 09 45
ang-one commented 8 months ago

I add here some examples of the fits of the implemented models using the L2 loss.

Screenshot 2024-02-15 at 10 40 50 Screenshot 2024-02-15 at 10 40 28 Screenshot 2024-02-15 at 10 40 09 Screenshot 2024-02-15 at 10 39 49 Screenshot 2024-02-15 at 10 39 37 Screenshot 2024-02-15 at 10 39 21
ang-one commented 8 months ago

I add here some examples of the fits of the implemented models using the L2 loss with a penalty of 10^3 on the first points of the fit.

Screenshot 2024-02-15 at 10 44 36 Screenshot 2024-02-15 at 10 43 35 Screenshot 2024-02-15 at 10 46 51 Screenshot 2024-02-15 at 10 46 32
ang-one commented 8 months ago

By tuning the penalty parameter in `loss_total = penality * loss_ci + loss_rest_of_fit`, one can somehow obtain the desired results.

Screenshot 2024-02-15 at 11 35 25 Screenshot 2024-02-15 at 11 35 02 Screenshot 2024-02-15 at 11 34 34 Screenshot 2024-02-15 at 11 34 04
ang-one commented 8 months ago

I added

ang-one commented 8 months ago

example of usage:

```julia
M = fit_NL_model_MCMC_intialization(
    data_OD_smooth2, # dataset, first row times, second row OD
    "test", # name of the well
    "", # label of the experiment
    "NL_logistic", # model to use
    nl_lb, # lower bound param
    nl_ub; # upper bound param
    u0=new_param, # initial guess param
    nrep=300,
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, # display plots in Julia or not
    save_plot=false,
    path_to_plot="NA", # where to save plots
    pt_avg=1, # number of points used to generate the initial condition
    pt_smooth_derivative=7,
    smoothing=false, # apply smoothing or not
    type_of_smoothing="rolling_avg",
    type_of_loss="L2", # type of loss used
    blank_array=zeros(100), # data of all blanks
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    PopulationSize=50,
    maxiters=9000000,
    abstol=0.000000001,
    thr_lowess=0.05,
    penality_CI=8,
)
```

ang-one commented 8 months ago

First tests of the segmentation for the non-linear fit:

```julia
nl_ub_1 = [2.0001, 10.00000001, 500.00]
nl_lb_1 = [0.0001, 0.00000001, 0.00]

nl_ub_2 = [2.0001, 10.00000001, 5.00, 5.0]
nl_lb_2 = [0.0001, 0.00000001, 0.00, 0.0]

nl_ub_3 = [2.0001, 10.00000001]
nl_lb_3 = [0.0001, 0.00000001]

nl_ub_4 = [2.0001, 10.00000001, 500.00]
nl_lb_4 = [0.0001, 0.00000001, 0.00]

list_models_f = ["NL_Gompertz", "NL_Bertalanffy", "NL_exponential", "NL_Gompertz"]
list_lb = [nl_lb_1, nl_lb_2, nl_lb_3, nl_lb_4]
list_ub = [nl_ub_1, nl_ub_2, nl_ub_3, nl_ub_4]

R = selection_NL_maxiumum_change_points(
    data_OD_smooth2, # dataset, first row times, second row OD
    "", # name of the well
    "test", # label of the experiment
    list_models_f, # models to use
    list_lb, # lower bound param
    list_ub, # upper bound param
    3;
    type_of_loss="L2_fixed_CI", # type of loss used
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method
    method_of_fitting="MCMC",
    type_of_detection="sliding_win",
    type_of_curve="original",
    smoothing=false,
    type_of_smoothing="lowess",
    thr_lowess=0.05,
    pt_avg=1,
    nrep=50,
    save_plot=false, # do plots or not
    display_plots=true,
    path_to_plot="NA", # where to save plots
    win_size=14, # number of points of the sliding window
    pt_smooth_derivative=0,
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    beta_smoothing_ms=2.0, # parameter of the AIC penalty
    method_peaks_detection="peaks_prominence",
    n_bins=40,
    PopulationSize=300,
    maxiters=2000000,
    abstol=0.00000001,
    penality_CI=7.0,
)
```

ang-one commented 8 months ago

The fitting for the NL segmentation now goes backward and tests all possible permutations of change points. However, the performance is not so good.

Screenshot 2024-02-15 at 16 55 39
ang-one commented 8 months ago

The performance issue seems correlated with the new AICc calculation.
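
For reference, the standard small-sample corrected AIC for a least-squares fit, which is what I expect the calculation to reduce to (n data points, k free parameters; the package code may differ):

```julia
# AICc for a least-squares fit with residual sum of squares RSS
function AICc(n, k, RSS)
    aic = n * log(RSS / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
end
```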

ang-one commented 8 months ago

After the AICc fixes it seems to work!

Screenshot 2024-02-15 at 18 21 21
ang-one commented 8 months ago

For non-linear fitting we still miss:

ang-one commented 8 months ago

In this case the user enters the maximum possible number of change points and we fit with all possible combinations of change points between 1 and n_max (e.g., if cpd = [1,2,3] I fit with [1], [2], [3], [1,2], [1,3], [1,2,3] and choose the best model by AICc). The problem is that we need to add the case of 0 change points.
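
A minimal sketch of the enumeration, using Combinatorics.jl and adding the 0-change-points case explicitly (an illustration, not the package code):

```julia
using Combinatorics

# enumerate the candidate change-point configurations; each configuration is then fitted
# and the best one is chosen by AICc
candidate_cps = [1, 2, 3]
configurations = [Int[]]                              # the 0-change-points case
for k in 1:length(candidate_cps), c in combinations(candidate_cps, k)
    push!(configurations, collect(c))
end
```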

ang-one commented 8 months ago

Note that now, to use a fixed number of change points, one can use a control flag and fit with a fixed number of change points as before.

ang-one commented 8 months ago

I can implement the same also for the ODE fit, but it will be a lot slower.

ang-one commented 8 months ago

Need a check of the tested configurations of segments: are there repetitions?

ang-one commented 8 months ago

> Need a check of the tested configurations of segments: are there repetitions?

Done, there are none.
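
For the record, a one-line version of the check (sketch; `configurations` is the list of tested change-point sets, e.g. built as in the enumeration sketch above):

```julia
# no repetitions if converting to a Set does not lose any element
@assert length(Set(configurations)) == length(configurations)
```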

ang-one commented 8 months ago

In theory I have implemented the segmentation version for a whole file, so the only thing missing is the error calculation for the NL fit with Monte Carlo (I need to decide which method is better).

In practice, the performance of the NL fit is not satisfactory. I have obtained good results when fitting a single function using an MCMC search for the parameter initialisation; for segmentation I get bad results. For the moment I am exploring three possibilities

ang-one commented 8 months ago

I am testing the segmentation on synthetic data. I generate a curve with a logistic and a Gompertz, add some noise, and start the optimization from the ground-truth parameters.

The parameters of the second segment are fully retrieved, but the ones of the first are NOT. Screenshot 2024-02-19 at 09 34 26

Screenshot 2024-02-19 at 09 32 00

ang-one commented 8 months ago

The strange thing is that for the NL fit the segmentation goes backward, not forward, in time.

So large errors in the first part mean they get propagated. This point should be investigated.

ang-one commented 8 months ago

The code of the previous example:

```julia
# first segment ODE
model = "logistic"
n_start = [0.1]
tstart = 0.0
tmax = 100.0
delta_t = 2.0
param_of_ode = [0.1, 0.2]

sim_1 = ODE_sim(
    model, # string of the model
    n_start, # starting condition
    tstart, # start time of the sim
    tmax, # final time of the sim
    delta_t, # delta t for Poisson approx
    param_of_ode, # parameters of the ODE model
)

sol_1 = reduce(vcat, sim_1)

# second segment ODE
model = "gompertz"
n_start = [sol_1[end]]
tstart = 102.0
tmax = 200.0
delta_t = 2.0
param_of_ode = [0.2, 0.5]

sim_2 = ODE_sim(
    model, # string of the model
    n_start, # starting condition
    tstart, # start time of the sim
    tmax, # final time of the sim
    delta_t, # delta t for Poisson approx
    param_of_ode, # parameters of the ODE model
)

sol_2 = reduce(vcat, sim_2)

# binding the simulations
times_sim = vcat(sim_1.t, sim_2.t)
sol_sim = vcat(sol_1, sol_2)

Plots.scatter(sol_sim, xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, size=(300, 300))

data_OD = Matrix(transpose(hcat(times_sim, sol_sim)))

# adding uniform noise
noise_uniform = rand(Uniform(-0.001, 0.001), length(sol_sim))
data_OD[2, :] = data_OD[2, :] .+ noise_uniform
Plots.scatter(data_OD[1, :], data_OD[2, :], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, markersize=2, size=(300, 300))

# testing the change-point function
L = selection_NL_fixed_interval(
    data_OD, # dataset, first row times, second row OD
    "", # name of the well
    "", # label of the experiment
    ["NL_logistic", "NL_Gompertz"], # models to use
    [[0.0, 0.0, -1.0], [0.0, 0.0, 0.0]], # lower bound param
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], # upper bound param
    [100]; # fixed change point(s)
    list_u0=[[0.2, 0.1, 0.0], [0.5, 0.2, 0.0]], # initial guess param
    type_of_loss="L2", # type of loss used
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method
    method_of_fitting="MCMC",
    smoothing=false,
    nrep=200,
    type_of_smoothing="lowess",
    thr_lowess=0.05,
    pt_avg=1,
    pt_smooth_derivative=0,
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    beta_smoothing_ms=0.0, # parameter of the AIC penalty
    PopulationSize=500,
    maxiters=20000000,
    abstol=0.000000001,
    penality_CI=80000000.0,
)

Plots.scatter(data_OD[1, :], data_OD[2, :], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, markersize=2, size=(300, 300))
plot!(data_OD[1, :], L[2])
```
ang-one commented 8 months ago

I can replicate the problem also with other models, so it is not intrinsic to the Gompertz model. Screenshot 2024-02-19 at 10 18 05

ang-one commented 8 months ago

The problem seems to be in the first fitting.

Screenshot 2024-02-19 at 10 21 48

ang-one commented 8 months ago

I have also tested with more than two segments. I confirm the problem seems to reside in the first interval.

Screenshot 2024-02-19 at 15 41 29
ang-one commented 8 months ago

I have printed the solution during the run of the function, and it seems that what is plotted for the first segment is not what is inferred; I am investigating this.

ang-one commented 8 months ago

I think it is solved! Two problems were found:

1) the plotting
2) the upper bound I was using made the fitting impossible

Screenshot 2024-02-19 at 15 56 48
ang-one commented 8 months ago

So, given the fixes above:

It works! Now I proceed by studying how the initialization alters the results; I will test the sensitivity against the initialization.

ang-one commented 8 months ago

I have studied the NL sensitivity of a single logistic fit; the procedure, done automatically with the jmaki functions, follows the Morris method (https://en.wikipedia.org/wiki/Morris_method).
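
For reference, a rough sketch of what such a one-at-a-time (Morris-style) scan of the initializations looks like (`fit_from`, `lb`, `ub` are placeholders; this is not the jmaki implementation):

```julia
# starting from random base points in the bounded parameter box, perturb one parameter at
# a time and refit, recording how the results change
function morris_like_scan(fit_from, lb, ub; n_trajectories = 10, delta = 0.1)
    results = []
    for _ in 1:n_trajectories
        u0 = lb .+ rand(length(lb)) .* (ub .- lb)          # random base point in the box
        push!(results, (u0, fit_from(u0)))
        for i in eachindex(u0)                             # one-at-a-time perturbation
            u_pert = copy(u0)
            u_pert[i] = clamp(u_pert[i] + delta * (ub[i] - lb[i]), lb[i], ub[i])
            push!(results, (u_pert, fit_from(u_pert)))
        end
    end
    return results
end
```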

Screenshot 2024-02-19 at 16 16 36 Screenshot 2024-02-19 at 16 16 29 Screenshot 2024-02-19 at 16 16 21
ang-one commented 8 months ago

The results show that most of the configurations return the exact values of the parameters, but for some of them you sometimes get a very large error.

This behavior is technically not a problem, because the hyper-parameter optimization of the starting point should cure it. I will add an example of the same test but using an MCMC to choose the initialization.

ang-one commented 8 months ago

I have tested the sensitivity of the NL fit using only 10 steps of MCMC optimization of the parameter initialization. On synthetic data this seems to solve most of the problems due to the initialization.
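
A minimal sketch of the kind of initialization search I mean (not the package code; `loss_at` is a placeholder that evaluates the chosen loss for a candidate guess):

```julia
# short random-walk (MCMC-like) search over the initial guess, keeping the best loss,
# before starting the optimizer from `best_u`
function mcmc_initialization(loss_at, u0, lb, ub; nsteps = 10, step = 0.1)
    best_u, best_loss = copy(u0), loss_at(u0)
    u, l = copy(best_u), best_loss
    for _ in 1:nsteps
        candidate = clamp.(u .+ step .* (ub .- lb) .* randn(length(u)), lb, ub)
        l_cand = loss_at(candidate)
        if l_cand < l || rand() < exp(l - l_cand)          # accept downhill, sometimes uphill
            u, l = candidate, l_cand
            if l < best_loss
                best_u, best_loss = copy(u), l
            end
        end
    end
    return best_u
end
```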

Screenshot 2024-02-19 at 16 47 50 Screenshot 2024-02-19 at 16 47 43 Screenshot 2024-02-19 at 16 47 35
ang-one commented 8 months ago

Summarising

I am going to test

ang-one commented 8 months ago

Some examples; the behavior is improved by tuning the procedure a little. Some hyperparameters should still be optimized a bit (e.g., how to permute the change points, the AICc, the weight of the boundary in the loss).

Screenshot 2024-02-19 at 17 08 40 Screenshot 2024-02-19 at 17 07 26 Screenshot 2024-02-19 at 17 07 12
ang-one commented 8 months ago

There is some strange behavior of the change-point selection; I will open a new issue on that. The tests here will be done with a fixed number of change points.

ang-one commented 8 months ago

Comparison with a fixed number of change points and different numbers of MCMC steps for each segment initialization: nrep = 20 and nrep = 50.

Screenshot 2024-02-19 at 17 36 40 Screenshot 2024-02-19 at 17 33 51