Closed ang-one closed 8 months ago
I am adding a part using Non linear fit. I will use this package https://github.com/JuliaNLSolvers/LsqFit.jl
The question is if it is better to change the log lin fit to use it or no
I created a working prototype of non linear function The tested function are taken from from "Statistical evaluation of mathematical models for microbial growth" not all of them works properly. Needs to be integrated to multiple scattering correction and fitting procedure `
NL_model_exp(x,p) = p[1] . exp.( .- x . p[2])
NL_model_logistic(x,p) = p[1] ./ (1 .+ exp.( .- p[2] .* ( x .- p[3] )))
NL_model_Gompertz(x,p) = p[1] . exp.( .- exp.( .- p[2] . ( x .- p[3] )))
NL_model_Bertalanffy(x,p) = p[1] .+ (p[2] .- p[1]) . (1 .- exp.( .- p[3] . x ) ).^(1. /p[4])
NL_model_Richards(x,p) = p[1] ./ (1 .+ p[2] . exp.( .- p[3] . ( x .- p[4] ))).^(1. ./ p[5])
NL_model_Morgan(x,p) = (p[1] . p[2].^ p[3] .+ p[4] . x .^ p[3]) ./( p[2] .^p[3] .+ x .^ p[3])
NL_model_Weibull(x,p) = p[1] .- (p[1] .-p[2]) . exp.( .- (p[3].x).^p[4])
p0 = [1.1, 1.1,1.1,1.1,1.1]
xdata =data_OD[1,:] ydata = data_OD[2,:]
fit = LsqFit.curve_fit(NL_model_Richards, data_OD[1,:], data_OD[2,:], p0)
sigma_param = stderror(fit)
loss = sum(abs(fit.resid))
sol = NL_model_Richards( data_OD[1,:] , fit.param )
plot(sol) plot!(data_OD[2,:])
`
For common model add the comparison between ODE and NL
It is possible to force the initial conditon for nl fit (ie. block a param to a value) or not?
Multiple scattering correction with exp fit is working and added https://www.sciencedirect.com/science/article/pii/S0141022918302047
Now i will work on the function to do this fit
I am testing the prototype for using the same common interface of sciML for ODE and NL fit
` function NL_model_exp(p,times)
model = p[1] .* exp.( times .* p[2])
return model
end
function NL_model_exp_L2(u, data)
# Define an model
model = u[1] .* exp.( data[1,:] .* u[2])
# number of data
n_data = length(data[1,:] )
# Calculate the residuals
residuals = (model .- data[2,:] )./n_data
# Return the sum of squares of the residuals
return sum(residuals.^2)
end
times =1:2:440 data_v = NL_model_exp([0.1,0.06],times) data_OD= Matrix(transpose(hcat(times,data_v))) u0 =[0.5 , 0.5] scatter(data_OD[1,:],data_OD[2,:])
prob = OptimizationProblem(NL_model_exp_L2,u0,data_OD, lb = [00.0, 0.0], ub = [1.0, 1.0]) sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited(),PopulationSize =500,maxit = 1000000000000) sol.objective a = NL_model_exp(sol,data_OD[1,:]) plot!(data_OD[1,:],a) `
I am testing NL fit
I have completed our function and comparing with the one of the package I have the same performance, but the problems depends strongly on the inizialization, here an example with different inizialization. currently i am gonna test sensistivity with cross validation
Note that for the moment the CI are not fixed so the problem has always 1 param more than ODE I am also going to implement the fixed CI version
New non linear fit function integrated with our package, now possible
-single fit
Need to work onto
For the moment the test seems to show that ODE approach is a lot more stable respect the NL due to the choice of initialisation . I do not full understand the reasons behind this fact I think that is due to NL must infer also initial conditions so the comparison is not total fair, but i will indagate on it
I think that to evaluate the std from montecarlo (bootstrap) we should use only certain solution that respect some criterion of distance from the top loss.
example of usage
function NL_model_Richards(p,times)
u = p[1] ./ (1 .+ p[2] .* exp.( .- p[3] .* ( times .- p[4] ))).^(1 ./ p[5])
return u
end
@time B= fit_NL_model(data_OD_smooth2, # dataset first row times second row OD "test", # name of the well "", #label of the experiment NL_model_Richards, # ode model to use nl_lb, # lower bound param nl_ub; # upper bound param u0=[0.01, 0.01,0.0,0.1,0.1],# initial guess param optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), display_plots=true, # display plots in julia or not save_plot=false, path_to_plot="NA", # where save plots pt_avg=1, # numebr of the point to generate intial condition pt_smooth_derivative=7, smoothing=false, # the smoothing is done or not? type_of_smoothing="rolling_avg", type_of_loss="RE", # type of used loss blank_array=zeros(100), # data of all blanks multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data PopulationSize=50, maxiters=90000000, abstol=0.00000001, thr_lowess=0.05, )
@time C = fit_NL_mode_with_sensitivity(data_OD_smooth2, # dataset first row times second row OD "", # name of the well "", #label of the experiment NL_model_Richards, # ode model to use nl_lb, # lower bound param nl_ub; # upper bound param nrep=10, optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), display_plots=true, # display plots in julia or not save_plot=false, path_to_plot="NA", # where save plots pt_avg=1, # numebr of the point to generate intial condition pt_smooth_derivative=7, smoothing=false, # the smoothing is done or not? type_of_smoothing="rolling_avg", type_of_loss="RE", # type of used loss blank_array=zeros(100), # data of all blanks multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data PopulationSize=50, maxiters=90000000, abstol=0.00000001, thr_lowess=0.05, )
@time D = fit_NL_model_bootstrap(data_OD_smooth2, # dataset first row times second row OD "", # name of the well "", #label of the experiment NL_model_Richards, # ode model to use nl_lb, # lower bound param nl_ub; # upper bound param nrep=50, u0=[00.1, 0.1,0.1,0.1,0.1],# initial guess param optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), display_plots=true, # display plots in julia or not save_plot=false, path_to_plot="NA", # where save plots pt_avg=1, # numebr of the point to generate intial condition pt_smooth_derivative=7, smoothing=false, # the smoothing is done or not? type_of_smoothing="rolling_avg", type_of_loss="RE", # type of used loss blank_array=zeros(100), # data of all blanks multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data PopulationSize=50, maxiters=20000000, abstol=0.000000001, thr_lowess=0.05,
Added the loss selector 4 losses are implemented
"L2"
"RE"
"L2_log"
"RE_log"
Now i will work on the models
defined a model selector fo NL cases, note that in this case the user as model can input a function or a string to call hardcoded models
7 options for models
NL_Model( "NL_exponential", NL_model_exp, ["model", "well", "N0", "growth_rate", "th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_logistic", NL_model_logistic, ["model", "well", "N_max", "growth_rate", "lag","th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_Gompertz", NL_model_Gompertz, ["model", "well","N_max", "growth_rate", "lag","th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_Bertalanffy", NL_model_Bertalanffy, ["model", "well","N_0","N_max", "growth_rate", "shape","th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_Richards", NL_model_Richards, ["model", "well","N_max","shape", "growth_rate", "lag","th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_Morgan", NL_model_Morgan, ["model", "well","N_0","K", "shape", "N_max","th_max_gr","emp_max_gr","loss"] ), NL_Model( "NL_Weibull", NL_model_Weibull, ["model", "well","N_max","N_0", "growth_rate", "shape","th_max_gr","emp_max_gr","loss"] ),
Note that the optimization now support piecewise models. If there is any request i will add
Now working on the fit one file version of the function some considerations
Working onto the NL fit for segmentation. Problem: For segment that are not the first 1 I need to maintain the continuity of the fitted solution. Not all models have a relationship of N(t=t_start) The i tried by forcing convergence two first points with a specific loss
function NL_L2_fixed_CI(data,model_function,u, p)
model = model_function(u, data[1, :])
penality_ci = 10^9
n_data = length(data[1, :])
residuals_ci = penality_ci .* (model[1:2] .- data[2, 1:2]) ./ (2)
residuals = (model[3:end] .- data[2, 3:end]) ./ (n_data -2)
residuals_tot = (sum(residuals_ci .^ 2)) + (sum(residuals .^ 2))
return residuals_tot
end
The problems of such approach is that we need to understand a good penality_ci
otherwise the effect will be
I add here some example of the fits of implemented models using L2 loss
I add here some example of the fits of implemented models using L2 loss with a penality of 10^3 to the first points of the fit
Tuning a penality parameter loss total = penality * loss_ci + loss_rest_of_fit somehow one can obtain the desired results
I added
example of usage:
M= fit_NL_model_MCMC_intialization(data_OD_smooth2, # dataset first row times second row OD "test", # name of the well "", #label of the experiment "NL_logistic", # ode model to use nl_lb, # lower bound param nl_ub; # upper bound param u0=new_param,# initial guess param nrep =300, optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), display_plots=true, # display plots in julia or not save_plot=false, path_to_plot="NA", # where save plots pt_avg=1, # numebr of the point to generate intial condition pt_smooth_derivative=7, smoothing=false, # the smoothing is done or not? type_of_smoothing="rolling_avg", type_of_loss="L2", # type of used loss blank_array=zeros(100), # data of all blanks multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data PopulationSize=50, maxiters=9000000, abstol=0.000000001, thr_lowess=0.05, penality_CI = 8 )
First tests of a segmentation for non linear fit
nl_ub_1 = [2.0001 , 10.00000001, 500.00] nl_lb_1 = [0.0001 , 0.00000001, 0.00 ]
nl_ub_2 = [2.0001 , 10.00000001, 5.00,5.0] nl_lb_2 = [0.0001 , 0.00000001, 0.00,0.0 ]
nl_ub_3 = [2.0001 , 10.00000001] nl_lb_3 = [0.0001 , 0.00000001]
nl_ub_4 = [2.0001 , 10.00000001, 500.00] nl_lb_4 = [0.0001 , 0.00000001, 0.00 ]
list_models_f = ["NL_Gompertz","NL_Bertalanffy","NL_exponential","NL_Gompertz"] list_lb =[nl_lb_1,nl_lb_2,nl_lb_3,nl_lb_4] list_ub = [nl_ub_1,nl_ub_2,nl_ub_3,nl_ub_4]
R= selection_NL_maxiumum_change_points( data_OD_smooth2, # dataset first row times second row OD "", # name of the well "test", #label of the experiment list_models_f, # ode model to use list_lb, # lower bound param list_ub, # upper bound param 3; type_of_loss="L2_fixed_CI", # type of used loss optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method method_of_fitting="MCMC", # selection of sciml integrator type_of_detection="sliding_win", type_of_curve="original", smoothing=false, type_of_smoothing="lowess", thr_lowess=0.05, pt_avg=1, nrep=50, save_plot=false, # do plots or no display_plots=true, path_to_plot="NA", # where save plots win_size=14, # numebr of the point to generate intial condition pt_smooth_derivative=0, multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data beta_smoothing_ms=2.0, # parameter of the AIC penality method_peaks_detection="peaks_prominence", n_bins=40, PopulationSize=300, maxiters=2000000, abstol=0.00000001, penality_CI=7.0)
the fitting for NL is now backward and test all possible permutations of changepoints However performances are not so good
Performance issue seems correlated to the new AICc calculation
After AICc fixes seems to work!!!!
For non linear fitting we miss
In this case the user enters the maximum number possible change points and we fit with all possible permutation of CPD between 1-nmax (e.g, if cpd =[1,2,3] i fit with [1],[2],[3],[1,2],[1,3],[1,2,3] and choose the best model for AICc) the problem is that we need to add the case 0 change points
Note that now to choose a fixed number of cpd one can use a control and fit a fixed number of cpd as before
I can implement the same also for ODE but will be a loooot slower
Need a control of the tested configuration of segments , there are repetitions?
Need a control of the tested configuration of segments , there are repetitions?
done, no
In theory i have implemented the segmentation version for a file so the only stuff missing is the error calculation for NL fit with montecarlo (decide which method is better)
In practice, the perfomances of NL are not satisfactory. I have obtained good results with fit of a single function using MCMC search of parameters initialisation. For segmentation I have bad results. For the moment i am exploring three possibilities
I am testing segmentation on synthetic data. I generate a curve with a logistic and a gompertz. I add some noise and I started the optimization from the ground truth parameters
The parameters of ther first segment are fully retrived, But the one of the first NO
The stranger stuff is that for NL fit the segmentation goes backward and not forward in time
So great errors in the first part means propagating it. But this point should be indagated
The code of previous example
model = "logistic" n_start =[0.1] tstart =0.0
tmax = 0100.0 delta_t=2.0 param_of_ode= [0.1,0.2 ] sim_1 = ODE_sim(model, #string of the model n_start, # starting condition tstart, # start time of the sim tmax, # final time of the sim delta_t, # delta t for poisson approx param_of_ode # parameters of the ODE model )
sol_1 =reduce(vcat,sim_1)
model = "gompertz" n_start =[sol_1[end]] tstart =102.0 tmax = 0200.0 delta_t=2.0 param_of_ode= [0.2,0.5 ]
sim_2= ODE_sim(model, #string of the model n_start, # starting condition tstart, # start time of the sim tmax, # final time of the sim delta_t, # delta t for poisson approx param_of_ode # parameters of the ODE model )
sol_2 =reduce(vcat,sim_2)
times_sim =vcat(sim_1.t,sim_2.t)
sol_sim =vcat(sol_1,sol_2)
Plots.scatter(sol_sim, xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, size = (300,300))
data_OD = Matrix(transpose(hcat(times_sim,sol_sim)))
noise_unifom = rand(Uniform(-0.001,0.001),length(sol_sim))
data_OD[2,:] = data_OD[2,:] .+ noise_unifom Plots.scatter(data_OD[1,:],data_OD[2,:], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing],color=:blue,markersize =2 ,size = (300,300))
L = selection_NL_fixed_interval( data_OD, # dataset first row times second row OD "", # name of the well "", #label of the experiment ["NL_logistic","NL_Gompertz"], # ode models to use [[0.0,0.0,-1.0],[0.0,0.0,0.0]], # upper bound param [[1.0,1.0,1.0],[1.0,1.0,1.0]], [100]; list_u0=[[0.2,0.1,0.0],[0.5,0.2,0.0]],# initial guess param type_of_loss="L2", # type of used loss optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method method_of_fitting="MCMC", # selection of sciml integrator smoothing=false, nrep =200, type_of_smoothing="lowess", thr_lowess=0.05, pt_avg=1, pt_smooth_derivative=0, multiple_scattering_correction=false, # if true uses the given calibration curve to fix the data method_multiple_scattering_correction="interpolation", calibration_OD_curve="NA", # the path to calibration curve to fix the data beta_smoothing_ms=0.0, # parameter of the AIC penality PopulationSize=500, maxiters=20000000, abstol=0.000000001, penality_CI=80000000.0)
Plots.scatter(data_OD[1,:],data_OD[2,:], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing],color=:blue,markersize =2 ,size = (300,300))
plot!(data_OD[1,:],L[2])
I can replicate the problem also with other models so it is not intrinc of Gompertz
The problem seems in the first fitting
I have also tested with more than two segments. I confirm the problem seems to resides in the first interval
I have printed the solution during the run of the function and it seems that what is plotted for the first segment it is not what it is inferred, i am investigating on this
I think is solved! two problems found
1) the plotting 2) the upper bound i was using made the fitting impossible
So Given
It works! Now i proceed by studying how initizialization alters results. I test the sensitivity against initialization
I have studied the NL sensitivity of a single logistic fit, the procedure done automatically with jmaki functions follows (https://en.wikipedia.org/wiki/Morris_method).
The results shows that most of the configs returns the perfect value of the parameters but there are some that some times you get a Biiiiig error.
This beahvior technically it is not a problem because the hyper parameter optmization of the start should cure this. I will add an example of the same test but using a MCMC to choose the initilization
I have tested the sensitivity of the NL fit using only 10 step of MCMC optimization of the initialization of the parameters. ON synth data this seem to solve most of the problems due to initialization
Summarising
I am going to test
Some examples, the behavior is improved by tuning a little the procedure, some hyperparam should be optimized a bit (e.g., how permute cp, AICc, wheight of the bounduary in the loss)
There is some strange behavior of the change points selection I open a new issue on that, the test here will be done with a fixed number of cps
Comparison fixed number of cp with different mcmc steps for each segmenet initiliazation nrep = 20 nrep = 50
I will add a non-linear fit function useful for non linear fit with function made by the user and multiple scattering correction