pinheiroGroup / Kinbiont.jl

Ecosystem of numerical methods for microbial kinetics data analysis, from preprocessing to result interpretation.
https://kinbiont.fuzue.org
MIT License

non linear fit function #4

Closed ang-one closed 8 months ago

ang-one commented 9 months ago

I will add a non-linear fit function, useful for non-linear fitting with a user-defined function, together with multiple scattering correction.

ang-one commented 9 months ago

I am adding a part using non-linear fitting. I will use this package: https://github.com/JuliaNLSolvers/LsqFit.jl

The question is whether it is better to change the log-lin fit to use it or not.

ang-one commented 9 months ago

I created a working prototype of the non-linear functions. The tested functions are taken from "Statistical evaluation of mathematical models for microbial growth"; not all of them work properly yet. It still needs to be integrated with the multiple scattering correction and the fitting procedure.

```julia
using LsqFit, Plots
# data_OD: first row times, second row OD (assumed already loaded)

# exponential
NL_model_exp(x, p) = p[1] .* exp.(.-x .* p[2])

# logistic
NL_model_logistic(x, p) = p[1] ./ (1 .+ exp.(.-p[2] .* (x .- p[3])))

# Gompertz
NL_model_Gompertz(x, p) = p[1] .* exp.(.-exp.(.-p[2] .* (x .- p[3])))

# von Bertalanffy
NL_model_Bertalanffy(x, p) = p[1] .+ (p[2] .- p[1]) .* (1 .- exp.(.-p[3] .* x)) .^ (1 ./ p[4])

# Richards
NL_model_Richards(x, p) = p[1] ./ (1 .+ p[2] .* exp.(.-p[3] .* (x .- p[4]))) .^ (1 ./ p[5])

# Morgan
NL_model_Morgan(x, p) = (p[1] .* p[2] .^ p[3] .+ p[4] .* x .^ p[3]) ./ (p[2] .^ p[3] .+ x .^ p[3])

# Weibull
NL_model_Weibull(x, p) = p[1] .- (p[1] .- p[2]) .* exp.(.-(p[3] .* x) .^ p[4])

# initial guess
p0 = [1.1, 1.1, 1.1, 1.1, 1.1]

# formatting data
xdata = data_OD[1, :]
ydata = data_OD[2, :]

# fitting
fit = LsqFit.curve_fit(NL_model_Richards, xdata, ydata, p0)

# evaluating sigma
sigma_param = stderror(fit)

loss = sum(abs.(fit.resid))

sol = NL_model_Richards(xdata, fit.param)

# plotting
plot(sol)
plot!(ydata)
```

ang-one commented 9 months ago

For the common models, add the comparison between the ODE and NL fits.

ang-one commented 9 months ago

Is it possible to force the initial condition for the NL fit (i.e., fix a parameter to a value) or not?
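
A generic way to do it with LsqFit is to wrap the model in a closure so that the fixed parameter never reaches the optimizer; a minimal sketch (the `fixed_lag` value and `data_OD` are placeholders for the example, not package functionality):

```julia
using LsqFit

# Sketch: fix the lag parameter of the logistic model to a known value so that the
# optimizer only sees the remaining free parameters.
NL_model_logistic(x, p) = p[1] ./ (1 .+ exp.(.-p[2] .* (x .- p[3])))

fixed_lag = 5.0
wrapped_model(x, p) = NL_model_logistic(x, [p[1], p[2], fixed_lag])

p0_free = [1.0, 0.1]
fit = LsqFit.curve_fit(wrapped_model, data_OD[1, :], data_OD[2, :], p0_free)
```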

ang-one commented 9 months ago

Multiple scattering correction with an exponential fit is working and has been added (https://www.sciencedirect.com/science/article/pii/S0141022918302047).

Now I will work on the function to do this fit.
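
For reference, a rough sketch of the general idea (not the implemented code; `od_real` and `od_measured` are hypothetical calibration data): fit the calibration curve with a saturating exponential and invert it to correct the raw OD.

```julia
using LsqFit

# measured OD as a function of real OD (assumed functional form)
calib_model(x, p) = p[1] .* (1 .- exp.(.-p[2] .* x))

cal_fit = LsqFit.curve_fit(calib_model, od_real, od_measured, [1.0, 1.0])
a, b = cal_fit.param

# inverse of the fitted curve, used to correct the raw measurements
correct_od(od) = -log(1 - min(od / a, 0.999)) / b
data_OD[2, :] .= correct_od.(data_OD[2, :])
```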

ang-one commented 9 months ago

I am testing the prototype that uses the same common SciML interface for the ODE and NL fits:

```julia
using Optimization, OptimizationBBO, Plots

function NL_model_exp(p, times)
    model = p[1] .* exp.(times .* p[2])
    return model
end

function NL_model_exp_L2(u, data)
    # define the model
    model = u[1] .* exp.(data[1, :] .* u[2])
    # number of data points
    n_data = length(data[1, :])
    # calculate the residuals
    residuals = (model .- data[2, :]) ./ n_data
    # return the sum of squares of the residuals
    return sum(residuals .^ 2)
end

# synthetic data from the exponential model
times = 1:2:440
data_v = NL_model_exp([0.1, 0.06], times)
data_OD = Matrix(transpose(hcat(times, data_v)))
u0 = [0.5, 0.5]
scatter(data_OD[1, :], data_OD[2, :])

# fit with the SciML Optimization interface
prob = OptimizationProblem(NL_model_exp_L2, u0, data_OD, lb = [0.0, 0.0], ub = [1.0, 1.0])
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited(), PopulationSize = 500, maxiters = 1000000000000)
sol.objective
a = NL_model_exp(sol.u, data_OD[1, :])
plot!(data_OD[1, :], a)
```

ang-one commented 9 months ago

I am testing the NL fit.

I have completed our function and, comparing with the one of the package, I get the same performance, but the problem depends strongly on the initialization; here is an example with different initializations. Currently I am going to test the sensitivity with cross-validation.

Screenshot 2024-02-13 at 14 04 45
ang-one commented 9 months ago

Note that for the moment the initial conditions (CI) are not fixed, so the problem always has one more parameter than the ODE. I am also going to implement the fixed-CI version.
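
A minimal sketch of what the fixed-CI version could look like for the exponential model (an illustration under assumptions, not the implemented code):

```julia
# Anchor the model to the first observation so that N(t_start) is fixed and only the
# growth rate remains a free parameter. `data_OD` is assumed loaded as elsewhere.
function NL_model_exp_fixed_CI(p, times)
    N0 = data_OD[2, 1]                        # fixed initial condition taken from the data
    return N0 .* exp.((times .- times[1]) .* p[1])
end
```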

ang-one commented 9 months ago

New non-linear fit function integrated with our package; now possible:

- single fit

Need to work on:

ang-one commented 9 months ago

For the moment the tests seem to show that the ODE approach is a lot more stable than the NL one with respect to the choice of initialisation. I do not fully understand the reasons behind this; I think it is because the NL fit must also infer the initial conditions, so the comparison is not totally fair, but I will investigate it.

ang-one commented 9 months ago

I think that to evaluate the std from the Monte Carlo (bootstrap) we should use only the solutions that respect some criterion of distance from the best loss.
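
A minimal sketch of the criterion (the names `bootstrap_params` and `bootstrap_losses` are placeholders for the results of the nrep Monte Carlo fits):

```julia
using Statistics

# Keep only the bootstrap solutions whose loss is close enough to the best one before
# computing the std of the parameters.
best_loss = minimum(bootstrap_losses)
keep = bootstrap_losses .<= 1.1 * best_loss          # e.g. within 10% of the best loss
selected = bootstrap_params[keep]                    # vector of parameter vectors
sigma_param = [std(getindex.(selected, i)) for i in eachindex(selected[1])]
```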

ang-one commented 8 months ago

example of usage

```julia
function NL_model_Richards(p, times)
    u = p[1] ./ (1 .+ p[2] .* exp.(.-p[3] .* (times .- p[4]))) .^ (1 ./ p[5])
    return u
end

@time B = fit_NL_model(
    data_OD_smooth2, # dataset, first row times, second row OD
    "test", # name of the well
    "", # label of the experiment
    NL_model_Richards, # model to use
    nl_lb, # lower bound param
    nl_ub; # upper bound param
    u0=[0.01, 0.01, 0.0, 0.1, 0.1], # initial guess param
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, # display plots in Julia or not
    save_plot=false,
    path_to_plot="NA", # where to save plots
    pt_avg=1, # number of points used to generate the initial condition
    pt_smooth_derivative=7,
    smoothing=false, # apply smoothing or not
    type_of_smoothing="rolling_avg",
    type_of_loss="RE", # type of loss used
    blank_array=zeros(100), # data of all blanks
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    PopulationSize=50,
    maxiters=90000000,
    abstol=0.00000001,
    thr_lowess=0.05,
)

@time C = fit_NL_mode_with_sensitivity(
    data_OD_smooth2, "", "", # dataset, well name, experiment label
    NL_model_Richards, nl_lb, nl_ub; # model, lower bound, upper bound
    nrep=10,
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, save_plot=false, path_to_plot="NA",
    pt_avg=1, pt_smooth_derivative=7,
    smoothing=false, type_of_smoothing="rolling_avg", thr_lowess=0.05,
    type_of_loss="RE",
    blank_array=zeros(100),
    multiple_scattering_correction=false,
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA",
    PopulationSize=50, maxiters=90000000, abstol=0.00000001,
)

@time D = fit_NL_model_bootstrap(
    data_OD_smooth2, "", "", # dataset, well name, experiment label
    NL_model_Richards, nl_lb, nl_ub; # model, lower bound, upper bound
    nrep=50,
    u0=[0.1, 0.1, 0.1, 0.1, 0.1], # initial guess param
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, save_plot=false, path_to_plot="NA",
    pt_avg=1, pt_smooth_derivative=7,
    smoothing=false, type_of_smoothing="rolling_avg", thr_lowess=0.05,
    type_of_loss="RE",
    blank_array=zeros(100),
    multiple_scattering_correction=false,
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA",
    PopulationSize=50, maxiters=20000000, abstol=0.000000001,
)
```

ang-one commented 8 months ago

Added the loss selector; 4 losses are implemented:

- `"L2"`
- `"RE"`
- `"L2_log"`
- `"RE_log"`
ang-one commented 8 months ago

Now I will work on the models.

ang-one commented 8 months ago

Defined a model selector for the NL cases; note that in this case the user can pass either a function or a string that selects one of the hardcoded models.

7 options for models:

```julia
NL_Model("NL_exponential", NL_model_exp, ["model", "well", "N0", "growth_rate", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_logistic", NL_model_logistic, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Gompertz", NL_model_Gompertz, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Bertalanffy", NL_model_Bertalanffy, ["model", "well", "N_0", "N_max", "growth_rate", "shape", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Richards", NL_model_Richards, ["model", "well", "N_max", "shape", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Morgan", NL_model_Morgan, ["model", "well", "N_0", "K", "shape", "N_max", "th_max_gr", "emp_max_gr", "loss"]),
NL_Model("NL_Weibull", NL_model_Weibull, ["model", "well", "N_max", "N_0", "growth_rate", "shape", "th_max_gr", "emp_max_gr", "loss"]),
```
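
A minimal sketch of the dispatch idea (the struct and field names below are assumptions, not the exact internals):

```julia
# a string is looked up in the registry of hardcoded models, a function is used directly
struct NL_Model
    name::String
    func::Function
    params::Vector{String}
end

hardcoded_NL_models = Dict(m.name => m for m in [
    NL_Model("NL_exponential", NL_model_exp, ["model", "well", "N0", "growth_rate", "th_max_gr", "emp_max_gr", "loss"]),
    NL_Model("NL_logistic", NL_model_logistic, ["model", "well", "N_max", "growth_rate", "lag", "th_max_gr", "emp_max_gr", "loss"]),
])

resolve_NL_model(model::Function) = model
resolve_NL_model(model::AbstractString) = hardcoded_NL_models[model].func
```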

ang-one commented 8 months ago

Note that the optimization now supports piecewise models. If there is any request, I will add more.

ang-one commented 8 months ago

Now working on the "fit one file" version of the function; some considerations:

ang-one commented 8 months ago

Working on the NL fit for segmentation. Problem: for segments that are not the first one, I need to maintain the continuity of the fitted solution. Not all models have a direct relationship for N(t = t_start), so I tried forcing convergence on the first two points with a specific loss:

```julia
function NL_L2_fixed_CI(data, model_function, u, p)
    model = model_function(u, data[1, :])
    penality_ci = 10^9
    n_data = length(data[1, :])
    # residuals of the first two points, weighted by the continuity penalty
    residuals_ci = penality_ci .* (model[1:2] .- data[2, 1:2]) ./ 2
    residuals = (model[3:end] .- data[2, 3:end]) ./ (n_data - 2)
    residuals_tot = sum(residuals_ci .^ 2) + sum(residuals .^ 2)
    return residuals_tot
end
```

The problem with such an approach is that we need to find a good value of `penality_ci`;

otherwise the effect will be:

Screenshot 2024-02-15 at 10 09 45
ang-one commented 8 months ago

I add here some examples of the fits of the implemented models using the L2 loss.

Screenshot 2024-02-15 at 10 40 50 Screenshot 2024-02-15 at 10 40 28 Screenshot 2024-02-15 at 10 40 09 Screenshot 2024-02-15 at 10 39 49 Screenshot 2024-02-15 at 10 39 37 Screenshot 2024-02-15 at 10 39 21
ang-one commented 8 months ago

I add here some examples of the fits of the implemented models using the L2 loss with a penalty of 10^3 on the first points of the fit.

Screenshot 2024-02-15 at 10 44 36 Screenshot 2024-02-15 at 10 43 35 Screenshot 2024-02-15 at 10 46 51 Screenshot 2024-02-15 at 10 46 32
ang-one commented 8 months ago

By tuning the penalty parameter in `loss_total = penality * loss_ci + loss_rest_of_fit`, one can somehow obtain the desired results.

Screenshot 2024-02-15 at 11 35 25 Screenshot 2024-02-15 at 11 35 02 Screenshot 2024-02-15 at 11 34 34 Screenshot 2024-02-15 at 11 34 04
ang-one commented 8 months ago

I added

ang-one commented 8 months ago

example of usage:

```julia
M = fit_NL_model_MCMC_intialization(
    data_OD_smooth2, # dataset, first row times, second row OD
    "test", # name of the well
    "", # label of the experiment
    "NL_logistic", # model to use
    nl_lb, # lower bound param
    nl_ub; # upper bound param
    u0=new_param, # initial guess param
    nrep=300,
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(),
    display_plots=true, # display plots in Julia or not
    save_plot=false,
    path_to_plot="NA", # where to save plots
    pt_avg=1, # number of points used to generate the initial condition
    pt_smooth_derivative=7,
    smoothing=false, # apply smoothing or not
    type_of_smoothing="rolling_avg",
    type_of_loss="L2", # type of loss used
    blank_array=zeros(100), # data of all blanks
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    PopulationSize=50,
    maxiters=9000000,
    abstol=0.000000001,
    thr_lowess=0.05,
    penality_CI=8,
)
```

ang-one commented 8 months ago

First tests of the segmentation for the non-linear fit:

```julia
nl_ub_1 = [2.0001, 10.00000001, 500.00]
nl_lb_1 = [0.0001, 0.00000001, 0.00]

nl_ub_2 = [2.0001, 10.00000001, 5.00, 5.0]
nl_lb_2 = [0.0001, 0.00000001, 0.00, 0.0]

nl_ub_3 = [2.0001, 10.00000001]
nl_lb_3 = [0.0001, 0.00000001]

nl_ub_4 = [2.0001, 10.00000001, 500.00]
nl_lb_4 = [0.0001, 0.00000001, 0.00]

list_models_f = ["NL_Gompertz", "NL_Bertalanffy", "NL_exponential", "NL_Gompertz"]
list_lb = [nl_lb_1, nl_lb_2, nl_lb_3, nl_lb_4]
list_ub = [nl_ub_1, nl_ub_2, nl_ub_3, nl_ub_4]

R = selection_NL_maxiumum_change_points(
    data_OD_smooth2, # dataset, first row times, second row OD
    "", # name of the well
    "test", # label of the experiment
    list_models_f, # models to use
    list_lb, # lower bound param
    list_ub, # upper bound param
    3;
    type_of_loss="L2_fixed_CI", # type of loss used
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method
    method_of_fitting="MCMC",
    type_of_detection="sliding_win",
    type_of_curve="original",
    smoothing=false,
    type_of_smoothing="lowess",
    thr_lowess=0.05,
    pt_avg=1,
    nrep=50,
    save_plot=false, # do plots or not
    display_plots=true,
    path_to_plot="NA", # where to save plots
    win_size=14, # number of points of the sliding window
    pt_smooth_derivative=0,
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    beta_smoothing_ms=2.0, # parameter of the AIC penalty
    method_peaks_detection="peaks_prominence",
    n_bins=40,
    PopulationSize=300,
    maxiters=2000000,
    abstol=0.00000001,
    penality_CI=7.0,
)
```

ang-one commented 8 months ago

The fitting for the NL segmentation now goes backward and tests all possible permutations of change points. However, the performance is not so good.

Screenshot 2024-02-15 at 16 55 39
ang-one commented 8 months ago

The performance issue seems correlated with the new AICc calculation.
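
For reference, the standard small-sample corrected AIC for a least-squares fit, which is what I expect the calculation to reduce to (n data points, k free parameters; the package code may differ):

```julia
# AICc for a least-squares fit with residual sum of squares RSS
function AICc(n, k, RSS)
    aic = n * log(RSS / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
end
```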

ang-one commented 8 months ago

After the AICc fixes it seems to work!

Screenshot 2024-02-15 at 18 21 21
ang-one commented 8 months ago

For non-linear fitting we still miss:

ang-one commented 8 months ago

In this case the user enters the maximum possible number of change points and we fit with all possible combinations of change points between 1 and n_max (e.g., if cpd = [1,2,3] I fit with [1], [2], [3], [1,2], [1,3], [1,2,3] and choose the best model by AICc). The problem is that we need to add the case of 0 change points.
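
A minimal sketch of the enumeration, using Combinatorics.jl and adding the 0-change-points case explicitly (an illustration, not the package code):

```julia
using Combinatorics

# enumerate the candidate change-point configurations; each configuration is then fitted
# and the best one is chosen by AICc
candidate_cps = [1, 2, 3]
configurations = [Int[]]                              # the 0-change-points case
for k in 1:length(candidate_cps), c in combinations(candidate_cps, k)
    push!(configurations, collect(c))
end
```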

ang-one commented 8 months ago

Note that now, to use a fixed number of change points, one can use a control flag and fit with a fixed number of change points as before.

ang-one commented 8 months ago

I can implement the same also for the ODE fit, but it will be a lot slower.

ang-one commented 8 months ago

Need a check of the tested configurations of segments: are there repetitions?

ang-one commented 8 months ago

> Need a check of the tested configurations of segments: are there repetitions?

Done, there are none.
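
For the record, a one-line version of the check (sketch; `configurations` is the list of tested change-point sets, e.g. built as in the enumeration sketch above):

```julia
# no repetitions if converting to a Set does not lose any element
@assert length(Set(configurations)) == length(configurations)
```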

ang-one commented 8 months ago

In theory I have implemented the segmentation version for a whole file, so the only thing missing is the error calculation for the NL fit with Monte Carlo (I need to decide which method is better).

In practice, the performance of the NL fit is not satisfactory. I have obtained good results when fitting a single function using an MCMC search for the parameter initialisation; for segmentation I get bad results. For the moment I am exploring three possibilities

ang-one commented 8 months ago

I am testing the segmentation on synthetic data. I generate a curve with a logistic and a Gompertz, add some noise, and start the optimization from the ground-truth parameters.

The parameters of the second segment are fully retrieved, but the ones of the first are NOT. Screenshot 2024-02-19 at 09 34 26

Screenshot 2024-02-19 at 09 32 00

ang-one commented 8 months ago

The strange thing is that for the NL fit the segmentation goes backward, not forward, in time.

So large errors in the first part mean they get propagated. This point should be investigated.

ang-one commented 8 months ago

The code of the previous example:

```julia
# first segment ODE
model = "logistic"
n_start = [0.1]
tstart = 0.0
tmax = 100.0
delta_t = 2.0
param_of_ode = [0.1, 0.2]

sim_1 = ODE_sim(
    model, # string of the model
    n_start, # starting condition
    tstart, # start time of the sim
    tmax, # final time of the sim
    delta_t, # delta t for Poisson approx
    param_of_ode, # parameters of the ODE model
)

sol_1 = reduce(vcat, sim_1)

# second segment ODE
model = "gompertz"
n_start = [sol_1[end]]
tstart = 102.0
tmax = 200.0
delta_t = 2.0
param_of_ode = [0.2, 0.5]

sim_2 = ODE_sim(
    model, # string of the model
    n_start, # starting condition
    tstart, # start time of the sim
    tmax, # final time of the sim
    delta_t, # delta t for Poisson approx
    param_of_ode, # parameters of the ODE model
)

sol_2 = reduce(vcat, sim_2)

# binding the simulations
times_sim = vcat(sim_1.t, sim_2.t)
sol_sim = vcat(sol_1, sol_2)

Plots.scatter(sol_sim, xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, size=(300, 300))

data_OD = Matrix(transpose(hcat(times_sim, sol_sim)))

# adding uniform noise
noise_uniform = rand(Uniform(-0.001, 0.001), length(sol_sim))
data_OD[2, :] = data_OD[2, :] .+ noise_uniform
Plots.scatter(data_OD[1, :], data_OD[2, :], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, markersize=2, size=(300, 300))

# testing the change-point function
L = selection_NL_fixed_interval(
    data_OD, # dataset, first row times, second row OD
    "", # name of the well
    "", # label of the experiment
    ["NL_logistic", "NL_Gompertz"], # models to use
    [[0.0, 0.0, -1.0], [0.0, 0.0, 0.0]], # lower bound param
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], # upper bound param
    [100]; # fixed change point(s)
    list_u0=[[0.2, 0.1, 0.0], [0.5, 0.2, 0.0]], # initial guess param
    type_of_loss="L2", # type of loss used
    optmizator=BBO_adaptive_de_rand_1_bin_radiuslimited(), # selection of optimization method
    method_of_fitting="MCMC",
    smoothing=false,
    nrep=200,
    type_of_smoothing="lowess",
    thr_lowess=0.05,
    pt_avg=1,
    pt_smooth_derivative=0,
    multiple_scattering_correction=false, # if true uses the given calibration curve to correct the data
    method_multiple_scattering_correction="interpolation",
    calibration_OD_curve="NA", # path to the calibration curve used to correct the data
    beta_smoothing_ms=0.0, # parameter of the AIC penalty
    PopulationSize=500,
    maxiters=20000000,
    abstol=0.000000001,
    penality_CI=80000000.0,
)

Plots.scatter(data_OD[1, :], data_OD[2, :], xlabel="Time", ylabel="Arb. Units", label=["Data " nothing], color=:blue, markersize=2, size=(300, 300))
plot!(data_OD[1, :], L[2])
```
ang-one commented 8 months ago

I can replicate the problem also with other models, so it is not intrinsic to the Gompertz model. Screenshot 2024-02-19 at 10 18 05

ang-one commented 8 months ago

The problem seems to be in the first fitting.

Screenshot 2024-02-19 at 10 21 48

ang-one commented 8 months ago

I have also tested with more than two segments. I confirm the problem seems to reside in the first interval.

Screenshot 2024-02-19 at 15 41 29
ang-one commented 8 months ago

I have printed the solution during the run of the function, and it seems that what is plotted for the first segment is not what is inferred; I am investigating this.

ang-one commented 8 months ago

I think it is solved! Two problems were found:

1) the plotting
2) the upper bound I was using made the fitting impossible

Screenshot 2024-02-19 at 15 56 48
ang-one commented 8 months ago

So, given the fixes above:

It works! Now I proceed by studying how the initialization alters the results; I will test the sensitivity against the initialization.

ang-one commented 8 months ago

I have studied the NL sensitivity of a single logistic fit; the procedure, done automatically with the jmaki functions, follows the Morris method (https://en.wikipedia.org/wiki/Morris_method).
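
For reference, a rough sketch of what such a one-at-a-time (Morris-style) scan of the initializations looks like (`fit_from`, `lb`, `ub` are placeholders; this is not the jmaki implementation):

```julia
# starting from random base points in the bounded parameter box, perturb one parameter at
# a time and refit, recording how the results change
function morris_like_scan(fit_from, lb, ub; n_trajectories = 10, delta = 0.1)
    results = []
    for _ in 1:n_trajectories
        u0 = lb .+ rand(length(lb)) .* (ub .- lb)          # random base point in the box
        push!(results, (u0, fit_from(u0)))
        for i in eachindex(u0)                             # one-at-a-time perturbation
            u_pert = copy(u0)
            u_pert[i] = clamp(u_pert[i] + delta * (ub[i] - lb[i]), lb[i], ub[i])
            push!(results, (u_pert, fit_from(u_pert)))
        end
    end
    return results
end
```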

Screenshot 2024-02-19 at 16 16 36 Screenshot 2024-02-19 at 16 16 29 Screenshot 2024-02-19 at 16 16 21
ang-one commented 8 months ago

The results show that most of the configurations return the exact values of the parameters, but for some of them you sometimes get a very large error.

This behavior is technically not a problem, because the hyper-parameter optimization of the starting point should cure it. I will add an example of the same test but using an MCMC to choose the initialization.

ang-one commented 8 months ago

I have tested the sensitivity of the NL fit using only 10 steps of MCMC optimization of the parameter initialization. On synthetic data this seems to solve most of the problems due to the initialization.
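
A minimal sketch of the kind of initialization search I mean (not the package code; `loss_at` is a placeholder that evaluates the chosen loss for a candidate guess):

```julia
# short random-walk (MCMC-like) search over the initial guess, keeping the best loss,
# before starting the optimizer from `best_u`
function mcmc_initialization(loss_at, u0, lb, ub; nsteps = 10, step = 0.1)
    best_u, best_loss = copy(u0), loss_at(u0)
    u, l = copy(best_u), best_loss
    for _ in 1:nsteps
        candidate = clamp.(u .+ step .* (ub .- lb) .* randn(length(u)), lb, ub)
        l_cand = loss_at(candidate)
        if l_cand < l || rand() < exp(l - l_cand)          # accept downhill, sometimes uphill
            u, l = candidate, l_cand
            if l < best_loss
                best_u, best_loss = copy(u), l
            end
        end
    end
    return best_u
end
```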

Screenshot 2024-02-19 at 16 47 50 Screenshot 2024-02-19 at 16 47 43 Screenshot 2024-02-19 at 16 47 35
ang-one commented 8 months ago

Summarising

I am going to test

ang-one commented 8 months ago

Some examples; the behavior is improved by tuning the procedure a little. Some hyperparameters should still be optimized a bit (e.g., how to permute the change points, the AICc, the weight of the boundary in the loss).

Screenshot 2024-02-19 at 17 08 40 Screenshot 2024-02-19 at 17 07 26 Screenshot 2024-02-19 at 17 07 12
ang-one commented 8 months ago

There is some strange behavior of the change-point selection; I will open a new issue on that. The tests here will be done with a fixed number of change points.

ang-one commented 8 months ago

Comparison with a fixed number of change points and different numbers of MCMC steps for each segment initialization: nrep = 20 and nrep = 50.

Screenshot 2024-02-19 at 17 36 40 Screenshot 2024-02-19 at 17 33 51