ang-one commented 6 months ago

Today I finished working on the file ML_downstream.jl. It uses the output of the script that fits one file.

ang-one commented 6 months ago

The decision tree is fully implemented. All of the functions and options of the package are integrated in jmaki. Please check the README for the packages required to run the new version.

ang-one commented 6 months ago

To test the function, use the following example:

jm_res_test = readdlm("E:/Lavoro/chem_isolates_wo_blank_sub/res_to_test_ML_ode.csv", ',')
annotation_test = readdlm("E:/Lavoro/chem_isolates_wo_blank_sub/annotation_to_test_ML_ode.csv", ',')

index_mixture = findall(annotation_test[:, end] .== "mixture")
feature_matrix = annotation_test[index_mixture, 3:(end-2)]
jmaki_results = jm_res_test[:, index_mixture]

# dt regression for nmax

a = downstream_decision_tree_regression(jmaki_results, feature_matrix, 7;
    verbose = true,
    do_cross_validation = true,
    max_depth = 3,
    n_folds_cv = 5,
)

# dt regression for gr

a = downstream_decision_tree_regression(jmaki_results, feature_matrix, 9;
    verbose = true,
    do_cross_validation = true,
    max_depth = 3,
    n_folds_cv = 5,
)
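
For readers without access to the two CSV files, here is a minimal sketch of how the selection in the example lines up wells, features, and fitted values. It runs on toy data: every value and every name ending in `_toy` is made up for illustration, and the toy results matrix has only two rows, so row 1 merely stands in for the nmax row (row 7 in the real results, going by the calls in the example).

```julia
# Toy annotation table (hypothetical values): one row per well.
# Columns: well id, strain, two numeric features, replicate, sample type.
annotation_toy = Any[
    "well_1" "strain_a" 0.1 0.0 "rep1" "mixture"
    "well_2" "strain_a" 0.0 0.0 "rep1" "blank"
    "well_3" "strain_b" 0.2 0.5 "rep1" "mixture"
    "well_4" "strain_b" 0.4 0.1 "rep2" "mixture"
]

# Toy results matrix (hypothetical values): one column per well,
# one row per fitted quantity (row 1 stands in for nmax, row 2 for gr).
results_toy = [
    0.90 0.00 1.20 1.10
    0.05 0.00 0.08 0.07
]

# Keep only the wells annotated as "mixture" (last annotation column).
index_mixture = findall(annotation_toy[:, end] .== "mixture")

# Features: annotation columns 3 to end-2, one row per selected well.
feature_matrix = Float64.(annotation_toy[index_mixture, 3:(end - 2)])

# Results: the matching columns, so column i pairs with feature row i.
results_mixture = results_toy[:, index_mixture]

# Regression target: one row of the results, one value per selected well.
nmax_target = results_mixture[1, :]

println(index_mixture)
println(size(feature_matrix))
println(nmax_target)
```

The point of the sketch is the pairing: the rows of the annotation table and the columns of the results matrix are filtered with the same index, so each feature row stays aligned with the fitted values of its well.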

ang-one commented 6 months ago

annotation_to_test_ML_ode.csv and res_to_test_ML_ode.csv: these are the two files with the full results of the big dataset used in the previous example.

ang-one commented 6 months ago

Added all the options of symbolic regression. The defaults are defined by the following struct:

options = SymbolicRegression.Options(;
    binary_operators=[+, -, /, *],
    unary_operators=[],
    constraints=nothing,
    elementwise_loss=nothing,
    loss_function=nothing,
    tournament_selection_n=12, # 1 sampled from every tournament_selection_n per mutation
    tournament_selection_p=0.86,
    topn=12, # samples to return per population
    complexity_of_operators=nothing,
    complexity_of_constants=nothing,
    complexity_of_variables=nothing,
    parsimony=0.0032,
    dimensional_constraint_penalty=nothing,
    alpha=0.100000,
    maxsize=20,
    maxdepth=nothing,
    turbo=false,
    bumper=false,
    migration=true,
    hof_migration=true,
    should_simplify=nothing,
    should_optimize_constants=true,
    output_file=nothing,
    node_type=SymbolicRegression.Node,
    populations=15,
    perturbation_factor=0.076,
    annealing=false,
    batching=false,
    batch_size=50,
    mutation_weights=MutationWeights(),
    crossover_probability=0.066,
    warmup_maxsize_by=0.0,
    use_frequency=true,
    use_frequency_in_tournament=true,
    adaptive_parsimony_scaling=20.0,
    population_size=33,
    ncycles_per_iteration=550,
    fraction_replaced=0.00036,
    fraction_replaced_hof=0.035,
    verbosity=nothing,
    print_precision=3,
    save_to_file=true,
    probability_negate_constant=0.01,
    seed=nothing,
    bin_constraints=nothing,
    una_constraints=nothing,
    progress=nothing,
    terminal_width=nothing,
    optimizer_algorithm=Optim.BFGS(),
    optimizer_nrestarts=2,
    optimizer_probability=0.14,
    optimizer_iterations=nothing,
    optimizer_f_calls_limit=nothing,
    optimizer_options=nothing,
    use_recorder=false,
    recorder_file="pysr_recorder.json",
    early_stop_condition=nothing,
    timeout_in_seconds=nothing,
    max_evals=nothing,
    skip_mutation_failures=true,
    nested_constraints=nothing,
    deterministic=false,
    # Not search options; just construction options:
    define_helper_functions=true,
    deprecated_return_state=nothing,
    # Deprecated args:
    fast_cycle=false,
    npopulations=nothing,
    npop=nothing,
)

ang-one commented 6 months ago

Example of working code:

jm_res_test = readdlm("E:/Lavoro/Monod_AA_res/ODE/exp_4/ODE_exp_4_parameters_aHPM.csv", ',')
annotation_test = CSV.File("E:/Lavoro/Monod_AA_res/Monod_AA_detection/exp_4/annotation.csv")
names_of_annotation = propertynames(annotation_test)
feature_matrix = hcat(annotation_test[:V1], annotation_test[:V3])
jmaki_results = jm_res_test

options = SymbolicRegression.Options(;
    binary_operators=[+, /, *],
    unary_operators=[],
    constraints=nothing,
    elementwise_loss=nothing,
    loss_function=nothing,
    tournament_selection_n=12, # 1 sampled from every tournament_selection_n per mutation
    tournament_selection_p=0.86,
    topn=12, # samples to return per population
    complexity_of_operators=nothing,
    complexity_of_constants=nothing,
    complexity_of_variables=nothing,
    parsimony=0.0032,
    dimensional_constraint_penalty=nothing,
    alpha=0.100000,
    maxsize=20,
    maxdepth=nothing,
    turbo=false,
    bumper=false,
    migration=true,
    hof_migration=true,
    should_simplify=true,
    should_optimize_constants=true,
    output_file=nothing,
    node_type=SymbolicRegression.Node,
    populations=50,
    perturbation_factor=0.076,
    annealing=true,
    batching=false,
    batch_size=50,
    mutation_weights=MutationWeights(),
    crossover_probability=0.066,
    warmup_maxsize_by=0.0,
    use_frequency=true,
    use_frequency_in_tournament=true,
    adaptive_parsimony_scaling=20.0,
    population_size=100,
    ncycles_per_iteration=550,
    fraction_replaced=0.00036,
    fraction_replaced_hof=0.035,
    verbosity=nothing,
    print_precision=3,
    save_to_file=true,
    probability_negate_constant=0.01,
    seed=3,
    bin_constraints=nothing,
    una_constraints=nothing,
    progress=nothing,
    terminal_width=nothing,
    optimizer_algorithm=Optim.BFGS(),
    optimizer_nrestarts=2,
    optimizer_probability=0.14,
    optimizer_iterations=nothing,
    optimizer_f_calls_limit=nothing,
    optimizer_options=nothing,
    use_recorder=false,
    recorder_file="pysr_recorder.json",
    early_stop_condition=nothing,
    timeout_in_seconds=nothing,
    max_evals=nothing,
    skip_mutation_failures=true,
    nested_constraints=nothing,
    deterministic=false,
    # Not search options; just construction options:
    define_helper_functions=true,
    deprecated_return_state=nothing,
    # Deprecated args:
    fast_cycle=false,
    npopulations=nothing,
    npop=nothing,
)

gr_sy_reg = downstream_symbolic_regression(jmaki_results, feature_matrix, 9; options = options)
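
Only six values in the `options` used for this example differ from the defaults posted earlier in this thread, so a shorter way to write the same configuration might look like the sketch below. It assumes that the installed SymbolicRegression version accepts these keyword names (they are the ones used above) and that its built-in defaults match the list posted earlier; if they do not, the long form is the safer choice.

```julia
using SymbolicRegression

# Pass only the keywords that differ from the defaults listed earlier
# in this thread; every other keyword falls back to the package default
# (assumed here to match that list).
options = SymbolicRegression.Options(;
    binary_operators=[+, /, *],  # no subtraction
    should_simplify=true,
    populations=50,
    annealing=true,
    population_size=100,
    seed=3,
)
```

The resulting `options` object can then be passed through the `options` keyword of `downstream_symbolic_regression`, as in the call above.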

ang-one commented 6 months ago

Files to test: ODE_exp_4_parameters_aHPM.csv and annotation.csv

ang-one commented 6 months ago

I started working in the utilities repo to create a stable analysis of these files.

ang-one commented 6 months ago

For the first full example (it is ready for publication), please look at "fit_AA_experiment" in the jmaki_utilities folder.

ang-one commented 6 months ago

The decision tree also has a script ready in the other repo; please check it.

pinheiroGroup / Kinbiont.jl

ML integration #40
