ang-one closed this 2 months ago
The decision tree is fully implemented. All of the functions and options of the package are integrated in jmaki. Please check the README for the packages required to run the new version.
To test the function, use the following example:
using DelimitedFiles  # provides readdlm

jm_res_test = readdlm("E:/Lavoro/chem_isolates_wo_blank_sub/res_to_test_ML_ode.csv", ',')
annotation_test = readdlm("E:/Lavoro/chem_isolates_wo_blank_sub/annotation_to_test_ML_ode.csv", ',')

# Keep only the entries annotated as "mixture" in the last annotation column
index_mixture = findall(annotation_test[:, end] .== "mixture")
feature_matrix = annotation_test[index_mixture, 3:(end-2)]
jmaki_results = jm_res_test[:, index_mixture]

# Same call twice; only the third positional argument changes (7, then 9)
a = downstream_decision_tree_regression(
    jmaki_results,
    feature_matrix,
    7;
    verbose = true,
    do_cross_validation = true,
    max_depth = 3,
    n_folds_cv = 5,
)

a = downstream_decision_tree_regression(
    jmaki_results,
    feature_matrix,
    9;
    verbose = true,
    do_cross_validation = true,
    max_depth = 3,
    n_folds_cv = 5,
)
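For anyone without the CSV files at hand, the selection step in the example above can be tried on a small in-memory table. This is only a sketch: the toy values are invented, and the layout (one annotation row per column of the results matrix, label in the last annotation column, features in columns 3 to end-2) is inferred from the indexing used in the example, not confirmed.

```julia
# Toy annotation table (hypothetical values): one row per well,
# label in the last column, features in columns 3:(end-2).
annotation_toy = Any[
    "w1" "A" 0.1 1.0 "x" "mixture"
    "w2" "B" 0.2 2.0 "x" "blank"
    "w3" "C" 0.3 3.0 "x" "mixture"
]

# Toy results matrix (hypothetical values): one column per well.
results_toy = [
    1.0  2.0  3.0
    10.0 20.0 30.0
]

# Rows of the annotation whose label is "mixture"
index_mixture = findall(annotation_toy[:, end] .== "mixture")

# Features for the selected wells, and the matching result columns
feature_matrix = annotation_toy[index_mixture, 3:(end-2)]
results_sel = results_toy[:, index_mixture]

println(index_mixture)
println(size(feature_matrix))
println(size(results_sel))
```

The number of rows in feature_matrix and the number of columns in results_sel should match, since both are selected with the same index vector.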
Attached files: annotation_to_test_ML_ode.csv, res_to_test_ML_ode.csv
These two files contain the full results of the big dataset used in the previous example.
Added all options of symbolic regression. The defaults are defined by the following struct:
options = SymbolicRegression.Options(;
    binary_operators=[+, -, /, *],
    unary_operators=[],
    constraints=nothing,
    elementwise_loss=nothing,
    loss_function=nothing,
    tournament_selection_n=12, # 1 sampled from every tournament_selection_n per mutation
    tournament_selection_p=0.86,
    topn=12, # samples to return per population
    complexity_of_operators=nothing,
    complexity_of_constants=nothing,
    complexity_of_variables=nothing,
    parsimony=0.0032,
    dimensional_constraint_penalty=nothing,
    alpha=0.100000,
    maxsize=20,
    maxdepth=nothing,
    turbo=false,
    bumper=false,
    migration=true,
    hof_migration=true,
    should_simplify=nothing,
    should_optimize_constants=true,
    output_file=nothing,
    node_type=SymbolicRegression.Node,
    populations=15,
    perturbation_factor=0.076,
    annealing=false,
    batching=false,
    batch_size=50,
    mutation_weights=MutationWeights(),
    crossover_probability=0.066,
    warmup_maxsize_by=0.0,
    use_frequency=true,
    use_frequency_in_tournament=true,
    adaptive_parsimony_scaling=20.0,
    population_size=33,
    ncycles_per_iteration=550,
    fraction_replaced=0.00036,
    fraction_replaced_hof=0.035,
    verbosity=nothing,
    print_precision=3,
    save_to_file=true,
    probability_negate_constant=0.01,
    seed=nothing,
    bin_constraints=nothing,
    una_constraints=nothing,
    progress=nothing,
    terminal_width=nothing,
    optimizer_algorithm=Optim.BFGS(),
    optimizer_nrestarts=2,
    optimizer_probability=0.14,
    optimizer_iterations=nothing,
    optimizer_f_calls_limit=nothing,
    optimizer_options=nothing,
    use_recorder=false,
    recorder_file="pysr_recorder.json",
    early_stop_condition=nothing,
    timeout_in_seconds=nothing,
    max_evals=nothing,
    skip_mutation_failures=true,
    nested_constraints=nothing,
    deterministic=false,
    define_helper_functions=true,
    deprecated_return_state=nothing,
    # Deprecated args:
    fast_cycle=false,
    npopulations=nothing,
    npop=nothing,
)
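Since every keyword in the struct above has a default, it should be enough to pass only the ones you want to change. A minimal sketch, assuming a SymbolicRegression.jl version that accepts the keyword names listed above (the three values are picked only for illustration):

```julia
using SymbolicRegression

# Override a few keywords; everything else keeps its default value.
# The values below are illustrative, not recommendations.
options = SymbolicRegression.Options(;
    binary_operators=[+, *, /],
    populations=50,
    maxsize=20,
)
```

The resulting object can then be passed through the `options` keyword of downstream_symbolic_regression.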
Example of working code:

using DelimitedFiles, CSV, SymbolicRegression, Optim  # readdlm, CSV.File, Options/MutationWeights, Optim.BFGS

jm_res_test = readdlm("E:/Lavoro/Monod_AA_res/ODE/exp_4/ODE_exp_4_parameters_aHPM.csv", ',')
annotation_test = CSV.File("E:/Lavoro/Monod_AA_res/Monod_AA_detection/exp_4/annotation.csv")
names_of_annotation = propertynames(annotation_test)
feature_matrix = hcat(annotation_test[:V1], annotation_test[:V3])
jmaki_results = jm_res_test
options = SymbolicRegression.Options(;
    binary_operators=[+, /, *],
    unary_operators=[],
    constraints=nothing,
    elementwise_loss=nothing,
    loss_function=nothing,
    tournament_selection_n=12, # 1 sampled from every tournament_selection_n per mutation
    tournament_selection_p=0.86,
    topn=12, # samples to return per population
    complexity_of_operators=nothing,
    complexity_of_constants=nothing,
    complexity_of_variables=nothing,
    parsimony=0.0032,
    dimensional_constraint_penalty=nothing,
    alpha=0.100000,
    maxsize=20,
    maxdepth=nothing,
    turbo=false,
    bumper=false,
    migration=true,
    hof_migration=true,
    should_simplify=true,
    should_optimize_constants=true,
    output_file=nothing,
    node_type=SymbolicRegression.Node,
    populations=50,
    perturbation_factor=0.076,
    annealing=true,
    batching=false,
    batch_size=50,
    mutation_weights=MutationWeights(),
    crossover_probability=0.066,
    warmup_maxsize_by=0.0,
    use_frequency=true,
    use_frequency_in_tournament=true,
    adaptive_parsimony_scaling=20.0,
    population_size=100,
    ncycles_per_iteration=550,
    fraction_replaced=0.00036,
    fraction_replaced_hof=0.035,
    verbosity=nothing,
    print_precision=3,
    save_to_file=true,
    probability_negate_constant=0.01,
    seed=3,
    bin_constraints=nothing,
    una_constraints=nothing,
    progress=nothing,
    terminal_width=nothing,
    optimizer_algorithm=Optim.BFGS(),
    optimizer_nrestarts=2,
    optimizer_probability=0.14,
    optimizer_iterations=nothing,
    optimizer_f_calls_limit=nothing,
    optimizer_options=nothing,
    use_recorder=false,
    recorder_file="pysr_recorder.json",
    early_stop_condition=nothing,
    timeout_in_seconds=nothing,
    max_evals=nothing,
    skip_mutation_failures=true,
    nested_constraints=nothing,
    deterministic=false,
    define_helper_functions=true,
    deprecated_return_state=nothing,
    # Deprecated args:
    fast_cycle=false,
    npopulations=nothing,
    npop=nothing,
)
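# As written, this call passes a fresh default Options();
# use `options = options` to apply the struct defined above.
gr_sy_reg = downstream_symbolic_regression(
    jmaki_results,
    feature_matrix,
    9;
    options = SymbolicRegression.Options(),
)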
Files to test: ODE_exp_4_parameters_aHPM.csv, annotation.csv
I have started working in the utilities repo to create a stable analysis of these files.
For the first full example (it is ready for publication), please look at "fit_AA_experiment" in the jmaki_utilities folder.
The decision tree also has a script ready in the other repo; please check it.
Today I finished working on the file ML_downstream.jl. It uses the output of the script that fits a single file.