Closed ezio-melotti closed 2 years ago
After discussing with Grant, I made the following changes:

- if the `outcome` is `SUCCESS`, the object will include two additional top-level keys:
  - `fittest_tree`, with `id` and `expression`
  - `score`, with the `fitness` score and other kernel-dependent values
- `config` key

Here are some example outputs:
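As a rough sketch of the layout described in the list above (not the PR's actual code or output): the helper name `build_report`, the placement of `config` and `outcome` as top-level siblings, and all sample values are assumptions made for illustration.

```python
import json


def build_report(config, outcome, tree_id=None, expression=None, score=None):
    """Assemble a report dict. Hypothetical helper, not part of Karoo GP."""
    # Assumption: the configuration lives under "config", next to "outcome".
    report = {"config": config, "outcome": outcome}
    if outcome == "SUCCESS":
        # These two top-level keys are only added for successful runs.
        report["fittest_tree"] = {"id": tree_id, "expression": expression}
        report["score"] = score
    return report


# Illustrative values only.
config = {"kernel": "r", "precision": 6}
ok = build_report(
    config,
    "SUCCESS",
    tree_id=9,
    expression="1",
    score={"fitness": 0.05, "mean_squared_error": 7.8e-05},
)
failed = build_report(config, "FAILURE")

print(json.dumps(ok, indent=4))
print(json.dumps(failed, indent=4))
```

With this shape, a consumer can test for the presence of `fittest_tree` or `score` instead of comparing the `outcome` string.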
This PR creates a JSON report that includes both the initial configuration and results as discussed in #62.
Implementation-wise, I copied `fx_data_params_write`, created an `fx_eval_fittest` function that contains some duplicated logic, and created a new `fx_data_params_write_json` that creates the report.

Here are some sample JSONs for the classification and regression kernels:
Classification kernel
```json
{
    "package": "Karoo GP",
    "launched": "2022-05-31_08-05-57-998149",
    "dataset": "karoo_gp/files/data_CLASSIFY.csv",
    "kernel": "c",
    "precision": 6,
    "tree_type": "g",
    "tree_depth_base": 3,
    "tree_depth_max": 4,
    "min_node_count": 3,
    "genetic_operators": {
        "reproduction": 0.1,
        "point_mutation": 0.1,
        "branch_mutation": 0.2,
        "crossover": 0.6
    },
    "tournament_size": 7,
    "population": 10,
    "number_of_generations": 10,
    "outcome": "SUCCESS",
    "fittest_tree_id": 10,
    "expression": "pw*sl*sw",
    "fitness_score": 6.0,
    "classification_report": {
        "0.0": {
            "precision": 0.0,
            "recall": 0.0,
            "f1-score": 0.0,
            "support": 11
        },
        "1.0": {
            "precision": 0.0,
            "recall": 0.0,
            "f1-score": 0.0,
            "support": 13
        },
        "2.0": {
            "precision": 0.2,
            "recall": 1.0,
            "f1-score": 0.33333333333333337,
            "support": 6
        },
        "accuracy": 0.2,
        "macro avg": {
            "precision": 0.06666666666666667,
            "recall": 0.3333333333333333,
            "f1-score": 0.11111111111111112,
            "support": 30
        },
        "weighted avg": {
            "precision": 0.04000000000000001,
            "recall": 0.2,
            "f1-score": 0.06666666666666667,
            "support": 30
        }
    },
    "confusion_matrix": [
        [0, 0, 11],
        [0, 0, 13],
        [0, 0, 6]
    ]
}
```

Regression kernel

```json
{
    "package": "Karoo GP",
    "launched": "2022-05-31_08-09-14-382026",
    "dataset": "karoo_gp/files/data_REGRESS.csv",
    "kernel": "r",
    "precision": 6,
    "tree_type": "g",
    "tree_depth_base": 3,
    "tree_depth_max": 4,
    "min_node_count": 3,
    "genetic_operators": {
        "reproduction": 0.1,
        "point_mutation": 0.1,
        "branch_mutation": 0.2,
        "crossover": 0.6
    },
    "tournament_size": 7,
    "population": 10,
    "number_of_generations": 10,
    "outcome": "SUCCESS",
    "fittest_tree_id": 9,
    "expression": "1",
    "fitness_score": 0.04999995231628418,
    "mean_squared_error": 7.777762948535383e-05
}
```

We could/should bikeshed a bit on names and format, in particular:
- `results.json` instead of `log_test.json`, since it includes the values from `log_config.txt` too;
- `outcome` value is either `SUCCESS` or `FAILURE`, and could be expanded to include `ERROR` too in the future;
- `results` value that is set to an empty dict in case of failure or a regular dict in case of success, so that users can check that directly without having to look at the `outcome` to determine if they can access the other values;
- `genetic_operators` in a nested dict, but they could be kept as individual values in the main dict too;
- `tree_type` and `tree_depth_base`, which were commented out in the original report;
- `fitness_score` in a single value instead of 3 different values (one for each kernel).