john-waczak opened this issue 11 months ago
Add option to include feature uncertainties in the dataset.
Track the data type and scientific type of each feature
Rather than holding the data table itself, this can be a common format for storing the relevant metadata, e.g. column names, units, pretty-printed names, scientific types, which columns are features or targets, etc. We should also use JSON.jl to define a serialization scheme so that the metadata for each model can be saved, reloaded, and version controlled. See https://github.com/john-waczak/AutoChem.jl/blob/main/src/bimolecular-reactions.jl for inspiration on using JSON.jl.
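A minimal sketch of what a per-feature metadata record and its serialization scheme could look like, assuming the record is flattened to a `Dict` that a JSON writer such as JSON.jl can serialize. The type `FeatureMeta`, the helpers `to_dict`/`from_dict`, and all field names are hypothetical; only Base Julia is used, so the JSON.jl calls appear only as comments.

```julia
# Hypothetical per-feature metadata record; the type and field names are
# illustrative, not an existing API.
struct FeatureMeta
    name::String                            # column name in the data table
    units::String                           # physical units, e.g. "ppb"
    pretty_name::String                     # LaTeX label for plots
    dtype::String                           # machine type, e.g. "Float64"
    scitype::String                         # scientific type, e.g. "Continuous"
    role::String                            # "feature" or "target"
    controllable::Bool                      # can we turn this knob?
    uncertainty_col::Union{Nothing,String}  # column holding this feature's uncertainty, if any
end

# Flatten to a Dict with string keys taken from the field names. A JSON writer can
# serialize this directly, e.g. (assumed JSON.jl usage) `write(path, JSON.json(to_dict(m)))`.
to_dict(m::FeatureMeta) =
    Dict{String,Any}(string(f) => getfield(m, f) for f in fieldnames(FeatureMeta))

# Rebuild the record from a Dict, e.g. the result of (assumed)
# `JSON.parse(read(path, String))`; values are converted to the field types.
from_dict(d::AbstractDict) =
    FeatureMeta((d[string(f)] for f in fieldnames(FeatureMeta))...)

no2 = FeatureMeta("NO2", "ppb", raw"$\mathrm{NO_2}$", "Float64", "Continuous",
                  "feature", false, "NO2_unc")
restored = from_dict(to_dict(no2))
```

Because the keys come from `fieldnames`, adding a field to the struct automatically extends the saved format; files written before that change would then need a default for the missing key when they are reloaded.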
Add metadata for each feature specifying whether or not it is controllable
When we compute feature importance rankings, we can then color-code the features by whether or not they are controllable. That way we can decide not only which features are most important for building a good model, but also which of those important features have knobs we can turn.
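A sketch of that ranking step, assuming an importance score has already been computed for each feature by whatever method the pipeline uses, and that the controllability flag comes from the feature metadata. The functions `rank_features`, `important_knobs`, and `bar_colors`, the example feature names, and the scores are all hypothetical; the color symbols are simply named colors a plotting package could accept.

```julia
# Sort features by importance and carry the controllability flag along.
# `importances` is assumed to come from whatever ranking method the pipeline uses.
function rank_features(feature_names::Vector{String}, importances::Vector{Float64},
                       controllable::Vector{Bool})
    length(feature_names) == length(importances) == length(controllable) ||
        throw(ArgumentError("all inputs must have one entry per feature"))
    order = sortperm(importances; rev = true)   # most important first
    return [(name = feature_names[i], importance = importances[i],
             controllable = controllable[i]) for i in order]
end

# Names of the top-k features that we can actually influence.
function important_knobs(ranked, k::Integer)
    top = ranked[1:min(k, length(ranked))]
    return [r.name for r in top if r.controllable]
end

# One color per bar: controllable features highlighted, the rest grayed out.
bar_colors(ranked) = [r.controllable ? :steelblue : :gray for r in ranked]

feats = ["temperature", "humidity", "flow_rate", "lamp_power"]
ranked = rank_features(feats, [0.40, 0.10, 0.30, 0.20], [false, false, true, true])
important_knobs(ranked, 3)   # ["flow_rate", "lamp_power"]
```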
Ideally we could then make statements like "a 2-fold reduction in feature X leads to a 3-fold change in target Y", or something to that effect.
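One way to turn a fitted model into a statement of this kind is to evaluate it at a baseline input and again with a single feature scaled, then report the ratio of the two predictions. This sketch assumes the model can be called as a plain function of a feature vector; `fold_change` and the toy model are hypothetical. For a nonlinear model the ratio depends on the chosen baseline, so it is a local statement rather than a global one.

```julia
# Ratio of predictions after scaling feature `j` by `factor`, with all other
# features held at their baseline values. `predict` stands in for a fitted
# model and is assumed to map a feature vector to a scalar prediction.
function fold_change(predict, x::AbstractVector{<:Real}, j::Integer, factor::Real)
    y0 = predict(x)
    iszero(y0) && throw(ArgumentError("baseline prediction is zero; ratio is undefined"))
    x_scaled = float.(x)          # new float copy, so the baseline is untouched
    x_scaled[j] *= factor         # factor = 0.5 models a 2x reduction
    return predict(x_scaled) / y0
end

# Toy model, for illustration only: y = 5 * x1^(-log2(3)) * x2,
# constructed so that halving x1 triples y.
toy(x) = 5 * x[1]^(-log2(3)) * x[2]

x0 = [4.0, 2.0]
fold_change(toy, x0, 1, 0.5)      # ≈ 3.0
```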
We should create a data-table struct to hold the Tables.jl-compatible data splits (i.e. either all CV folds, or a train/test split plus an uncertainty holdout used as the calibration set for conformal prediction), along with the lists of units and uncertainties and "pretty-printed" versions of the feature names as LaTeX strings. We can then pass this struct through our pipeline for all fitting and plotting tasks.
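A minimal sketch of such a struct, assuming each split is stored as a `NamedTuple` of column vectors (which Tables.jl accepts as a column table) and that a single shuffled train/test/calibration split is wanted; cross-validation folds could instead be stored as a vector of such splits. `SplitDataTable`, `take_rows`, `split_table`, the 60/20/20 fractions, and the column names are all hypothetical, and only the `Random` standard library is used.

```julia
using Random

# Hypothetical container for one dataset and its splits. Each split is a
# NamedTuple of column vectors, i.e. a Tables.jl column table.
struct SplitDataTable{T<:NamedTuple}
    train::T
    test::T
    calibration::T                          # holdout used to calibrate conformal intervals
    units::Dict{Symbol,String}              # column => units
    uncertainties::Dict{Symbol,Symbol}      # column => column holding its uncertainty
    pretty_names::Dict{Symbol,String}       # column => LaTeX label for plots
end

# Select rows `idx` from every column of a column table.
take_rows(cols::NamedTuple, idx) = map(c -> c[idx], cols)

# Shuffle rows once, then cut them into train / test / calibration partitions;
# whatever is left after the train and test fractions becomes the calibration set.
function split_table(cols::NamedTuple, fractions::NTuple{2,Float64};
                     rng = MersenneTwister(42))
    sum(fractions) < 1 || throw(ArgumentError("leave some rows for calibration"))
    n = length(first(cols))
    perm = randperm(rng, n)
    n_train = floor(Int, fractions[1] * n)
    n_test = floor(Int, fractions[2] * n)
    train_idx = perm[1:n_train]
    test_idx = perm[n_train+1:n_train+n_test]
    cal_idx = perm[n_train+n_test+1:end]
    return take_rows(cols, train_idx), take_rows(cols, test_idx), take_rows(cols, cal_idx)
end

data = (NO2 = collect(1.0:10.0), NO2_unc = fill(0.1, 10), O3 = collect(11.0:20.0))
train, test, cal = split_table(data, (0.6, 0.2))
dt = SplitDataTable(train, test, cal,
    Dict(:NO2 => "ppb", :NO2_unc => "ppb", :O3 => "ppb"),
    Dict(:NO2 => :NO2_unc),
    Dict(:NO2 => raw"$\mathrm{NO_2}$", :O3 => raw"$\mathrm{O_3}$"))
```

Because all three splits share the type parameter `T`, constructing the struct fails unless the train, test, and calibration tables have identical column names and column types, which catches mismatched splits early.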