Create data structure for storing feature names, units, and LaTeX "pretty print" versions

mi3nts / MintsML.jl

https://mi3nts.github.io/MintsML.jl/

MIT License


Issue #68 (open), opened by john-waczak 11 months ago

john-waczak commented 11 months ago

We should create a data table struct to hold the Tables.jl-compatible data splits (i.e., all CV folds, or train/test plus an uncertainty holdout for conformal prediction), along with the units, uncertainties, and a "pretty print" LaTeX-string version of each feature name. We can then pass this struct through our pipeline for all fitting and plotting tasks.
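A minimal sketch of such a container, written in Python for illustration (in MintsML.jl this would be a Julia struct). The names `DataSplits` and `conformal_split`, the 60/20/20 fractions, and the choice to store row indices instead of copies of the data are all assumptions, not part of the package. Storing indices keeps the splits independent of whatever table type holds the actual data.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DataSplits:
    """Hypothetical container: row-index splits plus per-feature metadata."""
    train: list[int]
    calibration: list[int]  # uncertainty holdout for conformal prediction
    test: list[int]
    units: dict[str, str] = field(default_factory=dict)
    uncertainties: dict[str, float] = field(default_factory=dict)
    latex_names: dict[str, str] = field(default_factory=dict)


def conformal_split(n_rows: int, frac_train: float = 0.6,
                    frac_cal: float = 0.2, seed: int = 42) -> DataSplits:
    """Shuffle row indices reproducibly and cut them into train/calibration/test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = round(frac_train * n_rows)
    n_cal = round(frac_cal * n_rows)
    return DataSplits(
        train=idx[:n_train],
        calibration=idx[n_train:n_train + n_cal],
        test=idx[n_train + n_cal:],  # whatever remains goes to the test set
    )


splits = conformal_split(100)
splits.units["temperature"] = "K"
splits.latex_names["temperature"] = r"$T$"
print(len(splits.train), len(splits.calibration), len(splits.test))
```

CV folds could be stored the same way, as a list of (train, test) index pairs.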

john-waczak commented 11 months ago

Add option to include feature uncertainties in the dataset.
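One way to make uncertainties optional, sketched in Python under the assumption that a feature's uncertainty is either absent, a constant absolute value, or the name of another column holding per-sample uncertainties. The `Feature` record and `has_uncertainty` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Feature:
    """Hypothetical per-feature record with an optional uncertainty."""
    name: str
    unit: str
    latex: str
    # None (no uncertainty), a constant absolute uncertainty in `unit`,
    # or the name of a column holding per-sample uncertainties.
    uncertainty: Optional[Union[float, str]] = None

    def has_uncertainty(self) -> bool:
        return self.uncertainty is not None


temp = Feature("temperature", "K", r"$T$", uncertainty=0.1)
rh = Feature("rel_humidity", "%", r"$\mathrm{RH}$", uncertainty="rel_humidity_err")
co2 = Feature("co2", "ppm", r"$\mathrm{CO_2}$")
print([f.name for f in (temp, rh, co2) if f.has_uncertainty()])
```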

john-waczak commented 11 months ago

Track the data type and scientific type of each feature.
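A rough Python sketch of recording both types per column. The scientific-type labels loosely follow the default convention of ScientificTypes.jl (floats as `Continuous`, integers as `Count`, strings as `Textual`); the `infer_types` helper, the treatment of `None` as missing, and the `Unknown` fallback are assumptions for illustration only.

```python
def infer_types(values: list) -> tuple[str, str]:
    """Return (data type, scientific type) for a column of Python values.

    None entries are treated as missing and skipped. Columns whose
    non-missing values are not all of one supported type (including
    bools and empty columns) fall back to ("mixed", "Unknown").
    """
    kinds = {type(v) for v in values if v is not None}
    if kinds == {float}:
        return "float", "Continuous"
    if kinds == {int}:
        return "int", "Count"
    if kinds == {str}:
        return "str", "Textual"
    return "mixed", "Unknown"


columns = {
    "temperature": [291.2, 293.5, None],
    "particle_count": [12, 40, 7],
    "site": ["A", "B", "A"],
}
meta = {name: infer_types(vals) for name, vals in columns.items()}
print(meta)
```

Storing the scientific type explicitly, rather than re-inferring it each time, matters because a string column like `site` is usually meant to be categorical rather than free text.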

john-waczak commented 11 months ago

Rather than holding the data table itself, this can just be a common format for storing the relevant metadata, e.g., column names, units, pretty-print versions, scientific types, and which columns are features vs. targets. We should also use JSON.jl to define a serialization scheme so that the metadata for each model can be saved, reloaded, and version-controlled. See https://github.com/john-waczak/AutoChem.jl/blob/main/src/bimolecular-reactions.jl for JSON.jl inspiration.
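A minimal sketch of a metadata-only record with a JSON round trip, written in Python with the standard `json` and `dataclasses` modules; in Julia, `JSON.json` and `JSON.parse` from JSON.jl would play the roles of `json.dumps` and `json.loads`. The class names, field layout, and `version` field are assumptions, not the package's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FeatureMeta:
    name: str
    unit: str
    latex: str
    scitype: str


@dataclass
class DatasetMeta:
    """Hypothetical metadata-only record: column descriptions, no data."""
    version: str
    features: list[FeatureMeta] = field(default_factory=list)
    targets: list[FeatureMeta] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses inside lists;
        # sorted keys and fixed indentation keep git diffs stable.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DatasetMeta":
        raw = json.loads(text)
        return cls(
            version=raw["version"],
            features=[FeatureMeta(**f) for f in raw["features"]],
            targets=[FeatureMeta(**t) for t in raw["targets"]],
        )


meta = DatasetMeta(
    version="0.1.0",
    features=[FeatureMeta("temperature", "K", r"$T$", "Continuous")],
    targets=[FeatureMeta("pm2_5", "μg/m³", r"$\mathrm{PM}_{2.5}$", "Continuous")],
)
text = meta.to_json()
restored = DatasetMeta.from_json(text)
print(restored == meta)
```

Writing one such file next to each trained model gives a plain-text artifact that can be committed alongside it.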

john-waczak commented 11 months ago

Add metadata for each feature specifying whether or not it is controllable.

When we compute our feature importance rankings, we can then color-code the features by whether or not they are controllable. That way we can decide not only which features are most important to building a good model, but also which of those important features have knobs we can influence.
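A small Python sketch of attaching the controllable flag to an importance ranking. The importance scores below are made-up toy values, and `rank_with_control`, `color_for`, and the two color names (matplotlib's `tab:blue` and `tab:gray`) are illustrative choices, not an existing MintsML.jl API.

```python
from dataclasses import dataclass


@dataclass
class RankedFeature:
    name: str
    importance: float
    controllable: bool


def rank_with_control(importances: dict[str, float],
                      controllable: set[str]) -> list[RankedFeature]:
    """Sort features by importance (descending) and attach the control flag."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [RankedFeature(n, imp, n in controllable) for n, imp in ranked]


def color_for(feature: RankedFeature) -> str:
    # One color for features we can act on, another for everything else.
    return "tab:blue" if feature.controllable else "tab:gray"


# Toy inputs for illustration only.
importances = {"temperature": 0.42, "fan_speed": 0.31,
               "humidity": 0.15, "filter_age": 0.12}
controllable = {"fan_speed", "filter_age"}

ranking = rank_with_control(importances, controllable)
for f in ranking:
    print(f"{f.name:12s} {f.importance:.2f} {color_for(f)}")

# The most important feature that is also a knob we can turn.
top_knob = next(f for f in ranking if f.controllable)
```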

"A twofold reduction in feature X leads to a threefold change in target Y", or something to that effect.
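One way to make a statement like that precise, assuming (only for illustration) that the target depends on the feature locally as a power law $Y = c\,X^{\beta}$, so that $\beta = \partial \ln Y / \partial \ln X$ is the elasticity:

```latex
Y = c\,X^{\beta}
\quad\Longrightarrow\quad
\frac{Y(X/2)}{Y(X)} = \left(\tfrac{1}{2}\right)^{\beta} = 2^{-\beta},
\qquad
2^{-\beta} \in \left\{3, \tfrac{1}{3}\right\}
\;\Longrightarrow\;
|\beta| = \frac{\ln 3}{\ln 2} \approx 1.585 .
```

Here $2^{-\beta} = 3$ means halving X triples Y, and $2^{-\beta} = 1/3$ means halving X cuts Y by a factor of three.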