Open john-waczak opened 11 months ago
Add identifiers for missing, nan, inf, and other types so that we can drop them later
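One possible shape for these identifiers, as a rough sketch using only Base Julia. The function names (`classify_value`, `flag_counts`, `valid_rows`) and the category symbols are hypothetical, and a NamedTuple of vectors stands in for the table here.

```julia
# Classify a single value. The category symbols are illustrative assumptions.
function classify_value(x)
    x === missing && return :missing
    x === nothing && return :nothing
    if x isa AbstractFloat
        isnan(x) && return :nan
        isinf(x) && return :inf
    end
    x isa Number || return :non_numeric
    return :ok
end

# Count how often each category occurs in one column.
function flag_counts(col)
    counts = Dict{Symbol,Int}()
    for x in col
        k = classify_value(x)
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

# Boolean mask marking the rows that are :ok in every column.
function valid_rows(table::NamedTuple)
    n = length(first(table))
    return [all(classify_value(col[i]) === :ok for col in table) for i in 1:n]
end

data = (a = [1.0, NaN, 3.0, Inf], b = [1, missing, 3, 4])
counts_a = flag_counts(data.a)
mask = valid_rows(data)
```

Keeping the flags as a mask rather than dropping rows right away leaves the decision about what to drop for a later step.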
Take a Tables.jl- or DTables.jl-compatible table as the argument for this step. Also consider using OnlineStats.jl to process large datasets in predefined chunks.
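A minimal sketch of the chunked idea, assuming Tables.jl and OnlineStats.jl are installed. `chunked_summary` and its `chunk_size` keyword are hypothetical names, and the column is chunked in memory with `Iterators.partition` purely for illustration. A real out-of-core source would hand over its chunks lazily instead.

```julia
using Tables
using OnlineStats

# Accumulate mean and variance for one column, one chunk at a time.
# `chunk_size` is an illustrative parameter, not an existing option.
function chunked_summary(table, column::Symbol; chunk_size::Int = 1_000)
    Tables.istable(table) ||
        throw(ArgumentError("expected a Tables.jl-compatible table"))
    col = Tables.getcolumn(Tables.columns(table), column)
    stats = Series(Mean(), Variance())
    for chunk in Iterators.partition(col, chunk_size)
        fit!(stats, chunk)  # updates the running statistics in place
    end
    return stats
end

table = (x = collect(1.0:10.0), y = collect(10.0:-1.0:1.0))
s = chunked_summary(table, :x; chunk_size = 3)
m, v = value(s)  # running mean and variance of column x
```

Because the statistics are updated incrementally, only one chunk needs to be in memory at any time.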
It will be desirable to specify lists of features and lists of targets for the exploratory analysis. After reviewing the results, we can establish heuristics for which targets to train models for, and with which subset of relevant features.
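As one example of what the feature and target lists could drive, here is a sketch that computes the Pearson correlation for every feature/target pair. Using correlation as the exploratory statistic is an assumption, not something decided in this issue, and `feature_target_correlations` and the column names are hypothetical.

```julia
using Statistics

# Pearson correlation between every listed feature and every listed target.
# The table is assumed to be a NamedTuple of equal-length numeric vectors.
function feature_target_correlations(table::NamedTuple, features, targets)
    return Dict((f, t) => cor(table[f], table[t]) for f in features, t in targets)
end

data = (
    f1 = [1.0, 2.0, 3.0, 4.0],
    f2 = [1.0, -1.0, -1.0, 1.0],
    t1 = [2.0, 4.0, 6.0, 8.0],
)
corrs = feature_target_correlations(data, [:f1, :f2], [:t1])
```

A heuristic could then be as simple as keeping the pairs whose absolute correlation exceeds some threshold, though the threshold itself would need to be chosen after looking at real results.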
For the feature importance evaluation, we should pick a model that also lets us estimate the model's sensitivity to each feature, e.g. MLPs or regression.
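For the regression option, a rough sketch of how sensitivity could be read off: fit ordinary least squares on standardized features, so that each coefficient is the change in the target per one standard deviation of that feature. This uses only the Statistics and LinearAlgebra standard libraries. `linear_sensitivity` is a hypothetical name, the data is synthetic, and columns with zero variance would need to be dropped first.

```julia
using Statistics
using LinearAlgebra

# Scale every column to zero mean and unit standard deviation.
standardize(X) = (X .- mean(X; dims = 1)) ./ std(X; dims = 1)

# Least-squares fit with an intercept; returns one coefficient per feature.
function linear_sensitivity(X::AbstractMatrix, y::AbstractVector)
    Z = standardize(X)
    A = hcat(ones(size(Z, 1)), Z)  # intercept column plus scaled features
    coef = A \ y                   # least-squares solution
    return coef[2:end]             # drop the intercept
end

# Synthetic example: the target depends strongly on x1 and weakly on x2.
x1 = collect(1.0:8.0)
x2 = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
X = hcat(x1, x2)
y = 3.0 .* x1 .+ 0.5 .* x2
sens = linear_sensitivity(X, y)
```

For an MLP the analogous quantity would be the gradient of the output with respect to each input, which needs an automatic differentiation package and is not shown here.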
Include
Generate a report for a Dataset Overview