mi3nts / MintsML.jl

https://mi3nts.github.io/MintsML.jl/
MIT License

add pipeline step for exploratory data analysis #77

Open john-waczak opened 11 months ago

john-waczak commented 11 months ago

Include:

Generate a report for a Dataset Overview

john-waczak commented 11 months ago

Add identifiers for missing, NaN, Inf, and other invalid value types so that we can drop them later.
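Here is a minimal sketch of how such identifiers could work. It is written in Python purely for illustration; the package itself is Julia, where `missing`, `NaN`, and `Inf` play the same roles. The names `classify_value`, `flag_column`, and `drop_flagged` are hypothetical. Treating any non-numeric entry as "other" is an assumption.

```python
import math
from collections import Counter

def classify_value(x):
    """Label a single entry so it can be filtered out later (hypothetical helper)."""
    if x is None:
        return "missing"  # stands in for Julia's `missing`
    if not isinstance(x, (int, float)):
        return "other"    # assumption: any non-numeric entry is flagged as "other"
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"      # catches both +Inf and -Inf
    return "ok"

def flag_column(values):
    """Return one label per entry in the column."""
    return [classify_value(v) for v in values]

def drop_flagged(values):
    """Keep only the entries labelled "ok"."""
    return [v for v in values if classify_value(v) == "ok"]

column = [1.0, None, float("nan"), float("-inf"), "n/a", 2]
print(flag_column(column))           # ['ok', 'missing', 'nan', 'inf', 'other', 'ok']
print(Counter(flag_column(column)))  # per-label counts, useful for the overview report
print(drop_flagged(column))          # [1.0, 2]
```

Keeping the labels separate from the dropping step means the report can show counts per category before anything is removed.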

john-waczak commented 11 months ago

Take as an argument a Tables.jl- or DTables.jl-compatible table for this step. Also consider using OnlineStats.jl to process large datasets in predefined chunks.
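Chunked processing works because statistics like the mean and variance can be updated one observation at a time. Summaries computed on separate chunks can then be merged exactly, which is the approach OnlineStats.jl takes. The Python sketch below illustrates this idea using Welford's single-pass update together with the pairwise merge formula of Chan et al. It is only an illustration: `RunningStats` and `chunks` are hypothetical names, not part of OnlineStats.jl or MintsML.jl.

```python
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Streaming count, mean, and variance (hypothetical helper)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def fit(self, xs):
        # Welford's update: one pass, no need to hold the whole chunk in memory.
        for x in xs:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other):
        # Combine two partial summaries (Chan et al. pairwise formula).
        if other.n == 0:
            return self
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, other.mean, other.m2
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        self.mean += delta * other.n / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n
        return self

    @property
    def variance(self):
        # Sample variance (n - 1 denominator).
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def chunks(xs, size):
    """Yield consecutive slices of at most `size` elements."""
    for i in range(0, len(xs), size):
        yield xs[i:i + size]

data = [float(i) for i in range(1, 11)]
total = RunningStats()
for chunk in chunks(data, 3):
    total.merge(RunningStats().fit(chunk))  # each chunk could be summarized independently
print(total.n, total.mean, total.variance)
```

The merged result does not depend on how the data were split, so each chunk could in principle be summarized on a different worker.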

john-waczak commented 11 months ago

It will be desirable to specify lists of features and lists of targets on which to perform the exploratory analysis. After reviewing the results, we can then establish heuristics for which targets to train models for and which subset of relevant features to use for each.
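One possible first pass, offered as an assumption rather than a decided design, is to compute a simple screening statistic for every (feature, target) pair and then rank the features for each target. The Python sketch below does this with Pearson correlation on a column-oriented table stored as a dict of lists. The names `correlation_report` and `top_features` are hypothetical. Correlation only captures linear association, so treat the ranking as a heuristic, not a definitive feature selector.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; NaN if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def correlation_report(table, features, targets):
    """Map each target to {feature: correlation} (hypothetical helper)."""
    return {t: {f: pearson(table[f], table[t]) for f in features} for t in targets}

def top_features(report, target, k):
    """Rank features for one target by |r|, skipping undefined (NaN) scores."""
    scores = report[target]
    valid = [f for f in scores if not math.isnan(scores[f])]
    return sorted(valid, key=lambda f: abs(scores[f]), reverse=True)[:k]

table = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 4, 3, 5],
    "x3": [5, 5, 5, 5, 5],  # constant column: correlation is undefined
    "y":  [2, 4, 6, 8, 10],
}
report = correlation_report(table, features=["x1", "x2", "x3"], targets=["y"])
print(report["y"])                    # x1 ≈ 1.0, x2 ≈ 0.8, x3 is nan
print(top_features(report, "y", 2))   # ['x1', 'x2']
```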

john-waczak commented 11 months ago

For the feature importance evaluation, we should try to pick a model that also lets us estimate the model's sensitivity to each feature, e.g., MLPs or regression models.
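One model-agnostic way to estimate sensitivity works for both regression models and MLPs: perturb each input feature slightly and measure how much the prediction changes. The Python sketch below averages the absolute central-difference derivative of a prediction function over a set of samples. The helper `sensitivities`, the step size `eps`, and the toy linear model are all assumptions made for illustration.

```python
def sensitivities(predict, samples, eps=1e-4):
    """Mean |d predict / d x_j| over samples, via central differences (hypothetical helper)."""
    n_features = len(samples[0])
    totals = [0.0] * n_features
    for x in samples:
        for j in range(n_features):
            up = list(x)
            up[j] += eps
            down = list(x)
            down[j] -= eps
            totals[j] += abs(predict(up) - predict(down)) / (2 * eps)
    return [t / len(samples) for t in totals]

def toy_model(x):
    # Stand-in for a trained regression model or MLP.
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]

samples = [[0.0, 1.0, 2.0], [1.5, -0.5, 4.0], [2.0, 2.0, -1.0]]
print(sensitivities(toy_model, samples))  # approximately [3.0, 2.0, 0.0]
```

For a linear model, these values recover the absolute coefficients. For an MLP, they are local, data-dependent estimates. In both cases, features on different scales should be standardized first so the numbers are comparable.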

john-waczak commented 11 months ago

Sensitivity analysis in an ML context