Open a-arad opened 5 months ago
this ticket cannot be assigned until issues #1 and #2 are complete
our preprocessing pipeline currently is:
we cannot be sure this is the best way to preprocess the data.
so set up an experiment using the following guidelines:
e.g.
missing data -> [impute, drop]
transformation -> [none, x -> log(x), x -> box-cox(x), x -> sqrt(x), ...]
scaling -> [none, robust, standard (z-score), minmax]
outliers -> [none, winsorization, drop, something else]
then for each combination
e.g.
[impute, x -> log(x), minmax, drop]
then determine which combination of preprocessing steps results in the lowest inertia
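
A minimal sketch of how the experiment loop could be organised, assuming each step's options are kept as plain strings and the actual preprocessing and clustering live in a separate function. The names `STEPS`, `all_combinations`, `best_combination` and `fit_inertia` are hypothetical and not part of the existing pipeline; the open-ended entries ("....", "something else") are left out on purpose.

```python
from itertools import product

# Options per step, copied from the lists in this issue.
# Open-ended entries ("....", "something else") are intentionally omitted.
STEPS = {
    "missing data": ["impute", "drop"],
    "transformation": ["none", "log", "box-cox", "sqrt"],
    "scaling": ["none", "robust", "standard", "minmax"],
    "outliers": ["none", "winsorization", "drop"],
}


def all_combinations(steps=STEPS):
    """Yield one dict per combination, e.g. {'missing data': 'impute', ...}."""
    names = list(steps)
    for choice in product(*(steps[name] for name in names)):
        yield dict(zip(names, choice))


def best_combination(fit_inertia, steps=STEPS):
    """Return the (combination, inertia) pair with the lowest inertia.

    fit_inertia is a hypothetical callable supplied by the caller: it takes
    one combination dict, preprocesses the data accordingly, fits the
    clustering model, and returns the resulting inertia as a float.
    """
    results = [(combo, fit_inertia(combo)) for combo in all_combinations(steps)]
    return min(results, key=lambda pair: pair[1])
```

With the options listed above this enumerates 2 x 4 x 4 x 3 = 96 combinations, so `fit_inertia` would be called 96 times; adding more options to any step multiplies that count.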
Hi, I think I can do this one after issue #1 is completed
yes - i followed up on issue 1 and will let you know when this issue becomes clear to work on