sagnik1511 / Tabular-AutoML

Python Auto-ML Package for Tabular Datasets
MIT License
23 stars 13 forks

Add a new class "Scaling" under processing. #8

Open sagnik1511 opened 2 years ago

sagnik1511 commented 2 years ago
  1. Prepare a new class under the processing module.
  2. Design the functions carefully and add appropriate comments.
  3. Add a "run" function inside "Scaling" that goes through every feature, e.g. link.
  4. Add the function under the class Preprocessing.

Follow the contributing guidelines in README.md.
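The steps above could be sketched roughly as follows. This is only an illustration, not the package's actual design: the strategy names, the constructor arguments, and the use of a plain dict of column lists in place of the package's real table type are all assumptions made to keep the example self-contained.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values to the [0, 1] range."""
    low, high = min(values), max(values)
    span = high - low
    # A constant column has no spread; map it to zeros instead of dividing by 0.
    return [0.0 if span == 0 else (v - low) / span for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


class Scaling:
    """Hypothetical sketch: applies one scaling strategy per feature."""

    STRATEGIES = {"minmax": min_max, "standard": standardize}

    def __init__(self, strategy_by_column=None, default="standard"):
        # Maps a column name to a strategy name, or to None to leave
        # that column untouched. Columns not listed use the default.
        self.strategy_by_column = strategy_by_column or {}
        self.default = default

    def run(self, table):
        """Go through every feature and return a scaled copy of the table."""
        scaled = {}
        for name, values in table.items():
            strategy = self.strategy_by_column.get(name, self.default)
            if strategy is None:
                scaled[name] = list(values)
            else:
                scaled[name] = self.STRATEGIES[strategy](values)
        return scaled


table = {"age": [20, 30, 40], "income": [1000.0, 2000.0, 4000.0], "id": [1, 2, 3]}
scaler = Scaling({"age": "minmax", "id": None})
result = scaler.run(table)
```

Here "age" is min-max scaled, "id" is passed through unchanged, and "income" falls back to the default strategy, so each column can be handled differently.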

Tihsrah commented 2 years ago

I would like to suggest that we scale the data directly using sklearn.preprocessing:

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)

By adding these lines to the training.py file, we can easily scale the data without parsing each feature in a for loop, which would be more time-consuming and could also introduce bugs. We can also add conditions for when the model is for regression, ask the user which scaling function they want, and apply it to x_train and x_val.
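A minimal sketch of the "ask the user which scaling function they want" part, assuming a hypothetical helper named scale_splits and a hypothetical name-to-scaler mapping; neither exists in the repository as far as this thread shows. The scaler is fitted on the training split only and then reused on the validation split.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical mapping from a user-facing name to a scikit-learn scaler class.
SCALERS = {
    "minmax": MinMaxScaler,
    "standard": StandardScaler,
    "robust": RobustScaler,
}


def scale_splits(x_train, x_val, method="minmax"):
    """Fit the chosen scaler on x_train only, then transform both splits."""
    if method not in SCALERS:
        raise ValueError(f"Unknown scaling method: {method!r}")
    scaler = SCALERS[method]()
    x_train_scaled = scaler.fit_transform(x_train)
    x_val_scaled = scaler.transform(x_val)
    return x_train_scaled, x_val_scaled, scaler


# Toy data standing in for the real training and validation features.
x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_val = np.array([[2.0, 40.0]])
x_train_scaled, x_val_scaled, fitted = scale_splits(x_train, x_val, "minmax")
```

Note that validation values outside the training range can land outside [0, 1] with min-max scaling, as the second validation feature does here.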

If you agree to this idea then please assign this Issue to me.

sagnik1511 commented 2 years ago

@Tihsrah, min-max scaling is not always the better choice for continuous data; in some cases it is better to just scale down or scale up. So basically the function should be flexible for every single data column.

If you have a flexible idea for this, please drop it in the comments. If it is clear, I'll assign the issue to you.

Tihsrah commented 2 years ago

What if we first apply a "Robust Scaler" to the data and then use a "Standard Scaler"?
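For reference, chaining the two could look like the sketch below, using scikit-learn's Pipeline with RobustScaler followed by StandardScaler. The toy data and variable names are assumptions. Since both scalers apply a per-column shift and rescale, the chained result should coincide with StandardScaler applied alone whenever the column's interquartile range is non-zero, which the last line checks.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

# Chain the two scalers: median/IQR scaling first, then mean/std scaling.
chained = Pipeline([
    ("robust", RobustScaler()),
    ("standard", StandardScaler()),
])

# Toy single-feature data with one outlier.
x_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
x_chained = chained.fit_transform(x_train)

# Compare against StandardScaler on its own.
x_standard_only = StandardScaler().fit_transform(x_train)
same_result = np.allclose(x_chained, x_standard_only)
```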

sagnik1511 commented 2 years ago

@Tihsrah, I would suggest you keep the class flexible, as I stated earlier.