pyronear / pyro-risks

Data science for wildfire risk forecasting and monitoring
https://pyronear.github.io/pyro-risks
Apache License 2.0

feat: Add SMOTE preprocessing step to oversample the dataset #60

Closed GHCamille closed 7 months ago

GHCamille commented 3 years ago

✨ This PR introduces resampling of the training dataset and hyperparameter tuning to try to improve model performance.

🎲 Resampling:

The SMOTE upsampling pipeline is defined as an imblearn pipeline.

It has to be integrated into the training workflow as follows:

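from sklearn.model_selection import train_test_split

X, y = load_dataset()
# Mark every loaded row as original data
X["is_original_data"] = 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=cfg.TEST_SIZE, random_state=cfg.RANDOM_STATE
)
# Fit the upsampling pipeline, then rf_pipeline, on the training split only
upsampling_pipeline.fit(X_train, y_train)
rf_pipeline.fit(X_train, y_train)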
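
For reviewers less familiar with SMOTE, here is a minimal standard-library sketch of the idea behind the upsampling step: each synthetic sample is a random point on the segment between a minority-class sample and one of its k nearest minority neighbours. This is not the code in this PR and not imblearn's implementation; the helper name `smote_oversample` and the toy data are hypothetical, chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Hypothetical helper for illustration; not part of pyro-risks or imblearn.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random point on the segment between base and the chosen neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data: 3 minority samples, 8 majority samples
minority = [[1.0, 1.0], [1.5, 1.2], [1.2, 1.8]]
majority = [[float(i), float(i % 3) + 5.0] for i in range(8)]
new_points = smote_oversample(
    minority, n_synthetic=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
```

Because the synthetic points are interpolated from training samples, resampling should only ever see the training split, which is why the workflow above fits the upsampling pipeline after `train_test_split`.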

🏎️ Hyperparameter tuning: