shaivimalik / medicine_preprocessing-on-entire-dataset

Reproducing "Characterization of Term and Preterm Deliveries using Electrohysterograms Signatures"
MIT License

SVC hyperparameters - 03.ipynb / Synthetic dataset toy example notebook #6

Open shaivimalik opened 3 months ago

shaivimalik commented 3 months ago

The SVC hyperparameter values (C and gamma) were obtained by running GridSearchCV with 10-fold cross-validation on the training set.

The code snippets given below can be used to validate the findings:

For Training SVM - with Data Leakage:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define parameters for grid search
gamma_range = np.logspace(start=-4, stop=3, num=8, base=2)
C_range = np.logspace(start=-4, stop=3, num=8, base=10)
parameters = {'C': C_range, 'gamma': gamma_range}

# Initialize SVM model
svc = SVC(kernel='rbf', random_state=15)

# Define GridSearchCV: 10-fold cross-validation, scored by accuracy
clf = GridSearchCV(svc, parameters, cv=10, scoring='accuracy')

# Perform grid search
clf.fit(X_train, y_train)

# Print the best mean cross-validated accuracy and the parameters that produced it
print("Best mean CV accuracy:", clf.best_score_)
print("Best hyperparameters:", clf.best_params_)
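To make the search space concrete, the short sketch below prints the values produced by the two np.logspace calls above. It assumes only NumPy and adds nothing beyond the grid already defined in the snippet.

```python
import numpy as np

# Same grid definition as in the snippet above
gamma_range = np.logspace(start=-4, stop=3, num=8, base=2)
C_range = np.logspace(start=-4, stop=3, num=8, base=10)

# gamma takes the powers of 2 from 2**-4 to 2**3
print("gamma values:", gamma_range)
# C takes the powers of 10 from 10**-4 to 10**3
print("C values:", C_range)

# Every (C, gamma) pair is one candidate for the grid search
n_candidates = len(gamma_range) * len(C_range)
print("Candidates:", n_candidates)
# With cv=10, each candidate is fitted once per fold
print("Cross-validation fits:", n_candidates * 10)
```

With cv=10, GridSearchCV fits each candidate once per fold and, by default, refits the best candidate on the full training set afterwards.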

For Training SVM - without Data Leakage:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define parameters for grid search
gamma_range = np.logspace(start=-4, stop=3, num=8, base=2)
C_range = np.logspace(start=-4, stop=3, num=8, base=10)
parameters = {'C': C_range, 'gamma': gamma_range}

# Initialize SVM model
svc = SVC(kernel='rbf', random_state=15)

# Define GridSearchCV: 10-fold cross-validation, scored by accuracy
clf = GridSearchCV(svc, parameters, cv=10, scoring='accuracy')

# Perform grid search
clf.fit(X_train_oversamp, y_train_oversamp)

# Print the best mean cross-validated accuracy and the parameters that produced it
print("Best mean CV accuracy:", clf.best_score_)
print("Best hyperparameters:", clf.best_params_)

shaivimalik commented 2 months ago

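In the approach presented above for Training SVM - without Data Leakage, there is still data leakage from the training folds to the validation folds within GridSearchCV. X_train_oversamp is oversampled before GridSearchCV splits it into folds, so an oversampled point and the original sample it was derived from can land on opposite sides of a split. The cross-validated accuracy reported for that snippet is therefore likely to be optimistic.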
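One way to avoid this is to oversample inside each cross-validation split, using only the training fold, and to score on the untouched validation fold. The following is a minimal sketch of that idea, not the notebook's code. It assumes random duplication of minority samples as the oversampling method (the notebook may use a different one), a toy dataset from make_classification standing in for the original training set, and a reduced grid to keep the run short. The helpers oversample_minority and leakage_free_cv_accuracy are hypothetical names introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def oversample_minority(X, y, rng):
    """Randomly duplicate rows so that every class reaches the majority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing number of rows (with replacement) from this class only
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]


def leakage_free_cv_accuracy(X, y, C, gamma, n_splits=5, seed=15):
    """Mean validation accuracy when oversampling is applied to training folds only."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in cv.split(X, y):
        # Oversample after the split, so no duplicate reaches the validation fold
        X_tr, y_tr = oversample_minority(X[train_idx], y[train_idx], rng)
        model = SVC(kernel='rbf', C=C, gamma=gamma).fit(X_tr, y_tr)
        # Score on the original, non-oversampled validation fold
        scores.append(model.score(X[val_idx], y[val_idx]))
    return float(np.mean(scores))


# Assumption: toy imbalanced data in place of the original (not oversampled) training set
X_toy, y_toy = make_classification(
    n_samples=200, n_features=5, weights=[0.85, 0.15], random_state=15
)

# Assumption: a reduced grid for illustration, not the 8 x 8 grid used above
results = {
    (C, gamma): leakage_free_cv_accuracy(X_toy, y_toy, C, gamma)
    for C in np.logspace(-1, 2, num=4)
    for gamma in np.logspace(-4, 0, num=3, base=2)
}
best_params = max(results, key=results.get)
print("Best hyperparameters (C, gamma):", best_params)
print("Mean validation accuracy:", results[best_params])
```

If imbalanced-learn is available, wrapping the sampler and the SVC in an imblearn.pipeline.Pipeline and passing that pipeline to GridSearchCV should give the same per-fold behaviour with less code, since the sampler is then applied only when fitting on each training fold.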