hongxuzhou closed this pull request 1 month ago
This pull request implements a Support Vector Machine (SVM) classifier for text classification. The implementation covers data preprocessing, model training with LinearSVC, and model evaluation, with the training steps chained through scikit-learn's Pipeline.
classDiagram
class SVMClassifier {
- dataPreprocessing()
- trainModel()
- evaluateModel()
}
class LinearSVC {
+ fit(X, y)
+ predict(X)
}
class Pipeline {
+ fit(X, y)
+ predict(X)
}
SVMClassifier --> LinearSVC : uses
SVMClassifier --> Pipeline : uses
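The relationship in the diagram can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `TfidfVectorizer` step, the `ngram_range` and `C` values, the step names, and the toy data are all assumptions, since the PR only states that `LinearSVC` and `Pipeline` are used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline() -> Pipeline:
    """Chain a vectorizer and LinearSVC so fit/predict run both steps."""
    return Pipeline([
        # Assumed feature step: the PR does not say which vectorizer is used.
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        # Assumed hyperparameter: C=1.0 is scikit-learn's default.
        ("svm", LinearSVC(C=1.0)),
    ])


# Hypothetical toy data, only to make the sketch runnable.
train_texts = [
    "great movie, loved the acting",
    "wonderful plot and great cast",
    "terrible film, boring and slow",
    "awful script and boring scenes",
]
train_labels = ["pos", "pos", "neg", "neg"]

pipeline = build_pipeline()
pipeline.fit(train_texts, train_labels)

# Evaluate on the same toy data, mirroring the "Results for Train set" output.
train_pred = pipeline.predict(train_texts)
train_accuracy = accuracy_score(train_labels, train_pred)
report = classification_report(train_labels, train_pred)
print(f"Accuracy: {train_accuracy:.4f}")
print(report)
```

Wrapping both steps in one `Pipeline` means the vectorizer is fitted only on the data passed to `fit`, which helps avoid leaking dev-set vocabulary into training.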
Change | Details | Files |
---|---|---|
Implemented text preprocessing pipeline with multiple cleaning steps | | Classic ML/classic_SVM.ipynb |
Built SVM classification pipeline with feature engineering | | Classic ML/classic_SVM.ipynb |
Added model evaluation and results visualization | | Classic ML/classic_SVM.ipynb |
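The PR does not list the individual cleaning steps, so the following is only a sketch of what a multi-step text preprocessing function might look like. The function name `clean_text` and every step in it (HTML unescaping, lowercasing, URL and mention removal, punctuation stripping) are assumptions chosen for illustration, not the notebook's actual steps.

```python
import html
import re

# Hypothetical cleaning patterns; the notebook may use different ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Apply a fixed sequence of cleaning steps to one document."""
    text = html.unescape(text)          # "&amp;" -> "&"
    text = text.lower()                 # normalize case
    text = URL_RE.sub(" ", text)        # drop URLs
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = NON_ALNUM_RE.sub(" ", text)  # drop punctuation and symbols
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse whitespace


print(clean_text("Check THIS out: https://example.com/a?b=1 @user &amp; more!!"))
```

Order matters in a chain like this: URLs and mentions are removed before punctuation is stripped, otherwise their fragments would survive as ordinary tokens.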
Best results
Results for Train set:
Accuracy: 0.8994
Classification Report:

Average | Precision | Recall | F1-score | Support |
---|---|---|---|---|
macro avg | 0.88 | 0.90 | 0.89 | 12239 |
weighted avg | 0.90 | 0.90 | 0.90 | 12239 |
Results for Dev set:
Accuracy: 0.7337
Classification Report:

Average | Precision | Recall | F1-score | Support |
---|---|---|---|---|
macro avg | 0.71 | 0.71 | 0.71 | 999 |
weighted avg | 0.73 | 0.73 | 0.73 | 999 |
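The macro and weighted averages in these reports differ because the macro average treats every class equally, while the weighted average scales each class by its support. The per-class rows are not reproduced above, so the sketch below uses hypothetical per-class F1 scores and supports purely to show the arithmetic.

```python
def macro_average(scores: list[float]) -> float:
    """Unweighted mean over classes: each class counts the same."""
    return sum(scores) / len(scores)


def weighted_average(scores: list[float], supports: list[int]) -> float:
    """Mean over classes weighted by the number of true samples per class."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical values for a two-class problem with imbalanced support.
per_class_f1 = [0.90, 0.50]
per_class_support = [90, 10]

macro_f1 = macro_average(per_class_f1)
weighted_f1 = weighted_average(per_class_f1, per_class_support)
print(f"macro avg F1:    {macro_f1:.2f}")
print(f"weighted avg F1: {weighted_f1:.2f}")
```

A macro average that sits below the weighted average, as in the dev-set report (0.71 versus 0.73), suggests that the less frequent classes score lower than the frequent ones.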
tracker index
015
Summary by Sourcery
Implement a Support Vector Machine (SVM) model for text classification with a multi-step preprocessing pipeline and evaluation functions. The model is trained and evaluated on the provided datasets, reaching an accuracy of 0.8994 on the train set and 0.7337 on the dev set, with classification reports for both.
New Features:
Enhancements:
Tests: