hongxuzhou / LfD_final_Assignment


SVM Implementation. #14

Closed: hongxuzhou closed this 1 month ago

hongxuzhou commented 1 month ago

Best results

Results for Train set:

Accuracy: 0.8994

Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.91      0.92      8192
           1       0.82      0.88      0.85      4047

    accuracy                           0.90     12239
   macro avg       0.88      0.90      0.89     12239
weighted avg       0.90      0.90      0.90     12239

Results for Dev set:

Accuracy: 0.7337

Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.80      0.80       647
           1       0.62      0.62      0.62       352

    accuracy                           0.73       999
   macro avg       0.71      0.71      0.71       999
weighted avg       0.73      0.73      0.73       999

Tracker index: 015

Summary by Sourcery

Implement a Support Vector Machine (SVM) model for text classification, together with a text preprocessing pipeline and evaluation functions. The model is trained on the provided training set and evaluated on both the train and dev sets, with accuracy and per-class classification reports.

New Features:

Enhancements:

Tests:

sourcery-ai[bot] commented 1 month ago

Reviewer's Guide by Sourcery

This pull request implements a Support Vector Machine (SVM) classifier for text classification. The implementation includes data preprocessing, model training with LinearSVC, and evaluation components using scikit-learn's Pipeline functionality.

Class diagram for SVM Implementation

classDiagram
    class SVMClassifier {
        - dataPreprocessing()
        - trainModel()
        - evaluateModel()
    }
    class LinearSVC {
        + fit(X, y)
        + predict(X)
    }
    class Pipeline {
        + fit(X, y)
        + predict(X)
    }
    SVMClassifier --> LinearSVC : uses
    SVMClassifier --> Pipeline : uses

File-Level Changes

Implemented text preprocessing pipeline with multiple cleaning steps
  • Created TextPreprocessor class with methods for handling URLs, user mentions, emojis and text standardization
  • Added preprocessing pipeline configuration with customizable options
  • Implemented text analysis functionality to validate preprocessing results
Files: Classic ML/classic_SVM.ipynb
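The cleaning steps listed for the preprocessing change might look roughly like the following. This is a minimal sketch, not the notebook's code: the class layout, the option names (`replace_urls`, `replace_mentions`, `remove_emojis`, `lowercase`), the placeholder tokens `<url>` and `<user>`, and the emoji code-point ranges are all assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; the notebook's actual regexes are not shown in this PR.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Rough coverage of common emoji blocks (pictographs, misc symbols, dingbats, flags).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+")
WHITESPACE_RE = re.compile(r"\s+")


@dataclass
class TextPreprocessor:
    """Sketch of a configurable cleaning pipeline; option names are hypothetical."""
    lowercase: bool = True
    replace_urls: bool = True
    replace_mentions: bool = True
    remove_emojis: bool = True

    def transform_one(self, text: str) -> str:
        # Standardize case first so the placeholder tokens stay lowercase.
        if self.lowercase:
            text = text.lower()
        # Replace URLs before mentions, so an "@" inside a URL is not treated as a mention.
        if self.replace_urls:
            text = URL_RE.sub(" <url> ", text)
        if self.replace_mentions:
            text = MENTION_RE.sub(" <user> ", text)
        if self.remove_emojis:
            text = EMOJI_RE.sub(" ", text)
        # Collapse the extra whitespace introduced by the substitutions.
        return WHITESPACE_RE.sub(" ", text).strip()

    def transform(self, texts):
        """Apply transform_one to every document in an iterable of strings."""
        return [self.transform_one(t) for t in texts]
```

For example, `TextPreprocessor().transform_one("Check https://example.com NOW @bob 😀!")` gives `"check <url> now <user> !"`. Because every step is a boolean flag, preprocessing variants can be compared by toggling options; to use it as a Pipeline stage, `transform` could be wrapped in `sklearn.preprocessing.FunctionTransformer`. Note that the simple `@\w+` pattern also fires inside e-mail addresses.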
Built SVM classification pipeline with feature engineering
  • Created pipeline combining CountVectorizer, TfidfTransformer and LinearSVC
  • Configured feature extraction with n-grams and max features parameters
  • Added class weight balancing and hyperparameter tuning for the SVM classifier
Files: Classic ML/classic_SVM.ipynb
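A pipeline matching this description could be assembled with scikit-learn as below. The stage names, the default `ngram_range` and `max_features` values, and the search grid are hypothetical, since the PR does not show the tuned values. Scoring the search on macro-F1 is also an assumption, chosen because the minority class (1) is where the dev-set scores drop most.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_svm_pipeline(ngram_range=(1, 2), max_features=50_000, C=1.0):
    """Hypothetical defaults; the notebook's tuned values are not given in the PR."""
    return Pipeline([
        # Token and n-gram counts, capped at the max_features most frequent terms.
        ("vect", CountVectorizer(ngram_range=ngram_range, max_features=max_features)),
        # Re-weight raw counts by inverse document frequency.
        ("tfidf", TfidfTransformer()),
        # class_weight="balanced" scales C per class inversely to class frequency,
        # relevant here because class 0 outnumbers class 1 roughly 2:1 in training.
        ("clf", LinearSVC(C=C, class_weight="balanced")),
    ])


def tune_svm(train_texts, train_labels, cv=5):
    """Grid-search a small, hypothetical grid using macro-F1 as the objective."""
    param_grid = {
        "vect__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(build_svm_pipeline(), param_grid, scoring="f1_macro", cv=cv)
    search.fit(train_texts, train_labels)
    return search.best_estimator_, search.best_params_
```

With the default `refit=True`, `best_estimator_` is retrained on the full training data, so it can be passed straight to evaluation. The `step__param` keys in `param_grid` address parameters of the named pipeline stages.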
Added model evaluation and results visualization
  • Implemented comprehensive evaluation metrics including accuracy, precision, recall and F1-score
  • Added confusion matrix visualization with seaborn heatmap
  • Created separate evaluation functions for train and dev sets
Files: Classic ML/classic_SVM.ipynb
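An evaluation helper producing output in the same shape as the "Results for Train set" and "Results for Dev set" reports could look like this. It is a sketch: the function name `evaluate` and its `plot` flag are hypothetical, and it uses one function parameterized by split name rather than the separate train and dev functions the PR describes. Plotting libraries are imported only when a heatmap is requested.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(model, texts, labels, split_name, plot=False):
    """Print accuracy and a per-class report for one split; optionally plot the confusion matrix."""
    preds = model.predict(texts)
    acc = accuracy_score(labels, preds)
    print(f"Results for {split_name} set:")
    print(f"Accuracy: {acc:.4f}")
    print("Classification Report:")
    print(classification_report(labels, preds))
    # Rows are true labels, columns are predicted labels.
    cm = confusion_matrix(labels, preds)
    if plot:
        # Lazy imports, so the function still works where plotting libraries are absent.
        import matplotlib.pyplot as plt
        import seaborn as sns
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
        plt.xlabel("Predicted label")
        plt.ylabel("True label")
        plt.title(f"Confusion matrix ({split_name} set)")
        plt.show()
    return acc, cm
```

A typical call would be `evaluate(model, dev_texts, dev_labels, "Dev", plot=True)`, where `dev_texts` and `dev_labels` are hypothetical variable names. Returning the accuracy and matrix alongside the printout makes it easy to compare train and dev numbers programmatically, for example to quantify the gap between 0.8994 and 0.7337 reported above.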
