SVM Implementation.

Best results

Results for Train set:

Accuracy: 0.8994

Classification Report:

                 precision    recall  f1-score   support

               0       0.94      0.91      0.92      8192
               1       0.82      0.88      0.85      4047

        accuracy                           0.90     12239
       macro avg       0.88      0.90      0.89     12239
    weighted avg       0.90      0.90      0.90     12239

Results for Dev set:

Accuracy: 0.7337

Classification Report:

                 precision    recall  f1-score   support

               0       0.79      0.80      0.80       647
               1       0.62      0.62      0.62       352

        accuracy                           0.73       999
       macro avg       0.71      0.71      0.71       999
    weighted avg       0.73      0.73      0.73       999
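
For readers comparing the two average rows, here is a minimal sketch of how they relate to the per-class rows, using the dev-set precision and support values from the report. The helper names `macro_average` and `weighted_average` are illustrative and do not come from the notebook. Because the inputs are already rounded to two decimals, the recomputed values can differ from the printed report in the last digit.

```python
# Per-class dev-set values copied from the report (already rounded to 2 d.p.).
precision = {0: 0.79, 1: 0.62}
support = {0: 647, 1: 352}


def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Mean over classes, weighted by the number of true samples per class."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


macro_p = macro_average(precision)
weighted_p = weighted_average(precision, support)
print(f"macro precision    ~ {macro_p:.3f}")
print(f"weighted precision ~ {weighted_p:.3f}")
```

The weighted average sits closer to the class 0 score because class 0 makes up roughly two thirds of the dev set.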

tracker index

015

Summary by Sourcery

Implement a Support Vector Machine (SVM) model for text classification with a comprehensive preprocessing pipeline and evaluation functions. The model is trained and evaluated on provided datasets, demonstrating its performance with accuracy and classification reports.

New Features:

Implement a Support Vector Machine (SVM) model for text classification using a pipeline that includes text preprocessing, feature extraction, and a LinearSVC classifier.

Enhancements:

Enhance text preprocessing by adding a TextPreprocessor class that handles URLs, user mentions, emoji conversion, and text standardization.
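
The notebook's actual `TextPreprocessor` is not shown on this page, so the following is only a rough sketch of what such a class could look like. The constructor options, the `URL`/`USER` placeholder tokens, and the emoji handling (replacing Unicode "Symbol, other" characters with their Unicode names) are all assumptions made for illustration. The real class may, for example, rely on a dedicated emoji library instead.

```python
import re
import unicodedata


class TextPreprocessor:
    """Hypothetical stand-in for the notebook's TextPreprocessor."""

    URL_RE = re.compile(r"https?://\S+|www\.\S+")
    MENTION_RE = re.compile(r"@\w+")

    def __init__(self, replace_urls=True, replace_mentions=True,
                 convert_emojis=True, lowercase=True):
        # Option names are assumptions, not the notebook's configuration keys.
        self.replace_urls = replace_urls
        self.replace_mentions = replace_mentions
        self.convert_emojis = convert_emojis
        self.lowercase = lowercase

    @staticmethod
    def _emoji_to_text(text):
        # Coarse approximation: treat every character in Unicode category
        # "So" (Symbol, other) as an emoji and spell out its Unicode name.
        pieces = []
        for ch in text:
            if unicodedata.category(ch) == "So":
                name = unicodedata.name(ch, "")
                pieces.append(f" {name.replace(' ', '_')} " if name else " ")
            else:
                pieces.append(ch)
        return "".join(pieces)

    def transform_one(self, text):
        if self.replace_urls:
            text = self.URL_RE.sub(" URL ", text)
        if self.replace_mentions:
            text = self.MENTION_RE.sub(" USER ", text)
        if self.convert_emojis:
            text = self._emoji_to_text(text)
        if self.lowercase:
            text = text.lower()
        # Standardize whitespace last so earlier substitutions stay simple.
        return " ".join(text.split())

    def transform(self, texts):
        return [self.transform_one(t) for t in texts]


print(TextPreprocessor().transform(["Check https://example.com @bob 😀 NOW"]))
```

Keeping each cleaning step behind its own flag makes it easy to test which steps actually help the classifier.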

Tests:

Add evaluation functions to assess the SVM model's performance on training and development datasets, including accuracy and classification reports.

Reviewer's Guide by Sourcery

This pull request implements a Support Vector Machine (SVM) classifier for text classification. The implementation includes data preprocessing, model training with LinearSVC, and evaluation components using scikit-learn's Pipeline functionality.

Class diagram for SVM Implementation

classDiagram
    class SVMClassifier {
        - dataPreprocessing()
        - trainModel()
        - evaluateModel()
    }
    class LinearSVC {
        + fit(X, y)
        + predict(X)
    }
    class Pipeline {
        + fit(X, y)
        + predict(X)
    }
    SVMClassifier --> LinearSVC : uses
    SVMClassifier --> Pipeline : uses

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Implemented text preprocessing pipeline with multiple cleaning steps | Created TextPreprocessor class with methods for handling URLs, user mentions, emojis and text standardization. Added preprocessing pipeline configuration with customizable options. Implemented text analysis functionality to validate preprocessing results. | `Classic ML/classic_SVM.ipynb` |
| Built SVM classification pipeline with feature engineering | Created pipeline combining CountVectorizer, TfidfTransformer and LinearSVC. Configured feature extraction with n-grams and max features parameters. Added class weight balancing and hyperparameter tuning for the SVM classifier. | `Classic ML/classic_SVM.ipynb` |
| Added model evaluation and results visualization | Implemented comprehensive evaluation metrics including accuracy, precision, recall and F1-score. Added confusion matrix visualization with seaborn heatmap. Created separate evaluation functions for train and dev sets. | `Classic ML/classic_SVM.ipynb` |
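
The pipeline named in the file-level changes could be assembled roughly as follows. This is a sketch under assumptions: the step names, the `build_svm_pipeline` helper, and the values for `ngram_range`, `max_features` and `C` are placeholders, since the tuned settings from the notebook are not listed on this page. The toy texts and labels are made up purely so the example runs.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_svm_pipeline(ngram_range=(1, 2), max_features=20000, C=1.0):
    """Bag-of-n-grams counts -> TF-IDF weighting -> linear SVM."""
    return Pipeline([
        ("counts", CountVectorizer(ngram_range=ngram_range,
                                   max_features=max_features)),
        ("tfidf", TfidfTransformer()),
        # class_weight="balanced" reweights classes inversely to their frequency.
        ("svm", LinearSVC(C=C, class_weight="balanced")),
    ])


# Made-up toy data, not the assignment's dataset.
train_texts = [
    "you are awful and stupid", "what a lovely morning",
    "awful stupid people", "lovely weather today",
    "stupid awful take", "have a lovely day",
]
train_labels = [1, 0, 1, 0, 1, 0]

model = build_svm_pipeline()
model.fit(train_texts, train_labels)
predictions = model.predict(["awful stupid", "lovely day"])
print(predictions)
```

Wrapping the three steps in one `Pipeline` means the vectorizer vocabulary and IDF weights are learned from the training texts only and then reused unchanged at prediction time.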


hongxuzhou / LfD_final_Assignment

SVM Implementation. #14