recodehive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
85 stars 115 forks

Update clustering movie review.ipynb #141

Closed Chandrikajoshi123 closed 3 months ago

Chandrikajoshi123 commented 3 months ago

Added NLTK stopwords, PCA for dimensionality reduction, and visualization with Matplotlib.

Related Issue

[Cite any related issue(s) this pull request addresses. If none, simply state “None”]

Description

This pull request enhances the existing codebase by incorporating several new libraries and functionalities to improve text processing and visualization. Key changes include:

1. NLTK Stopwords:

Added the stopwords list from 'nltk.corpus' to remove common, non-informative words from the text data, improving the quality of text analysis.

2. Principal Component Analysis (PCA):

Integrated PCA for dimensionality reduction, allowing for better handling of high-dimensional data and facilitating more effective clustering.

3. Matplotlib:

Added 'matplotlib' for visualizing the clustering results. The clusters are projected onto a 2D plane using PCA, providing a clear visual representation of data structure.

These additions will help streamline text preprocessing, reduce computational complexity, and provide insightful visualization of clustering outcomes, aiding in better data interpretation and decision-making.
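The three changes above could fit together roughly as in the sketch below. This is not the notebook's actual code: the function names are hypothetical, and the use of TF-IDF features and k-means clustering is an assumption, since the description does not say which vectorizer or clustering algorithm the notebook uses.

```python
import re

import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def load_stop_words():
    """Return NLTK's English stopwords, downloading the corpus if it is missing."""
    try:
        return set(stopwords.words("english"))
    except LookupError:
        nltk.download("stopwords", quiet=True)
        return set(stopwords.words("english"))


def clean_review(text, stop_words):
    """Lowercase a review, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in stop_words)


def cluster_and_project(reviews, stop_words, n_clusters=2, seed=0):
    """Cluster TF-IDF vectors with k-means (assumed), then project to 2D with PCA."""
    cleaned = [clean_review(r, stop_words) for r in reviews]
    tfidf = TfidfVectorizer().fit_transform(cleaned)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(tfidf)
    # PCA needs a dense array; fine for a small sample, costly for a large corpus.
    points = PCA(n_components=2, random_state=seed).fit_transform(tfidf.toarray())
    return labels, points


def plot_clusters(points, labels):
    """Scatter the 2D PCA projection, coloured by cluster label."""
    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="viridis")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("Review clusters (PCA projection)")
    return fig


# Example usage (needs the NLTK stopword corpus, downloaded on first run):
# labels, points = cluster_and_project(list_of_review_strings, load_stop_words())
# plot_clusters(points, labels)
# plt.show()
```

Clustering here runs on the full TF-IDF vectors and PCA is used only for the 2D plot. If PCA is instead meant to reduce dimensionality before clustering, the two steps would swap order.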

Type of PR

Screenshots / videos (if applicable)

[Attach any relevant screenshots or videos demonstrating the changes]

Checklist:

Chandrikajoshi123 commented 3 months ago

We want you to add this to the same IMDb ipynb.

The problem is that an individual ipynb for every issue leads to scattered data. Sorry about RC.

okay thanks for informing

github-actions[bot] commented 3 months ago

This PR has been automatically closed due to inactivity from the owner for 15 days.