recodehive / Scrape-ML

For new data generation for the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch data 📊 from the IMDb 💻 website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
104 stars, 133 forks

Improving review scraping and movie recommendation system of IMDb #69

Closed: prernasahuu closed this issue 3 months ago

prernasahuu commented 5 months ago

Describe the bug: The current review scraping process in the IMDb ratings system has inaccuracies, particularly in distinguishing between valid and fake user actions. Addressing this is crucial for maintaining the integrity of the review data. Additionally, IMDb's movie recommendation system needs enhancement to provide more precise, personalized recommendations based on user preferences. My goal for this project is to improve the recommendation system using various machine learning techniques and Python, while also implementing a robust classification mechanism to differentiate between genuine and fraudulent reviews. The final results will be exported in CSV format for further analysis.

To Reproduce: Steps to reproduce the behavior:

Scraping Reviews:
  1. Use Python libraries like BeautifulSoup to load IMDb pages.

  2. Find movie links on IMDb.
  3. Extract user reviews from the movie pages.
  4. Store the scraped data in a suitable format (e.g., CSV).

Enhance Movie Recommendation System:
  1. Preprocess datasets using Pandas.
  2. Explore content-based and collaborative filtering techniques.
  3. Calculate similarity metrics (e.g., cosine similarity) to recommend similar movies.
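The scraping steps name BeautifulSoup; the sketch below instead uses Python's standard-library `html.parser` so it runs without extra packages, and it parses an inline HTML string rather than fetching a live page. The class name `review-text`, the movie ID `tt0000001`, and the sample markup are hypothetical placeholders, not IMDb's actual page structure, which would need to be inspected before adapting this. It covers extracting review text from a movie page and storing it as CSV.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical class name: check the real IMDb markup before relying on it.
REVIEW_CLASS = "review-text"


class ReviewParser(HTMLParser):
    """Collects the visible text of every element carrying REVIEW_CLASS."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._tag = None    # tag name of the review element we are inside
        self._depth = 0     # nesting depth of that tag name; 0 means "outside"
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Count nested elements with the same name so we close at the right one.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if REVIEW_CLASS in classes:
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join text chunks and collapse runs of whitespace.
                self.reviews.append(" ".join(" ".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def reviews_to_csv(movie_id, html):
    """Parse reviews from one page and return them as CSV text."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["movie_id", "review"])
    for review in parser.reviews:
        writer.writerow([movie_id, review])
    return out.getvalue()


# Made-up sample markup standing in for a downloaded review page.
SAMPLE = """
<div class="review-text">Great <b>pacing</b> and cast.</div>
<div class="other">not a review</div>
<div class="review-text">Too long.<br>Still fun.</div>
"""

print(reviews_to_csv("tt0000001", SAMPLE))
```

To write to disk instead, open the file with `open(path, "w", newline="")` and pass it to `csv.writer`; `newline=""` is what the `csv` module documentation recommends for files used with the writer.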
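For the similarity step, a minimal content-based sketch: each movie is represented as a bag of descriptive tags, and candidates are ranked by cosine similarity to the chosen movie, cos(a, b) = a·b / (‖a‖ ‖b‖). The movie titles and tags are made up for illustration, and plain Python stands in for Pandas to keep the example self-contained. For collaborative filtering, the same similarity function could be applied to vectors of user ratings instead of tags.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def recommend(title, features, top_n=2):
    """Return the top_n titles most similar to `title`, ties broken by name."""
    vectors = {t: Counter(tags) for t, tags in features.items()}
    target = vectors[title]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Hypothetical catalogue: titles and tags are illustrative only.
MOVIES = {
    "Movie A": ["sci-fi", "space", "drama"],
    "Movie B": ["sci-fi", "space", "action"],
    "Movie C": ["romance", "drama"],
    "Movie D": ["comedy", "romance"],
}

print(recommend("Movie A", MOVIES))  # ['Movie B', 'Movie C']
```

Movie B shares two of three tags with Movie A (similarity 2/3), Movie C shares one (about 0.41), and Movie D shares none, which gives the ranking above.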

Expected behavior: I expect to fix these bugs and raise the precision and accuracy of the review scraping system.
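Raising precision and accuracy is easier to track once the metrics are pinned down. A minimal sketch, assuming each review carries a hypothetical label of `"fake"` or `"genuine"` and that `"fake"` is the positive class: precision is TP / (TP + FP), i.e. how many flagged reviews really are fake, while accuracy is the share of all reviews labelled correctly. Recall is included too, because when fake reviews are rare a classifier can reach high accuracy while missing most of them.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision and recall, treating `positive` as the class of interest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fake, flagged fake
    fp = sum(t != positive and p == positive for t, p in pairs)  # genuine, flagged fake
    fn = sum(t == positive and p != positive for t, p in pairs)  # fake, missed
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical hand-labelled reviews vs. a classifier's predictions.
y_true = ["fake", "genuine", "fake", "genuine", "genuine"]
y_pred = ["fake", "fake", "genuine", "genuine", "genuine"]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
```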

Additional context: We could also add features such as suggesting better movie options to users who had a bad experience, and running interactive sessions to attract more attention and strengthen IMDb promotions. The sessions could include movie quizzes, riddles, fun facts, and myth busters, just to increase interaction with users and raise user participation. PLEASE TAKE UP THIS REQUEST FOR THE PROJECT. OPEN TO ANY SUGGESTIONS OR IDEAS.

I am a contributor in GSSoC'24.

github-actions[bot] commented 5 months ago

Thank you for raising an issue. We hope you are enjoying open source. We will try to reply or assign it as soon as possible. Connect with a mentor.

prernasahuu commented 5 months ago

PLEASE DO ASSIGN ME THE PROJECT AS I AM A CONTRIBUTOR IN GSSoC'24.

prernasahuu commented 5 months ago

Thank you for assigning this project to me.

itssiddhantjain commented 5 months ago

Hey @sanjay-kv, can you please assign me this issue?

github-actions[bot] commented 3 months ago

This issue has been automatically closed because it has been inactive for more than 30 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!