recodehive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
108 stars 136 forks

[BUG] InvalidSchema Error when Fetching URLs in getSoup and NoneType Error When Parsing HTML in getReviewText #312

Closed shristirwt closed 3 weeks ago

shristirwt commented 3 weeks ago

Description

When running the code, an InvalidSchema error occurs in the getSoup function while it fetches URLs for IMDb movie reviews. This error suggests that some of the strings passed as URLs are malformed, or are fragments of HTML content rather than valid URLs.
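One way to narrow down this kind of failure is to check each candidate string before it reaches the HTTP client. The sketch below is not the project's code: `is_fetchable_url` is a hypothetical helper, the sample strings are made up (the `tt0000000` title ID is a placeholder), and it assumes getSoup receives plain strings. It uses only the standard library's `urllib.parse.urlparse` to keep absolute http(s) URLs and reject HTML fragments or relative paths.

```python
from urllib.parse import urlparse


def is_fetchable_url(candidate):
    """Return True only for absolute http(s) URLs that include a host.

    Hypothetical helper: anything else (HTML fragments, relative paths,
    None) is rejected before a request is attempted.
    """
    if not isinstance(candidate, str):
        return False
    parsed = urlparse(candidate.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Made-up inputs for illustration; "tt0000000" is a placeholder title ID.
candidates = [
    "https://www.imdb.com/title/tt0000000/reviews",
    '<a href="https://www.imdb.com/title/tt0000000/">link</a>',
    "/title/tt0000000/reviews",
]

# Keep only the strings that are safe to pass on to the fetch step.
valid_urls = [url for url in candidates if is_fetchable_url(url)]
```

Filtering up front keeps the fetch step simple, and logging the rejected strings would show whether the scraper is collecting anchor tags or relative links instead of full URLs.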

The getReviewText function may raise an AttributeError if the movie_soup object is None or does not contain the expected HTML structure (e.g., no review text div). This can happen if getSoup returns None due to a failed request or an invalid URL.
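A defensive version of the parsing step could guard both failure points described above. This is only a sketch under assumptions: `get_review_text_safe` is a hypothetical name, `movie_soup` is assumed to be either None or a BeautifulSoup-style object exposing `find`, and the default class name `"review-text"` is a placeholder that would need to match the real IMDb markup.

```python
def get_review_text_safe(movie_soup, css_class="review-text"):
    """Return the review text, or None when it cannot be extracted.

    Assumptions: movie_soup is None or a BeautifulSoup-like object, and
    css_class is a placeholder for the real review container class.
    """
    # Guard 1: the fetch step failed and handed back None.
    if movie_soup is None:
        return None

    # Guard 2: the page loaded but has no matching review div.
    review_div = movie_soup.find("div", class_=css_class)
    if review_div is None:
        return None

    return review_div.get_text(strip=True)
```

Returning None in both cases lets the caller skip the movie and continue with the next URL instead of stopping on an AttributeError.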

Screenshots

No response

Any additional information?

No response

What browser are you seeing the problem on?

No response

github-actions[bot] commented 3 weeks ago

Thank you for creating this issue! 🎉 We'll look into it as soon as possible. In the meantime, please make sure to provide all the necessary details and context. If you have any questions or additional information, feel free to add them here. Your contributions are highly appreciated! 😊

You can also check our CONTRIBUTING.md for guidelines on contributing to this project.

github-actions[bot] commented 3 weeks ago

Thank you for raising an issue. We hope you are enjoying open source. We try to reply or assign as soon as possible. Connect with a mentor.

github-actions[bot] commented 3 weeks ago

Hello @shristirwt! Your issue #312 has been closed. Thank you for your contribution!