recodehive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script that fetches data from the IMDb website and converts it into txt files.
https://scrape-ml.streamlit.app/
MIT License
86 stars 116 forks

Web scraping using OCR software. #18

Closed litesh1123 closed 2 months ago

litesh1123 commented 4 months ago

Problem: Data scraped from the website would be insufficient and inaccurate if only Selenium and Beautiful Soup are used for scraping.

Solution: I propose using Tesseract, an OCR engine that extracts text from images. Selenium can take screenshots of a website, and these screenshots can be passed to Tesseract for text extraction.
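A minimal sketch of this flow, under some assumptions not stated in this issue: the `selenium`, `pytesseract`, and `Pillow` packages are installed, the Tesseract binary and a Chrome driver are available on the machine, and the page is captured one viewport at a time. The names `scroll_offsets` and `ocr_page` are hypothetical helpers for illustration, not code from this repository.

```python
import io


def scroll_offsets(page_height, viewport_height):
    """Return the vertical scroll positions needed to cover the whole page."""
    if page_height <= 0 or viewport_height <= 0:
        return []
    return list(range(0, page_height, viewport_height))


def ocr_page(url):
    """Screenshot a page one viewport at a time and OCR each screenshot."""
    # Imported lazily so the helper above works without these packages installed.
    from PIL import Image
    import pytesseract
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        page_height = driver.execute_script("return document.body.scrollHeight")
        viewport_height = driver.execute_script("return window.innerHeight")
        chunks = []
        for y in scroll_offsets(page_height, viewport_height):
            driver.execute_script("window.scrollTo(0, arguments[0]);", y)
            png = driver.get_screenshot_as_png()  # raw PNG bytes of the viewport
            image = Image.open(io.BytesIO(png))
            chunks.append(pytesseract.image_to_string(image))
        return "\n".join(chunks)
    finally:
        driver.quit()
```

Calling `ocr_page("https://www.imdb.com/")` would return the recognized text as one string. Because the browser cannot scroll past the bottom of the page, the last screenshot may overlap the previous one, so some lines can appear twice and would need de-duplication before being written to txt files.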

Please assign me this issue

poojaverma9578 commented 4 months ago

I am eager to make a meaningful contribution to this project. Kindly assign me this task, as it aligns with my skills and expertise.

sanjay-kv commented 4 months ago

Assigned to you, @litesh1123. Each issue is assigned to only one person, on a first-come, first-served (FCFS) basis. Others get a chance if the issue goes stale and the assignee is inactive for 5+ days. Also, if you would like to work on more issues, create a new issue and I will add a label and assign it to you.

litesh1123 commented 4 months ago

@sanjay-kv thank you sir

lassmara commented 3 months ago

Please assign me this issue

github-actions[bot] commented 2 months ago

This issue has been automatically closed because it has been inactive for more than 30 days. If you believe this is still relevant, feel free to reopen it or create a new one. Thank you!