recodehive / Scrape-ML

For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script to fetch 📊 data from the 💻 IMDb website 🌐 and convert it into txt files.
https://scrape-ml.streamlit.app/
MIT License
85 stars, 116 forks

Increasing OCR accuracy through image preprocessing and image segmentation #199

Closed litesh1123 closed 2 months ago

litesh1123 commented 2 months ago

Related Issue

#97

Description

Added image preprocessing: before OCR extraction, users can choose operations such as grayscale, threshold, adaptive threshold, and denoise, and can see the manipulated image and the extracted text in the output section.

Added image segmentation: images are divided into parts to isolate regions of interest (ROI), and text is extracted from each part.
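To make the preprocessing and segmentation steps concrete, here is a minimal pure-Python sketch of grayscale conversion, global thresholding, median denoising, and splitting an image into strips as candidate ROIs. It is not the code from this PR: the thread does not say which imaging or OCR library is used, so every function name below is hypothetical, images are modeled as plain nested lists, and adaptive thresholding and the OCR call itself are left out.

```python
# Illustrative sketch only: pure-Python stand-ins for the preprocessing
# steps named in the PR description. All function names are hypothetical;
# the actual PR presumably relies on an imaging library instead.

def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luma values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def global_threshold(gray, cutoff=128):
    """Binarize: pixels at or above cutoff become 255, the rest become 0."""
    return [[255 if px >= cutoff else 0 for px in row] for row in gray]


def median_denoise(gray):
    """3x3 median filter; border pixels use whichever neighbours exist."""
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            new_row.append(sorted(window)[len(window) // 2])
        result.append(new_row)
    return result


def split_into_strips(image, strips):
    """Segment an image into horizontal strips (candidate ROIs)."""
    height = len(image)
    bounds = [height * k // strips for k in range(strips + 1)]
    return [image[bounds[k]:bounds[k + 1]] for k in range(strips)]
```

In a pipeline like the one described, each strip returned by `split_into_strips` would be passed to the OCR engine separately, so that a noisy region does not degrade recognition of the rest of the image.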

Type of PR

Screenshots / videos (if applicable)

Demo video link: https://drive.google.com/file/d/1phb0gmf1UlSvp7tU80azUM3PQH9Wo3f1/view?usp=sharing

Screenshot 2024-06-21 143429
Screenshot 2024-06-21 143532
Screenshot 2024-06-21 143726

Checklist:

Additional context:

@sanjay-kv Sir, I tried to contact the person who raised the issue but got no reply. I have reviewed it and added the necessary changes for increasing OCR accuracy.

sanjay-kv commented 2 months ago

[image] Looks like you need to pull before pushing.

litesh1123 commented 2 months ago

@sanjay-kv I didn't notice how these things were changed. What shall I do next, sir? I didn't change anything in these files.

sanjay-kv commented 2 months ago

That's your file, right? You need to remove those files from the repo and try pushing again.

github-actions[bot] commented 2 months ago

This PR has been automatically closed due to inactivity from the owner for 15 days.