An interactive web application developed with Streamlit, designed for making predictions using various machine learning models. The app dynamically generates forms and pages from JSON configuration files. ⭐ If you found this helpful, consider starring the repo!
📝 Problem Description:
Text Extractor from Images: An automated text extraction model will convert images containing text into a machine-readable format, making the information easily searchable, editable, and usable in other applications. This is particularly useful in scenarios like document digitization, data extraction for research, and automated data entry.
🧠 Model Description:
The model will use Optical Character Recognition (OCR), leveraging a deep learning-based engine such as Tesseract OCR, which recognizes text across a wide range of fonts, languages, and orientations. To enhance accuracy, pre-processing techniques such as resizing, thresholding, and noise reduction will be applied to prepare the images for optimal text extraction.
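As a rough illustration of this pipeline, the sketch below combines OpenCV pre-processing with pytesseract. The specific steps, parameters, and file names are assumptions made for illustration, not the final implementation:

```python
# extract_text.py -- illustrative sketch only, not the repo's final predict.py
import cv2
import pytesseract


def preprocess(image_path: str):
    """Load an image and apply basic cleanup before OCR."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Upscale small scans so Tesseract has enough pixel detail.
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Reduce speckle noise, then binarize with Otsu thresholding.
    gray = cv2.medianBlur(gray, 3)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on the pre-processed image and return plain text."""
    processed = preprocess(image_path)
    return pytesseract.image_to_string(processed)


if __name__ == "__main__":
    print(extract_text("sample_slide.png"))  # hypothetical sample file
```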
⏲️ Estimated Time for Completion:
1 day, mostly complete by tomorrow morning (i.e., 10th October).
🎯 Expected Outcome:
Converting an image such as the sample slide attached to this issue (image not reproduced here) into plain text like:
Textual Conventions (I)
MediumType, MediumAddress
ethernet(7), tokenring(9), fddi(15)
PeerType, PeerAddress
ipv4(1), ipv6(2), nsap(3), ipx(11), appletalk(12), decnet(13)
AdjacentType, AdjacentAddress
A superset of MediumType and PeerType
RTFM WG 3
The University of Auckland
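Since the host app is built with Streamlit, the extractor could eventually be exposed through a simple upload widget. The snippet below is only a sketch of that wiring (widget labels and layout are illustrative; the app's actual pages are generated from its JSON configuration files):

```python
import cv2
import numpy as np
import pytesseract
import streamlit as st

st.title("Text Extractor from Images")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    # Decode the uploaded bytes into an OpenCV grayscale image.
    data = np.frombuffer(uploaded.read(), dtype=np.uint8)
    img = cv2.imdecode(data, cv2.IMREAD_GRAYSCALE)
    if img is None:
        st.error("Could not read the uploaded file as an image.")
    else:
        text = pytesseract.image_to_string(img)
        st.text_area("Extracted text", text, height=300)
```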
📌 Additional Context:
This model will primarily serve individuals and businesses looking to automate document workflows or data entry processes.
To be Mentioned while taking the issue:
A contributor at GGSOC.
Note:
Please review the project documentation and ensure your code aligns with the project structure.
Please ensure that either the predict.py file includes a properly implemented model_details() function or the notebook contains this function to print a detailed model report. The model will not be accepted without this function in place, as it is essential for generating the necessary model details (a minimal sketch is included after these notes).
Prefer using a new branch to resolve the issue, as it helps keep the main branch stable and makes it easier to manage and review your changes.
Strictly use the pull request template provided in the repository to create a pull request.
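For reference, here is a minimal sketch of what a model_details() function might look like; the exact fields the report must contain are defined by the project documentation, so treat the contents below as placeholders:

```python
# In predict.py -- illustrative only; the required report contents are
# defined by the project documentation, not by this sketch.
def model_details() -> None:
    """Print a short report describing the OCR model."""
    print("Model: Tesseract OCR (via the pytesseract wrapper)")
    print("Task: Text extraction from images (OCR)")
    print("Pre-processing: grayscale, resize, median blur, Otsu thresholding")
    print("Input: image file (PNG/JPG)")
    print("Output: extracted plain text")
```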