Aim: Take a resume as a PDF or DOC file, parse it, and return the result in JSON resume format with at least 70-80% accuracy (higher is better)
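The aim does not say how accuracy is measured. One possible reading, offered here only as an assumption, is field-level accuracy: the share of non-empty fields in a hand-labelled JSON that the parser reproduces exactly (after trimming whitespace and ignoring case). The sketch flattens both objects into dotted paths such as `basics.email` and compares them; `flatten` and `field_accuracy` are hypothetical names.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.0.b": value} pairs."""
    items: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items


def field_accuracy(predicted: dict[str, Any], expected: dict[str, Any]) -> float:
    """Fraction of non-empty expected fields that the prediction matches."""
    truth = {k: v for k, v in flatten(expected).items() if v not in ("", None)}
    if not truth:
        return 1.0
    pred = flatten(predicted)

    def norm(value: Any) -> Any:
        # Compare strings case-insensitively and without surrounding spaces.
        return value.strip().casefold() if isinstance(value, str) else value

    hits = sum(1 for key, value in truth.items() if norm(pred.get(key)) == norm(value))
    return hits / len(truth)
```

Averaging this score over a set of hand-labelled resumes would give one number to hold against the 70-80% target. List entries are compared by position (`skills.0`, `skills.1`), which is a simplification: a parser that finds the right skills in a different order is penalised.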
Steps to be used on the production server
For now, this is expected to be just a function, and the PDF file comes from local storage (for testing, use the files in the output folder)
[ ] Load and read the file
[ ] Extract only the text from the file
[ ] Load a pre-trained model to parse the text and return it in JSON format (all fields are expected to be present; fields that could not be processed are left empty)
[ ] If step 3 does not produce every field, fill the remaining missing fields in the JSON
[ ] Return the data
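The steps above can be sketched as a single function, under several assumptions: PDF text is extracted with the pypdf package and DOCX text with python-docx (both imported lazily, so only the one actually used must be installed; python-docx reads .docx, not legacy binary .doc), the pre-trained model is any callable that maps text to a possibly partial dict, and the output template is a small, hypothetical subset of the JSON Resume schema's sections. `RESUME_TEMPLATE`, `extract_text`, `fill_missing` and `parse_resume` are illustrative names, and the `.txt` branch exists only to make testing easy.

```python
import copy
from pathlib import Path
from typing import Any, Callable, Union

# Hypothetical output template: a subset of the JSON Resume sections.
# Every key is always present; fields the model does not fill stay empty.
RESUME_TEMPLATE: dict[str, Any] = {
    "basics": {"name": "", "email": "", "phone": "", "url": "", "summary": ""},
    "work": [],
    "education": [],
    "skills": [],
    "languages": [],
    "projects": [],
}


def extract_text(path: Path) -> str:
    """Steps 1-2: load the file and return only its text."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        from pypdf import PdfReader  # assumption: pypdf is installed

        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # assumption: python-docx is installed; it cannot read legacy .doc

        return "\n".join(para.text for para in docx.Document(str(path)).paragraphs)
    if suffix == ".txt":  # plain-text fallback, handy for tests
        return path.read_text(encoding="utf-8")
    raise ValueError(f"unsupported file type: {suffix}")


def fill_missing(parsed: dict[str, Any], template: dict[str, Any]) -> dict[str, Any]:
    """Step 4: start from a copy of the template and overlay the model output."""
    result = copy.deepcopy(template)
    for key, value in parsed.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = fill_missing(value, result[key])  # merge nested sections
        else:
            result[key] = value
    return result


def parse_resume(path: Union[str, Path], model: Callable[[str], dict[str, Any]]) -> dict[str, Any]:
    """Steps 1-5: file -> text -> model output -> complete JSON-ready dict."""
    text = extract_text(Path(path))
    parsed = model(text)  # step 3: hypothetical pre-trained model wrapper
    return fill_missing(parsed, RESUME_TEMPLATE)  # steps 4-5
```

Passing the result to `json.dumps` gives the JSON string that the future endpoint would return. Keeping the model as a parameter lets the same function be tested with a stub now and wired to the trained model later.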
In the future, this function will be linked to an API endpoint
the uploaded file will be streamed into the temp folder and read in real time
an API endpoint still needs to be created
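For the planned endpoint, the upload could be copied into the temp folder in fixed-size chunks, so a large file is never held in memory all at once, and the resulting path handed to the parsing function. This is a standard-library sketch only: the function name `save_upload_to_temp`, the 64 KiB chunk size and the use of the system temp directory by default are assumptions, and the web framework that supplies the byte stream is left open.

```python
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Optional

CHUNK_SIZE = 64 * 1024  # assumption: copy in 64 KiB chunks


def save_upload_to_temp(stream: BinaryIO, filename: str, temp_dir: Optional[str] = None) -> Path:
    """Copy an uploaded byte stream into a temp file chunk by chunk; return its path."""
    # Keep only the extension (.pdf, .docx) so the parser can dispatch on it;
    # the client-supplied name itself is never used as a path.
    suffix = Path(filename).suffix.lower()
    with tempfile.NamedTemporaryFile(suffix=suffix, dir=temp_dir, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp, CHUNK_SIZE)
    return Path(tmp.name)
```

Because the file is created with `delete=False`, the caller is responsible for removing it (for example with `path.unlink()` in a `finally` block) once parsing has finished.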
For model development
use a folder named parser in the root of the project
it will store the parser code, the training data, and our model
as well as all the helper functions related to parsing
and the wrapper for the built model, together with a provider that returns a usable instance of it
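The usable instance provider could be a function that builds the trained model once and hands the same instance to every caller, so neither the function nor the later API endpoint reloads it per request. A minimal sketch, assuming a hypothetical `ResumeModel` wrapper and a hypothetical `parser/model` directory; the loading and inference bodies are placeholders for whatever the trained model actually needs.

```python
from functools import lru_cache
from pathlib import Path
from typing import Any

# Hypothetical default location of the trained model inside the parser folder.
DEFAULT_MODEL_DIR = "parser/model"


class ResumeModel:
    """Hypothetical wrapper around the built model."""

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = model_dir
        # Placeholder: load the real weights, tokenizer, etc. from model_dir here.

    def parse(self, text: str) -> dict[str, Any]:
        """Return a (possibly partial) JSON-resume dict for the text."""
        # Placeholder: run the real model; this stub returns an empty section.
        return {"basics": {}}


@lru_cache(maxsize=1)
def get_parser(model_dir: str = DEFAULT_MODEL_DIR) -> ResumeModel:
    """Build the model on the first call and return the cached instance afterwards."""
    return ResumeModel(Path(model_dir))
```

`lru_cache` keys on the arguments as passed, so callers should consistently call `get_parser()` without arguments. It also does not stop two threads from building the model concurrently on the very first call; if loading is expensive, a `threading.Lock` around construction would prevent that.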
Constraints
should be in Python
regex-based parsers are not reliable on their own, but they can be used in combination with an NLP or neural-network (NN) model
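One way to combine the two is to use regular expressions only for fields with a rigid shape (email, phone number, URL) and let the model handle everything else, taking the regex match only where the model left a contact field empty. The patterns below are deliberately simple assumptions, not validated against real resumes; the 9-15 digit filter (15 being the E.164 maximum) is there to reject year ranges such as "2018 - 2020". `extract_contacts` and `merge_basics` are hypothetical names.

```python
import re
from typing import Any

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")  # digits with spaces, brackets, dots, dashes
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def find_phone(text: str) -> str:
    """Return the first phone-like match with 9-15 digits, or ''."""
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if 9 <= len(digits) <= 15:
            return match.group(0).strip()
    return ""


def extract_contacts(text: str) -> dict[str, str]:
    """Regex layer: first email, phone number and URL found in the text ('' if none)."""
    email = EMAIL_RE.search(text)
    url = URL_RE.search(text)
    return {
        "email": email.group(0) if email else "",
        "phone": find_phone(text),
        "url": url.group(0) if url else "",
    }


def merge_basics(model_out: dict[str, Any], text: str) -> dict[str, Any]:
    """Fill empty contact fields in model_out['basics'] with regex matches."""
    merged = dict(model_out)
    basics = dict(merged.get("basics") or {})  # copy so the input is not mutated
    for key, value in extract_contacts(text).items():
        if value and not basics.get(key):  # the model's own value wins when present
            basics[key] = value
    merged["basics"] = basics
    return merged
```

Giving the model's output priority keeps the regex layer as a safety net for the fields it is good at, rather than a competing source that could overwrite better predictions.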
Resources to use (though they do not seem promising)