rishi23root / resume-curator

Empowers users to create custom resume or cover-letter LaTeX templates using Python 💻

GNU Affero General Public License v3.0

resume parser #31

Closed rishi23root closed 12 months ago

rishi23root commented 1 year ago

Aim: Take a resume PDF or DOC file as input, parse it, and return the data in JSON resume format with at least 70-80% accuracy (more is always better).

Steps to be used on the production server

For now, we expect it to be just a function, with the PDF file coming from local storage (for testing, use the files in the output folder).

[ ] load and read the file
[ ] extract only the text from the file
[ ] load a pre-trained model to parse the text and return it in JSON format (all fields are expected to be present; fields that could not be processed are left empty)
[ ] if step 3 does not return every field, fill in the remaining missing fields in the JSON
[ ] return the data
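
The steps above can be sketched as a single function. This is only a minimal sketch under several assumptions: "JSON resume format" is taken to mean the JSON Resume schema, whose top-level sections are listed in `EMPTY_RESUME`; `extract_text` and `model` are hypothetical callables standing in for whichever text extractor and pre-trained model end up being chosen; and accepting `.docx` alongside `.pdf` and `.doc` is also an assumption.

```python
import copy
from pathlib import Path
from typing import Callable

# Top-level sections of the JSON Resume schema (assumed target format).
EMPTY_RESUME: dict = {
    "basics": {},
    "work": [],
    "volunteer": [],
    "education": [],
    "awards": [],
    "certificates": [],
    "publications": [],
    "skills": [],
    "languages": [],
    "interests": [],
    "references": [],
    "projects": [],
}

# Assumption: .docx is accepted in addition to the .pdf/.doc named in the issue.
SUPPORTED_SUFFIXES = {".pdf", ".doc", ".docx"}


def fill_missing(parsed: dict) -> dict:
    """Step 4: return a full resume dict, leaving unprocessed fields empty."""
    result = copy.deepcopy(EMPTY_RESUME)
    for key, value in parsed.items():
        if key in result and value:
            result[key] = value
    return result


def parse_resume(
    path,
    extract_text: Callable[[Path], str],  # hypothetical: reads the file, returns plain text
    model: Callable[[str], dict],         # hypothetical: maps text to a partial resume dict
) -> dict:
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {path.suffix}")
    text = extract_text(path)    # steps 1-2: load the file and extract its text
    parsed = model(text)         # step 3: let the model parse the text
    return fill_missing(parsed)  # steps 4-5: fill the gaps and return
```

Passing the extractor and the model in as arguments keeps the function easy to test with stubs while the real components are still undecided.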

In the future, this function will be linked to an API endpoint

the file will be streamed, stored in the temp folder, and read in real time
an API endpoint needs to be created
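
Storing a streamed upload in the temp folder could look roughly like the sketch below. It uses only the Python standard library and is independent of any web framework; the helper name `save_upload_to_temp` and the chunk size are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


def save_upload_to_temp(
    stream: BinaryIO,
    suffix: str = ".pdf",
    chunk_size: int = 64 * 1024,
) -> Path:
    """Copy an incoming binary stream to a temp file in chunks and return its path."""
    # delete=False keeps the file on disk after the handle closes, so the
    # parser can open it by path; the caller is responsible for removing it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(stream, tmp, chunk_size)
    return Path(tmp.name)
```

Whatever endpoint is eventually built would pass the request body's file object to this helper, hand the returned path to the parsing function, and delete the file afterwards.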

For model development

use a folder named parser in the root of the project
it will be used to store the parser code, the training data, and our model
all the helper functions related to parsing
and also the wrapper for the built model and its usable instance provider
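
One possible shape for the model wrapper and its instance provider is sketched below. Everything here is hypothetical: the class name `ResumeParser`, the `parser/model` location, and the placeholder loader are illustrative only, since the actual model has not been chosen yet.

```python
from functools import lru_cache
from pathlib import Path

# Hypothetical location of the trained model inside the parser folder.
DEFAULT_MODEL_DIR = Path("parser") / "model"


class ResumeParser:
    """Thin wrapper around the built model."""

    def __init__(self, model_dir: Path):
        self.model_dir = model_dir
        self.model = self._load_model(model_dir)

    @staticmethod
    def _load_model(model_dir: Path):
        # Placeholder: a real implementation would load the trained model
        # from model_dir. Here it is a stub that returns an empty result.
        return lambda text: {"basics": {}}

    def parse(self, text: str) -> dict:
        return self.model(text)


@lru_cache(maxsize=1)
def get_parser(model_dir: Path = DEFAULT_MODEL_DIR) -> ResumeParser:
    """Instance provider: load the model once and reuse it on later calls."""
    return ResumeParser(model_dir)
```

Caching the instance means the model is loaded on the first call only, which matters once the function sits behind an API endpoint and is called repeatedly.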

Constraints

should be in Python
regex-based parsers are only somewhat reliable on their own, but can be used in combination with an NLP or neural-network model
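
A rough sketch of such a combination: regular expressions handle the well-structured fields (email, phone), and a model handles everything else. The patterns are deliberately loose and illustrative, not production-grade, and `model` is a hypothetical callable standing in for the NLP or neural-network component.

```python
import re
from typing import Callable

# Loose, illustrative patterns; real resumes will need more robust ones.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text: str) -> dict:
    """Pull the first email and phone number out of the text using regex only."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group() if email else "",
        "phone": phone.group() if phone else "",
    }


def hybrid_parse(text: str, model: Callable[[str], dict]) -> dict:
    """Combine the model output with regex hits; regex wins for contact fields."""
    parsed = model(text)  # hypothetical NLP/NN model returning a partial resume dict
    regex_hits = {k: v for k, v in extract_contacts(text).items() if v}
    basics = {**parsed.get("basics", {}), **regex_hits}
    return {**parsed, "basics": basics}
```

Letting the regex result override the model for contact fields is a design choice made for this sketch; the opposite priority may work better if the model turns out to be more accurate.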

Resources to consider (most do not seem promising)

Google search top results, 2nd link - uses a 3rd-party API that is paid for full access
Hugging Face (filtered) - seems promising
open resume - regex-based and not Python
Kaggle repos - don't seem to meet the requirements; may need more digging
Python lib - doesn't seem to meet the requirements, but still the best one so far
The last option is to custom-train a model, which could use https://spacy.io/
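
If the custom-training route is taken, the training data has to be annotated first. spaCy's named-entity annotations are expressed as character offsets, `(start, end, label)`, so a small helper like the one below could turn hand-labeled snippets into that shape. This is a plain-Python sketch that does not call spaCy itself; the helper name `to_offsets` and the labels `NAME` and `COMPANY` are made up for illustration.

```python
def to_offsets(text: str, labeled: list[tuple[str, str]]) -> tuple[str, dict]:
    """Convert (snippet, label) pairs into character-offset entity annotations.

    Only the first occurrence of each snippet is annotated; snippets that do
    not appear verbatim in the text are skipped.
    """
    entities = []
    for snippet, label in labeled:
        start = text.find(snippet)
        if start == -1:
            continue
        entities.append((start, start + len(snippet), label))
    return text, {"entities": sorted(entities)}


# Hypothetical example sentence and labels.
sample_text, annotations = to_offsets(
    "Jane Doe, Python developer at Acme",
    [("Jane Doe", "NAME"), ("Acme", "COMPANY")],
)
```

The resulting `{"entities": [...]}` dictionary is the kind of annotation that spaCy v3's `Example.from_dict` accepts when building training examples.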