srbhr / Resume-Matcher

Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
https://www.resumematcher.fyi/
Apache License 2.0
4.76k stars 1.93k forks

Lookup error on 'extracted keywords from resume' #90

Closed thomassbooth closed 12 months ago

thomassbooth commented 12 months ago

Issue Title

An error occurs that looks like an import error. All README steps were followed properly.

Type

Description

After starting the Streamlit server locally, the page displays a lookup error where the keywords extracted from the resume should appear.

Expected Behavior

The Streamlit server should display the keywords extracted from the resume.

Current Behavior

A LookupError is displayed instead.

Steps to Reproduce

  1. Clone the repo.
  2. Initialise a venv.
  3. Install the dependencies.
  4. Remove the existing resumes and job descriptions.
  5. Add a new job description .pdf and resume .pdf.
  6. Parse them.
  7. Start the Streamlit server.

Screenshots / Code Snippets (if applicable)

Error: LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/PY3/english.pickle

  Searched in:
    - '/Users/thomasbooth/nltk_data'
    - '/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/nltk_data'
    - '/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/share/nltk_data'
    - '/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Traceback:
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/streamlit_app.py", line 160, in <module>
    annotated_text(create_annotated_text(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/streamlit_app.py", line 86, in create_annotated_text
    tokens = nltk.word_tokenize(input_string)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 129, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thomasbooth/Documents/Personal/Resume-Matcher/env/lib/python3.11/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)

Environment

Possible Solution (if you have any in mind)

Additional Information

srbhr commented 12 months ago

I think you're missing the punkt tokenizer.

Install/Download NLTK Data.

  1. Run Python from the command line.

    ```bash
    python
    ```
  2. After that, run:

    ```python
    import nltk
    nltk.download('punkt')
    ```
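
If you would rather not open an interpreter by hand, the same check can be scripted. This is only a rough sketch, not something that exists in the repo: `ensure_punkt` is a made-up helper name, and its `find` / `download` parameters are there purely so it can be exercised without network access. By default it relies on `nltk.data.find` and `nltk.download`.

```python
from typing import Callable, Optional


def ensure_punkt(
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Make sure the NLTK 'punkt' tokenizer data is available.

    Returns True if the data was already installed, False if a download
    was attempted. `find` and `download` are hypothetical injection points;
    they default to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper loads even without NLTK

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is missing,
        # which is the same error shown in the traceback above.
        find("tokenizers/punkt")
        return True
    except LookupError:
        download("punkt")
        return False
```

Calling `ensure_punkt()` once before the first `nltk.word_tokenize` call should avoid the LookupError on a fresh environment, assuming the machine can reach the NLTK download server.
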
thomassbooth commented 12 months ago

Thanks!

I was getting an invalid SSL certificate error when downloading, so I had to use this (for anyone who gets the same error): https://stackoverflow.com/questions/38916452/nltk-download-ssl-certificate-verify-failed
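
For reference, a widely circulated workaround for this kind of certificate failure looks roughly like the sketch below. It is not quoted from the linked thread, and `allow_unverified_https` is a made-up helper name. It swaps Python's default HTTPS context for an unverified one, which turns off certificate checking for the whole process, so treat it as a one-off for the download rather than something to leave in application code.

```python
import ssl


def allow_unverified_https() -> bool:
    """Replace the default HTTPS context with one that skips certificate checks.

    Returns True if the override was applied, False if this Python build
    does not provide ssl._create_unverified_context.
    """
    try:
        unverified = ssl._create_unverified_context
    except AttributeError:
        # Very old Python builds do not verify certificates by default
        # and lack this attribute, so there is nothing to override.
        return False
    ssl._create_default_https_context = unverified
    return True
```

After calling `allow_unverified_https()`, run `import nltk` and `nltk.download('punkt')` in the same interpreter session.
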

thomassbooth commented 12 months ago

Manual NLTK data install needed.