kbrajwani / resume_parser

MIT License

Can't read config.cfg #1

Open Bhavin1996 opened 3 years ago

Bhavin1996 commented 3 years ago

OSError: [E053] Could not read config.cfg from C:\Users\bhavi\AppData\Local\Programs\Python\Python39\lib\site-packages\resume_parser\degree\model\config.cfg

kbrajwani commented 3 years ago

Hey, make sure you have installed the correct versions: spacy==2.3.5 and en_core_web_sm==2.3.1. See this Colab notebook: https://colab.research.google.com/drive/1p6rhi9g0ughtGBojnCJcPVRRNqziuk3K?usp=sharing

diracsol commented 3 years ago

I have run python -m spacy validate and confirmed that the spaCy version is 2.3.5 and the en_core_web_sm version is 2.3.1. When I run from resume_parser import resumeparse, I get the user warning [W031], which says that Model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with spaCy 2.3.5.

Jeyandranath commented 3 years ago

Replying to @kbrajwani's advice to install spacy==2.3.5 and en_core_web_sm==2.3.1 and to see the Colab notebook:

I encounter this issue too. Yes, it works fine in Colab, apart from some warnings, but when I run it on my Ubuntu server it gets stuck after the warning.

kbrajwani commented 3 years ago

Hey @Jeyandranath, can you please share some logs showing where the process gets stuck? Also, can you share the resume on which it gets stuck?

Jeyandranath commented 3 years ago

Tested in Windows, works fine with the warning below:

UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

data = resumeparse.read_file('hello.pdf')
2021-03-21 00:40:45,448 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to C:\Users\CHARUJ~1\AppData\Local\Temp\tika-server.jar.
2021-03-21 00:41:16,323 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to C:\Users\CHARUJ~1\AppData\Local\Temp\tika-server.jar.md5.
2021-03-21 00:41:19,471 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2021-03-21 00:41:24,493 [MainThread ] [WARNI] Failed to see startup log message; retrying...
print(data)
{'email': 'bshravan85@hotmail.com', 'phone': '+91-98845-92980', 'name': 'SHRAVAN KUMAR', 'total_exp': 4, 'university': [], 'designition': ['finance analyst', 'operations tech', 'deputy manager'], 'degree': ['B.Com Degree'], 'skills': ['Known: Tamil', ' English', ' and Tulu', 'Present Address: 22 Vijayalakshmi Avenue', 'Poonamallee', ' Chennai-56'], 'Companies worked at': ['92980', 'SAP', 'Hyundai Motor India Ltd', 'Hyundai Motor India Ltd.']}

Jeyandranath commented 3 years ago

@kbrajwani On Ubuntu it gets stuck after the warning below. The resume is hello.pdf.

UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

Jeyandranath commented 3 years ago

I think Java is the issue...

RohitJacob commented 3 years ago

There is no file in the path resume_parser\degree\model\ called config.cfg - even on the github repository. What are the contents of the config.cfg?

GuidoBartoli commented 3 years ago

Yep, same problem here within a Python 3.8 virtual environment (I followed the official installation instructions from here):

>>> from resume_parser import resumeparse
/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.2.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.5,<3.1.0
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/__init__.py", line 1, in <module>
    from resume_parser.resumeparse import resumeparse
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/resumeparse.py", line 50, in <module>
    custom_nlp2 = spacy.load(os.path.join(base_path,"degree","model"))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 324, in load_model
    return load_model_from_path(Path(name), **kwargs)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 388, in load_model_from_path
    config = load_config(config_path, overrides=dict_to_dot(config))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 545, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from /home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/degree/model/config.cfg

That config file does not actually exist at that location, but if it is located somewhere else, I can move it there. Where is it, and what should it contain?
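
For anyone else hitting E053: as far as I know, config.cfg is part of the spaCy v3 pipeline format, while v2 model directories only carry a meta.json, and the example range in the W094 warning above (>=3.0.5,<3.1.0) suggests that spaCy 3.x is what got installed here. The sketch below only inspects a model directory and reports which layout it seems to have. The function name describe_model_dir is made up for illustration and is not part of spaCy or resume_parser.

```python
import json
from pathlib import Path


def describe_model_dir(model_dir):
    """Report which spaCy model layout a directory appears to use.

    Hypothetical helper: it only looks at files on disk and never loads the model.
    """
    model_dir = Path(model_dir)
    meta_path = model_dir / "meta.json"
    # meta.json records the model name and the spaCy version range it was built for.
    meta = json.loads(meta_path.read_text(encoding="utf-8")) if meta_path.exists() else {}
    return {
        "has_config_cfg": (model_dir / "config.cfg").exists(),
        "has_meta_json": meta_path.exists(),
        "model_name": meta.get("name"),
        "spacy_version": meta.get("spacy_version"),
    }
```

Pointing it at site-packages/resume_parser/degree/model should show has_config_cfg as False together with the spacy_version the bundled model declares. If that range is a 2.x one, the missing file is probably a symptom of a spaCy version mismatch rather than a file that needs to be copied in.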

GuidoBartoli commented 3 years ago

After some experiments, I managed to find a config.cfg file inside my virtual environment (it was located inside ~/.virtualenvs/rsm/lib/python3.8/site-packages/en_core_web_sm/en_core_web_sm-3.0.0) and copied it to the folder required by resume_parser. That solved the previous error, but another one appears:

>>> from resume_parser import resumeparse
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/__init__.py", line 1, in <module>
    from resume_parser.resumeparse import resumeparse
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/resumeparse.py", line 50, in <module>
    custom_nlp2 = spacy.load(os.path.join(base_path,"degree","model"))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 324, in load_model
    return load_model_from_path(Path(name), **kwargs)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 390, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/language.py", line 1863, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 1174, in from_disk
    reader(path / key)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/language.py", line 1849, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(
  File "spacy/tokenizer.pyx", line 740, in spacy.tokenizer.Tokenizer.from_disk
  File "spacy/tokenizer.pyx", line 803, in spacy.tokenizer.Tokenizer.from_bytes
  File "spacy/tokenizer.pyx", line 570, in spacy.tokenizer.Tokenizer._load_special_cases
  File "spacy/tokenizer.pyx", line 589, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for '  '. Tokenizer exceptions are only allowed to specify ORTH and NORM.

This is harder to understand... do you have any suggestions?

kbrajwani commented 3 years ago

Hey, please make sure your installed versions match the requirements: spacy==2.3.5 and en_core_web_sm==2.3.1. config.cfg is a spaCy configuration file; it is downloaded when the en_core_web_sm package is installed. I will try to update the model when I get some time. Thanks.

ranyaphat29 commented 3 years ago

I have the same problem. I've installed the library following the requirements, but it doesn't work for me.

bharath-ts commented 3 years ago

I have faced the same issue: the runtime gets stuck while importing resume_parser (with spacy 2.3.5 and en_core_web_sm 2.3.1). Even the Colab notebook got stuck at the same point in the code. Could you fix this, or let us know what the reason is?

1zineb commented 3 years ago

I have the same issue as @GuidoBartoli above: after copying config.cfg into the folder required by resume_parser, I get the same ValueError [E1005] ("Unable to set attribute 'POS' in tokenizer exception") from from resume_parser import resumeparse. Do you have any suggestions, please?

kbrajwani commented 3 years ago

@bharath-ts I have also encountered this. Can you please check locally by installing it the same way as in the Colab notebook? I will fix it when I get time.

kbrajwani commented 3 years ago

Hey guys, I have solved it in the Colab notebook. If you want to install it locally, please follow the steps below.

  1. Create a new Python environment: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
  2. Install the library: pip install resume-parser
  3. Install en_core_web_sm: pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
  4. Install importlib-metadata: pip install importlib-metadata==3.2.0

Now you can use the library.
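
To confirm that a fresh environment actually ended up with the pinned versions, something like the sketch below may help. It assumes Python 3.8+ (for importlib.metadata) and uses the pins mentioned in this thread. The names EXPECTED, installed_version and find_mismatches are made up for illustration.

```python
from importlib import metadata

# Pins taken from the comments in this thread; adjust them if the project's requirements change.
EXPECTED = {
    "spacy": "2.3.5",
    "en_core_web_sm": "2.3.1",
    "importlib-metadata": "3.2.0",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from the pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running it inside the new environment prints nothing when all three pins match. The lookup parameter exists only so the comparison can be exercised without touching the real environment.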

sz332 commented 2 years ago

I had some trouble understanding the steps correctly, so here are my additions to @kbrajwani's comments.

  1. Follow his description.
  2. From Python you MUST execute the nltk.download() commands, which download the necessary data. This is something I totally missed.
  3. Install Java on the machine. The library uses Apache Tika, which is written in Java and extracts the content from a PDF file very nicely, so parsing will be more efficient.
  4. Try to use Python 3.8; I had some issues with 3.9 and 3.10.
  5. Try to use Linux. On Windows, I had compilation issues.
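
Points 3 and 4 can be checked up front with a small script along these lines. This is only a sketch: environment_problems is a made-up name, the Python 3.8 check simply mirrors the experience reported above, and the Java check assumes Tika is started through a java executable on PATH. The nltk.download() step is left out because the thread does not say which resources are needed.

```python
import shutil
import sys


def environment_problems(version_info=None, which=shutil.which):
    """Return a list of warnings about the environment; an empty list means nothing obvious."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    major_minor = tuple(version_info[:2])
    # Python 3.8 is the version reported to work in this thread (3.9 and 3.10 gave trouble).
    if major_minor != (3, 8):
        problems.append(
            "Python %d.%d in use; 3.8 is the version reported to work in this thread" % major_minor
        )
    # Assumption: Apache Tika needs a 'java' executable that can be found on PATH.
    if which("java") is None:
        problems.append("no 'java' executable found on PATH (needed by Apache Tika)")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("warning:", problem)
```

An empty result does not guarantee that the import will work; it only rules out the two causes mentioned in the list above.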

kbrajwani commented 2 years ago

Thanks @sz332 for sharing your experience.