ongteckwu / Resume-Rater

Rates the quality of a candidate based on his/her resume using unsupervised approaches

Trying to test run and train #1

Open sourcelead opened 5 years ago

sourcelead commented 5 years ago

I get the following error when testing and training with PDF files:

python3 main.py --type fixed "./src/data/test/Dong Xing_Catherine Zhang_Equity Research Intern.pdf" --model_name model
Loading nlp tools...
Loading pdf parser...
2019-06-13 12:32:38,162 [MainThread ] [WARNI] Tika server returned status: 500
Traceback (most recent call last):
  File "main.py", line 101, in <module>
    r.test(path_to_resume, infoExtractor)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/model.py", line 568, in test
    doc, _ = loadDocumentIntoSpacy(filename, self.parser, self.nlp)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 162, in loadDocumentIntoSpacy
    new_text = getPDFText(f, parser)
  File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 144, in getPDFText
    raw = parser.from_file(filename)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 40, in from_file
    return _parse(jsonOutput)
  File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 77, in _parse
    realJson = json.loads(jsonOutput[1])
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

ongteckwu commented 5 years ago

I think it's a Tika parser problem. I did not want to use Tika because it means interfacing with Java, but sadly the other methods pull in a lot of dependencies. You can try restarting your Tika server, or maybe upgrading your Python to 3.7.

Also, Tika requires an Internet connection (unfortunately), so it is possible that you were not able to connect to Apache Tika.
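If it helps with debugging, here is a rough sketch of a wrapper that turns the raw JSONDecodeError from the traceback into a more readable message. It is not code from this repo: the names get_pdf_text_checked and PDFParseError are made up for illustration, and it assumes only that the parser object has a from_file(filename) method returning a dict with a "content" key, as the tika.parser module does.

```python
import json


class PDFParseError(RuntimeError):
    """Hypothetical error type: the parser gave back nothing usable."""


def get_pdf_text_checked(filename, parser):
    """Return the extracted text or raise PDFParseError.

    `parser` is assumed to expose from_file(filename) -> dict,
    for example the `tika.parser` module.
    """
    try:
        raw = parser.from_file(filename)
    except json.JSONDecodeError as exc:
        # Same failure as in the traceback: the response body
        # could not be decoded as JSON (e.g. after a 500 status).
        raise PDFParseError(
            f"Tika returned a non-JSON response for {filename!r}; "
            "check that the Tika server is running and reachable"
        ) from exc

    # An empty or missing "content" means no text was extracted.
    content = (raw or {}).get("content")
    if not content:
        raise PDFParseError(f"No text extracted from {filename!r}")
    return content
```

With Tika installed, something like get_pdf_text_checked(path_to_resume, parser) after "from tika import parser" should either return the text or fail with a message that points at the server rather than at the JSON decoder.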