sourcelead opened this issue 5 years ago
I think it's a Tika parser problem. I didn't want to use Tika because it has to interface with Java, but sadly the other methods require a lot of dependencies. You can try restarting your Tika server, or perhaps upgrading Python to 3.7.
Also, Tika (unfortunately) requires an Internet connection, so it is possible that you were not able to connect to Apache Tika.
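One way to rule out the connection problem before parsing anything is to query the Tika server directly. This is a minimal sketch, assuming tika-python's default endpoint `http://localhost:9998` and that the server answers `GET /tika` with HTTP 200; `is_tika_reachable` is a hypothetical helper, not part of tika-python or Resume-Rater.

```python
import urllib.request

# Assumption: tika-python's default local server endpoint; change it if your
# Tika server runs on another host or port.
DEFAULT_TIKA_ENDPOINT = "http://localhost:9998"


def is_tika_reachable(endpoint: str = DEFAULT_TIKA_ENDPOINT, timeout: float = 5.0) -> bool:
    """Return True if GET <endpoint>/tika answers with HTTP 200, else False."""
    url = endpoint.rstrip("/") + "/tika"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx replies), refused connections and
        # timeouts are all subclasses of OSError.
        return False


if __name__ == "__main__":
    print("Tika server reachable:", is_tika_reachable())
```

If this prints `False`, restart the Tika server (or check the network) before running `main.py` again.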
I'm getting the following error during testing and training with PDF files:
python3 main.py --type fixed "./src/data/test/Dong Xing_Catherine Zhang_Equity Research Intern.pdf" --model_name model
Loading nlp tools...
Loading pdf parser...
2019-06-13 12:32:38,162 [MainThread ] [WARNI] Tika server returned status: 500
Traceback (most recent call last):
File "main.py", line 101, in
r.test(path_to_resume, infoExtractor)
File "/media/Shared/resumeRat/Resume-Rater-master/src/model.py", line 568, in test
doc, = loadDocumentIntoSpacy(filename, self.parser, self.nlp)
File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 162, in loadDocumentIntoSpacy
new_text = getPDFText(f, parser)
File "/media/Shared/resume_Rat/Resume-Rater-master/src/utils.py", line 144, in getPDFText
raw = parser.from_file(filename)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 40, in from_file
return _parse(jsonOutput)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/tika/parser.py", line 77, in _parse
realJson = json.loads(jsonOutput[1])
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
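The last frames suggest that the server's HTTP 500 reply was passed straight to `json.loads`, which fails on an empty or non-JSON body ("line 1 column 1 (char 0)"). A minimal sketch of a defensive wrapper follows, assuming `parser` is `tika.parser` and that `from_file()` returns a dict with a `content` key and possibly a `status` key; `get_pdf_text_safe` and `TikaParseError` are hypothetical names, not the project's actual `getPDFText`.

```python
import json


class TikaParseError(RuntimeError):
    """Hypothetical error raised when Tika gives no usable result."""


def get_pdf_text_safe(filename, parser):
    """Hypothetical replacement for getPDFText in src/utils.py.

    `parser` is assumed to be tika.parser (any object with from_file()).
    """
    try:
        raw = parser.from_file(filename)
    except json.JSONDecodeError as exc:
        # In the traceback above, tika-python calls json.loads() on the
        # server's reply even after a 500, so the failure surfaces here.
        raise TikaParseError(
            f"Tika server did not return JSON for {filename!r}; "
            "check that the server is running and reachable"
        ) from exc
    # Assumption: newer versions may return a dict carrying the HTTP status.
    status = raw.get("status")
    if status is not None and status != 200:
        raise TikaParseError(f"Tika server returned status {status} for {filename!r}")
    # 'content' may be None when no text could be extracted.
    return raw.get("content") or ""
```

With this in place, a dead or misbehaving Tika server produces a message naming the file and the likely cause instead of a bare `JSONDecodeError`.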