inspirehep / refextract

Extract bibliographic references from (High-Energy Physics) articles.
GNU General Public License v2.0

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType #80

Open tiffsea opened 3 years ago

tiffsea commented 3 years ago

I get the following error when trying out the example code from the refextract docs. I will explain my system below.

Error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Installation Used

I ran pip install refextract in a terminal on macOS 10.11.6 (15G22010). The installation succeeded, although I had to install libmagic manually with brew install libmagic because I was getting an error initially.

Usage Used

I tried first,

from refextract import extract_references_from_file
references = extract_references_from_file('some-local-filename.pdf')
print(references)

and got the following error:

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Then, similar to the example code from the docs, I changed the code to,

from refextract import extract_references_from_file
references = extract_references_from_file('https://arxiv.org/pdf/1503.07589.pdf')
print(references)

which gave the same error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

michamos commented 3 years ago

Hi, it would be useful if you could provide the full traceback, rather than only the error at the end.
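In case it helps, here is a minimal sketch of one way to capture the full traceback as text so it can be pasted into the issue. The helper name run_and_capture and the stand-in failing_example are hypothetical and not part of refextract; only the standard-library traceback and os modules are used.

```python
import os
import traceback


def run_and_capture(func, *args, **kwargs):
    """Call func; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() returns the same multi-line text Python prints for an uncaught error.
        return None, traceback.format_exc()


def failing_example():
    # Hypothetical stand-in for the failing call: os.stat(None) raises a TypeError
    # of the same kind as the one reported in this issue.
    return os.stat(None)


if __name__ == "__main__":
    result, tb_text = run_and_capture(failing_example)
    if tb_text is not None:
        print(tb_text)
```

To use it with the code from the report, pass extract_references_from_file and the file name to run_and_capture in place of failing_example. Running the original script directly in a terminal and copying everything from "Traceback (most recent call last)" down works just as well.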

walter-hernandez commented 3 years ago

@tiffsea refextract uses pdftotext in the background. The error seems to occur because refextract cannot find pdftotext installed on your system. Try installing it by following the instructions for OS dependencies here:

https://pypi.org/project/pdftotext/

and installing pdftotext: pip install pdftotext

as well as: conda install -c conda-forge poppler

The above solved the issue for me
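A quick way to check whether a pdftotext executable is visible from Python is sketched below. It assumes the executable is looked up on the PATH, which this thread does not confirm; the function names find_pdftotext and describe_pdftotext are hypothetical and not part of refextract.

```python
import shutil


def find_pdftotext(name="pdftotext"):
    """Return the full path of the executable on PATH, or None if it is absent."""
    return shutil.which(name)


def describe_pdftotext(name="pdftotext"):
    """Return a one-line, human-readable result of the lookup."""
    path = find_pdftotext(name)
    if path is None:
        return f"{name} not found on PATH"
    return f"{name} found at {path}"


if __name__ == "__main__":
    print(describe_pdftotext())
```

If this reports that pdftotext is not found, a missing path (None) being passed on to os.stat would be consistent with the TypeError above, though that is an inference from the error message rather than something confirmed in the refextract source.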

robinjacobroy commented 3 years ago

@tiffsea To my limited knowledge, pip install pdftotext installs a different package from the one needed here (correct me if I am wrong). refextract needs pdftotext(1) version 3.00. So I installed XpdfReader instead (https://www.xpdfreader.com/pdftotext-man.html) using the commands:

wget http://security.ubuntu.com/ubuntu/pool/main/p/poppler/libpoppler73_0.62.0-2ubuntu2.12_amd64.deb
sudo apt-get install ./libpoppler73_0.62.0-2ubuntu2.12_amd64.deb 
wget http://archive.ubuntu.com/ubuntu/pool/universe/x/xpdf/xpdf_3.04-7_amd64.deb 
sudo apt-get install ./xpdf_3.04-7_amd64.deb 

(ref: https://askubuntu.com/questions/1245518/how-to-install-xpdf-on-ubuntu-20-04)

The above solved the issue for me.