inspirehep / refextract

Extract bibliographic references from (High-Energy Physics) articles.
GNU General Public License v2.0

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType #101

Open JJery-web opened 1 year ago

JJery-web commented 1 year ago

Hello. My code is:

from refextract import extract_references_from_file
import os

# --- main ---

path="E:\finance Python\2022 business\1226 pdf\42_56\"
name="test.pdf"
file=name
print(file)
st = os.stat(file)
print(st)
references = extract_references_from_file(os.path.join(path, name))
print(references[0])

Unfortunately, I don't know why the path causes an error. I also changed the path to "test.pdf", but it still does not work. Please help!

JJery-web commented 1 year ago

I found that this helped me: https://github.com/jalan/pdftotext/issues/16

All hope is not lost on the Windows version. There is a command-line utility with the same name, and you can use the subprocess package to execute pdftotext.
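A minimal sketch of that subprocess approach, for anyone landing here. The helper names build_pdftotext_command and pdf_to_text are hypothetical and not part of refextract or the pdftotext Python package. It assumes a pdftotext executable is reachable on PATH, or that its location is passed in via exe.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, exe="pdftotext"):
    """Build the argument list for the pdftotext command-line tool.

    "-layout" asks the tool to keep the physical page layout, and the
    trailing "-" sends the extracted text to stdout instead of a file.
    """
    return [exe, "-layout", str(pdf_path), "-"]


def pdf_to_text(pdf_path, exe="pdftotext"):
    """Run pdftotext on pdf_path and return the extracted text."""
    # Fail early with a readable message if the executable is missing.
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe!r} was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, exe),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text("test.pdf") should then return the text of the PDF, provided the executable is installed. This replaces only the text-extraction step; it does not extract references by itself.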

pdftotext Windows download instructions, credit @s2t2:

1. Go to https://www.xpdfreader.com/download.html and click "Download the Xpdf tools".
2. Uncompress/extract the zip file, and move the folder to a location like the Desktop or the Programs directory.
3. Inside the unzipped folder, copy the file bin64/pdftotext.exe into your project repository.
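After copying the executable, it may help to check that Python can actually find it. The sketch below is only an illustration: ensure_on_path is a hypothetical helper, and it assumes the TypeError comes from the tool not being discoverable on PATH, which this thread suggests but does not confirm.

```python
import os
import shutil
from pathlib import Path


def ensure_on_path(exe_name, folder):
    """Make exe_name discoverable by prepending folder to PATH if needed.

    Returns the resolved location of the executable, or None if it still
    cannot be found after PATH has been extended.
    """
    found = shutil.which(exe_name)
    if found is None:
        # Prepend the folder for this process only; the system PATH is untouched.
        extra = str(Path(folder).resolve())
        os.environ["PATH"] = extra + os.pathsep + os.environ.get("PATH", "")
        found = shutil.which(exe_name)
    return found


if __name__ == "__main__":
    # Assumption: pdftotext.exe was copied into the current project folder.
    print(ensure_on_path("pdftotext", "."))
```

If this prints None, the executable is still not visible to Python. Otherwise it prints the location that was found. The call would need to run before refextract is imported for the changed PATH to have any chance of being picked up.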