Open andr-agus opened 10 months ago
Hi @andr-agus,
May I ask which versions of papers and pdftotext, and which operating system, you use? The paper you link works fine for me.
> pip install -U papers-cli
...
> papers --version
2.4
> pdftotext -h
pdftotext version 22.02.0
...
> papers extract s41567-020-0990-x.pdf
@article{Bong_2020,
doi = {10.1038/s41567-020-0990-x},
url = {https://doi.org/10.1038%2Fs41567-020-0990-x},
year = 2020,
month = {aug},
publisher = {Springer Science and Business Media {LLC}},
volume = {16},
number = {12},
pages = {1199--1205},
author = {Kok-Wei Bong and An{\'{\i}}bal Utreras-Alarc{\'{o}}n and Farzad Ghafari and Yeong-Cherng Liang and Nora Tischler and Eric G. Cavalcanti and Geoff J. Pryde and Howard M. Wiseman},
title = {A strong no-go theorem on the Wigner's friend paradox},
journal = {Nature Physics}
}
Thanks for the good vibes. Mahé
PS:
> papers extract s41567-020-0990-x.pdf --debug
DEBUG:papers:read pdf page: 1
INFO:papers:pdftotext -f 1 -l 1 s41567-020-0990-x.pdf /tmp/tmp_fgh87__.txt
...
> pdftotext -f 1 -l 1 s41567-020-0990-x.pdf out1.txt
... all fine ...
> pdftotext -f 2 -l 2 s41567-020-0990-x.pdf out2.txt
... all fine ... (this is the command from your log)
So I assume the issue is with your version of pdftotext. Is it perhaps too old or too new?
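To make the version question easier to answer, here is a minimal sketch that gathers the requested details in one go. The helper names `tool_version` and `environment_report` are hypothetical and not part of papers. It assumes `pdftotext -v` and `papers --version` print a version string, and reports `None` for any tool that is not on the PATH.

```python
import platform
import shutil
import subprocess


def tool_version(cmd):
    """Return the first line a tool prints for its version flag.

    Returns None if the executable is not on the PATH.
    Hypothetical helper for illustration, not part of papers.
    """
    if shutil.which(cmd[0]) is None:
        return None
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # pdftotext writes its version banner to stderr, so look at both streams.
    out = (proc.stdout + proc.stderr).strip()
    return out.splitlines()[0] if out else ""


def environment_report():
    """Collect the details asked for above into a dict."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "pdftotext": tool_version(["pdftotext", "-v"]),
        "papers": tool_version(["papers", "--version"]),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

Pasting the printed lines into the issue should be enough to compare setups.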
Hi,
I've been using 'papers' for quite a while now and this is the first time I've seen this issue. I am trying to extract the bibliographic info of this article* from its PDF. The program throws this exception:
Command Line Error: Wrong page range given: the first page (2) can not be after the last page (1).
Traceback (most recent call last):
File "/usr/bin/papers", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/lib/python3.11/site-packages/papers/main.py", line 1091, in main
extractcmd(subp, o)
File "/usr/lib/python3.11/site-packages/papers/main.py", line 546, in extractcmd
print(extract_pdf_metadata(o.pdf, search_doi=not o.fulltext, search_fulltext=True, scholar=o.scholar, minwords=o.word_count, max_query_words=o.word_count, image=o.image))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/papers/extract.py", line 208, in extract_pdf_metadata
txt = pdfhead(pdf, maxpages, minwords, image=image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/papers/extract.py", line 134, in pdfhead
txt += readpdf(pdf, first=i, last=i)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/papers/extract.py", line 41, in readpdf
sp.check_call(cmd)
File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['pdftotext', '-f', '2', '-l', '2', 'paper.pdf', '/tmp/tmpaq14gv5.txt']' returned non-zero exit status 99.
Apparently 'papers' is calling 'pdftotext' with arguments that make no sense. What is making 'papers' get confused about those arguments?
(Have I mentioned how much I like this program? Cheers!)
*https://www.nature.com/articles/s41567-020-0990-x
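One possible reading of the error message is that pdftotext sees only one page in the local file, so the request for page 2 (`-f 2 -l 2`) fails with exit status 99. That is an assumption, not a confirmed diagnosis. Under that assumption, a minimal sketch of a guard looks like the code below: ask `pdfinfo` (shipped alongside pdftotext in poppler-utils) for the page count and never request pages beyond it. The function names are hypothetical and this is not how papers itself is implemented; only `maxpages` is borrowed from the traceback.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output):
    """Extract the page count from the text printed by `pdfinfo`."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def page_count(pdf):
    """Run `pdfinfo` on a file and return its number of pages."""
    result = subprocess.run(
        ["pdfinfo", pdf], check=True, capture_output=True, text=True
    )
    return parse_page_count(result.stdout)


def pages_to_read(maxpages, npages):
    """Page numbers (1-based) that are safe to pass to `pdftotext -f/-l`."""
    return range(1, min(maxpages, npages) + 1)
```

Running `pdfinfo paper.pdf` by hand shows whether the file really has a single page; if it does, `pages_to_read(maxpages, 1)` yields only page 1 and the failing call for page 2 is never made.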