perrette / papers

Command-line tool to manage bibliography (pdfs + bibtex)
MIT License

papers --add fails if in a subdir? #64

Open boyanpenkov opened 1 month ago

boyanpenkov commented 1 month ago

I get the following:

papers add 2013_AdvCIS_Modeling\ and\ simulation\ of\ electrostatically\ gated\ nanochannels.pdf --rename --copy --info         
INFO:papers:bibtex: '/home/boyan/Vazhno/Work/Literature/library.bib'
INFO:papers:filesdir: '/home/boyan/Vazhno/Work/Literature/papers_organized'
INFO:papers:8036 entry files were updated
INFO:papers:pdftotext -f 1 -l 1 2013_AdvCIS_Modeling and simulation of electrostatically gated nanochannels.pdf /tmp/tmppsa0k9ff.txt
INFO:papers:found doi:10.1016/j.cis.2013.06.006
INFO:papers:duplicate :: update key to match existing entry: 2013/2013_pardon_van-der-wijngaart_modeling-and-simulation-of-electrostatically-gated-nanochannels => Pardon2013
Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 1071, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 452, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 432, in add_pdf
    self.insert_entry(entry, update_key=True, **kw)
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 288, in insert_entry
    self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 345, in insert_entry_check
    file = merge_files([candidate, entry], relative_to=self.relative_to)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/duplicate.py", line 290, in merge_files
    check = checksum(f) if os.path.exists(f) else None
            ^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/utils.py", line 81, in checksum
    return hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256())
                                               ^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/home/boyan/Vazhno/Work/Literature'
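For context on the last frame: os.path.exists() returns True for directories as well as for files, so the guard in merge_files lets a directory path through to open(fname, 'rb'). Here is a minimal sketch of a stricter checksum helper, assuming an os.path.isfile() check is acceptable at that point. checksum_or_none is a hypothetical name, not the function in papers/utils.py:

```python
import hashlib
import os
import tempfile


def checksum_or_none(fname, blocksize=65536):
    """SHA-256 hex digest of a regular file, or None for anything else.

    os.path.exists() is also True for directories; os.path.isfile() is not,
    so a directory never reaches open(fname, "rb").
    """
    if not os.path.isfile(fname):
        return None
    digest = hashlib.sha256()
    with open(fname, "rb") as f:
        # read in fixed-size blocks so large PDFs are not loaded at once
        for block in iter(lambda: f.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as d:
    print(os.path.exists(d))    # True: exists() accepts directories
    print(checksum_or_none(d))  # None instead of IsADirectoryError
```

Returning None matches what the existing line already does for paths that do not exist, so a bad path would simply carry no checksum.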

The PDF itself is OK, I think; after the add, papers extract returns the right metadata:

papers extract 2013_AdvCIS_Modeling\ and\ simulation\ of\ electrostatically\ gated\ nanochannels.pdf                    
@article{Pardon_2013, title={Modeling and simulation of electrostatically gated nanochannels}, volume={199–200}, ISSN={0001-8686}, url={http://dx.doi.org/10.1016/j.cis.2013.06.006}, DOI={10.1016/j.cis.2013.06.006}, journal={Advances in Colloid and Interface Science}, publisher={Elsevier BV}, author={Pardon, G. and van der Wijngaart, W.}, year={2013}, month=nov, pages={78–94} }

papers is installed, and my config is:

(python) → working Literature/Stage cat ~/.local/share/config.json                                                                                                   
{
  "absolute_paths": true,
  "backup_files": false,
  "bibtex": "/home/boyan/Vazhno/Work/Literature/library.bib",
  "editor": null,
  "filesdir": "/home/boyan/Vazhno/Work/Literature/papers_organized",
  "git": true,
  "gitdir": "/home/boyan/.local/share",
  "gitlfs": true,
  "keyformat": {
    "author_num": 2,
    "author_sep": "_",
    "template": "{year}/{year}_{author}_{title}",
    "title_length": 100,
    "title_sep": "-",
    "title_word_num": 100,
    "title_word_size": 1
  },
  "local": false,
  "nameformat": {
    "author_num": 2,
    "author_sep": "_",
    "template": "{authorX}_{year}_{title}",
    "title_length": 100,
    "title_sep": "-",
    "title_word_num": 100,
    "title_word_size": 1
  }
}
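As a sanity check on the keyformat above, here is a rough approximation of how its template could produce the key seen in the log (2013/2013_pardon_van-der-wijngaart_...). make_fields and slug are hypothetical helpers, not papers' implementation. They ignore title_length, title_word_num and title_word_size, and they assume multi-word last names are joined with "-", which is what the logged key suggests:

```python
import re


def slug(text, sep):
    # lowercase and keep only alphanumeric runs, joined by sep
    return sep.join(re.findall(r"[a-z0-9]+", text.lower()))


def make_fields(lastnames, year, title, author_num=2, author_sep="_", title_sep="-"):
    # multi-word last names become "van-der-wijngaart"; authors joined by author_sep
    author = author_sep.join(slug(name, "-") for name in lastnames[:author_num])
    return {"author": author, "year": year, "title": slug(title, title_sep)}


fields = make_fields(["Pardon", "van der Wijngaart"], "2013",
                     "Modeling and simulation of electrostatically gated nanochannels")
key = "{year}/{year}_{author}_{title}".format(**fields)
print(key)
# 2013/2013_pardon_van-der-wijngaart_modeling-and-simulation-of-electrostatically-gated-nanochannels
```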

Note that if I add the {journal} field to the template in the config (e.g. "template": "{journal}/{year}{author}{title}"), which I expected to be supported since journal is a valid BibTeX field, I get:

INFO:papers:bibtex: '/home/boyan/Vazhno/Work/Literature/library.bib'
INFO:papers:filesdir: '/home/boyan/Vazhno/Work/Literature/papers_organized'
INFO:papers:8036 entry files were updated
INFO:papers:pdftotext -f 1 -l 1 2013_AdvCIS_Modeling and simulation of electrostatically gated nanochannels.pdf /tmp/tmp1if0wmzn.txt
INFO:papers:found doi:10.1016/j.cis.2013.06.006
Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 1071, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 452, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 427, in add_pdf
    entry['ID'] = self.generate_key(entry)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 367, in generate_key
    key = self.keyformat(entry)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 108, in __call__
    return self.render(**entry)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 105, in render
    return stringify_entry(entry, **vars(self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 68, in stringify_entry
    res = template.format(**fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'journal'
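The KeyError comes from str.format, which raises as soon as the template names a field that is missing from the dict it receives. A sketch of validating the template up front, so that the error lists the supported fields instead. template_fields and check_template are hypothetical names, and AVAILABLE is assumed to be the author, title, year and ID variants that papers currently exposes:

```python
import string

# Fields papers currently exposes (author, title, year and ID variants).
AVAILABLE = {"author", "Author", "AUTHOR", "authorX", "AuthorX",
             "year", "title", "Title", "ID"}


def template_fields(template):
    """Names of the {fields} referenced by a str.format template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def check_template(template, available=AVAILABLE):
    missing = template_fields(template) - set(available)
    if missing:
        raise ValueError(f"unsupported template field(s) {sorted(missing)}; "
                         f"available: {sorted(available)}")


fields = {"author": "pardon_van-der-wijngaart", "year": "2013", "title": "modeling"}
try:
    "{journal}/{year}{author}{title}".format(**fields)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'journal'

try:
    check_template("{journal}/{year}{author}{title}")
except ValueError as err:
    print(err)
```

Running the check when the config is loaded would report the problem before any PDF is touched.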

I can refile this as two issues, but am I calling "add" correctly? The behavior I expect is to have the PDF renamed and moved, and the entry added to the end of library.bib.

perrette commented 1 month ago

Hmm, your command looks correct. Which version of papers are you using? Lately I have been using the pr-perfect-undo branch, which I have wanted to merge into master for a while now, and I have fixed a few bugs there. Would you mind trying it to see whether you still have the issue? Note that there are also some small differences between "local" and "global" installs. Can you try papers status -v?

perrette commented 1 month ago

Regarding the file-name formatting issue, not all fields are available: so far only author, title, year and ID, in various formatting variants. Here is an example:

{'author': 'oelsmann_passaro', 'Author': 'Oelsmann_Passaro', 'AUTHOR': 'OELSMANN_PASSARO', 'authorX': 'oelsmann_et_al', 'AuthorX': 'Oelsmann_et_al', 'year': '2021', 'title': 'the-zone-of-influence-matching-sea-level-variability-from-coastal-altimetry-and-tide-gauges-for', 'Title': 'The-Zone-Of-Influence-Matching-Sea-Level-Variability-From-Coastal-Altimetry-And-Tide-Gauges-For', 'ID': 'oelsmann_passaro2021'}

It would be better to have generic filters of the form { key | filter1 | filter2 ... }, e.g. { author | capitalize | '-'.join }, but that needs a few hours to implement and test. In the meantime, I could probably add the journal field in a simple way, if that's something you'd find useful.
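A minimal sketch of that { key | filter1 | filter2 } idea. It assumes the filters come from a fixed registry rather than from evaluated Python, so '-'.join becomes a named filter. The registry names, the list-valued author field and render itself are all hypothetical:

```python
import re


def _each(fn):
    # apply fn to a string, or element-wise to a list of strings
    return lambda v: [fn(x) for x in v] if isinstance(v, list) else fn(v)


# Hypothetical filter registry; a named "dashjoin" stands in for '-'.join.
FILTERS = {
    "lower": _each(str.lower),
    "upper": _each(str.upper),
    "capitalize": _each(str.capitalize),
    "first2": lambda v: v[:2],
    "dashjoin": "-".join,
    "underjoin": "_".join,
}

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render(template, fields):
    def substitute(match):
        key, *filters = [part.strip() for part in match.group(1).split("|")]
        value = fields[key]
        for name in filters:
            value = FILTERS[name](value)
        if not isinstance(value, str):
            raise TypeError(f"{{{key}}} is not a string; end with a join filter")
        return value
    return PLACEHOLDER.sub(substitute, template)


print(render("{ author | first2 | capitalize | dashjoin }_{year}",
             {"author": ["oelsmann", "passaro"], "year": "2021"}))
# Oelsmann-Passaro_2021
```

Keeping filters in a registry avoids calling eval() on strings from the config file, at the cost of having to name every transformation in advance.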

boyanpenkov commented 1 month ago

papers 2.4 -- I'll try your branch now...

boyanpenkov commented 1 month ago

OK, super -- on your branch, the IsADirectoryError is no longer observed, but the {journal} template still fails:

papers add 2013_AdvCIS_Modeling\ and\ simulation\ of\ electrostatically\ gated\ nanochannels.pdf
Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 1071, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/__main__.py", line 452, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 427, in add_pdf
    entry['ID'] = self.generate_key(entry)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/bib.py", line 367, in generate_key
    key = self.keyformat(entry)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 108, in __call__
    return self.render(**entry)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 105, in render
    return stringify_entry(entry, **vars(self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/miniconda3/envs/python/lib/python3.11/site-packages/papers/filename.py", line 68, in stringify_entry
    res = template.format(**fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'journal'

boyanpenkov commented 1 month ago

I would indeed find "journal" useful, if it's not too much work; please let me know...

boyanpenkov commented 1 month ago

A good way forward might be to merge pr-perfect-undo, if it's not too much of a work in progress; then I can get up to speed on it as a whole.

boyanpenkov commented 1 month ago

@perrette By chance, does the config file change between pr-perfect-undo and 2.4?

I see:

(python) → pr-perfect-undo Repos/papers papers --version
WARNING:papers:Legacy config file found: /home/boyan/.local/share/config.json. Delete to remove this warning:  rm -f '/home/boyan/.local/share/config.json'
2.5.dev25+geeb2892

This version still reproduces the original IsADirectoryError when I remove {journal} from the template and just run the add:

Traceback (most recent call last):
  File "/home/boyan/miniconda3/envs/python/bin/papers", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/__main__.py", line 1195, in main
    check_install(subp, o, config) and addcmd(subp, o, config)
                                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/__main__.py", line 534, in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/bib.py", line 435, in add_pdf
    self.insert_entry(entry, update_key=True, **kw)
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/bib.py", line 288, in insert_entry
    self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/bib.py", line 345, in insert_entry_check
    file = merge_files([candidate, entry], relative_to=self.relative_to)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/duplicate.py", line 290, in merge_files
    check = checksum(f) if os.path.exists(f) else None
            ^^^^^^^^^^^
  File "/home/boyan/Vazhno/Work/Repos/papers/papers/utils.py", line 109, in checksum
    return hash_bytestr_iter(file_as_blockiter(open(fname, 'rb')), hashlib.sha256())
                                               ^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/home/boyan/Vazhno/Work/Literature'

Cheers!

perrette commented 1 month ago

It has been a while now, but the branch may change the default config location with respect to local/global installs (possibly it also defaults to local; I can't remember now). This is a feature branch that grew bigger than it should have. It should be merged and be done with; I guess we'll call it version 3. Anyway, to remove the warning, why not follow the recommendation in the message and delete the config file that triggers it?

boyanpenkov commented 1 month ago

Oh, sorry -- yes, I was not sure of the downstream consequences for the install state, since I'm still new to that part...

boyanpenkov commented 1 month ago

Hey @perrette -- I'm reading through https://github.com/perrette/papers/tree/pr-perfect-undo (specifically papers/filename.py). Do I understand correctly that the available fields are defined solely there? If so, I can try to come back with a PR that adds the journal field; do let me know...

Cheers!

perrette commented 1 month ago

Yes, exactly. The fields are defined in make_template_fields. A PR should include the new user-defined arguments and, ideally, a few tests. Please go ahead; this might actually be a good one, as I think the underlying code is relatively clear as it is now (despite the caveats discussed earlier in this issue).
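Following that pattern, here is a sketch of what a journal field could look like, with the same lower/Capitalized/UPPER variants that author and title have. add_journal_fields and the "unknown-journal" fallback are assumptions for illustration; the real change would go into make_template_fields, whose signature is not shown in this thread:

```python
import re


def slugify(text, sep="-"):
    # lowercase and keep only alphanumeric runs, joined by sep
    return sep.join(re.findall(r"[a-z0-9]+", text.lower()))


def add_journal_fields(fields, entry, sep="-", default="unknown-journal"):
    """Hypothetical: add journal/Journal/JOURNAL variants to a fields dict.

    Entries without a journal (books, theses, ...) get `default`, so a
    {journal} template never raises KeyError.
    """
    slug = slugify(entry.get("journal", ""), sep) or default
    fields["journal"] = slug
    fields["Journal"] = sep.join(word.capitalize() for word in slug.split(sep))
    fields["JOURNAL"] = slug.upper()
    return fields


# a couple of pytest-style tests, as suggested for the PR
def test_journal_from_entry():
    f = add_journal_fields({}, {"journal": "Advances in Colloid and Interface Science"})
    assert f["journal"] == "advances-in-colloid-and-interface-science"
    assert f["Journal"] == "Advances-In-Colloid-And-Interface-Science"


def test_missing_journal_uses_default():
    assert add_journal_fields({}, {})["journal"] == "unknown-journal"


test_journal_from_entry()
test_missing_journal_uses_default()
```

Whether a missing journal should fall back to a placeholder or raise a clear error is a design choice for the PR.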

boyanpenkov commented 5 days ago

@perrette -- this is extending the conversation in https://github.com/perrette/papers/pull/65

I looked into this, and can confirm the following behavior on https://github.com/perrette/papers/tree/pr-perfect-undo:

-- when running papers add thing.pdf (or variants like papers add ../../thing.pdf),

-- if checking the bibtex yields a duplicate AND that duplicate has a file attribute,

-- then the duplicate's file attribute (even if malformed, wrong or moved) is used in the merge, which triggers the bug above.

In my case, I had the Pardon2013 paper in my database, with file = {::}, which then pointed papers to the /home/boyan/Vazhno/Work/Literature directory which contains all my PDFs.
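To illustrate how file = {::} can end up pointing at the library directory: if the empty path is joined to the bibtex directory and normalized, the result is the directory itself, which then passes os.path.exists() in merge_files. This sketch assumes a JabRef-style "description:path:type" file field resolved relative to the bibtex directory. parse_file_field is a hypothetical helper, not papers' parser:

```python
import posixpath


def parse_file_field(value, relative_to):
    """Hypothetical parser for a ';'-separated, JabRef-style
    'description:path:type' file field. Empty paths are skipped rather
    than resolved against relative_to."""
    paths = []
    for item in value.split(";"):
        parts = item.split(":")
        path = parts[1] if len(parts) == 3 else item
        if not path.strip():
            continue  # '{::}' would otherwise resolve to relative_to itself
        paths.append(posixpath.normpath(posixpath.join(relative_to, path)))
    return paths


root = "/home/boyan/Vazhno/Work/Literature"
print(posixpath.normpath(posixpath.join(root, "")))  # /home/boyan/Vazhno/Work/Literature
print(parse_file_field("::", root))                  # []
print(parse_file_field(":papers_organized/2013/paper.pdf:pdf", root))
```

The sketch uses posixpath and a plain split on ":", so it would not handle Windows drive letters. It only shows that dropping empty paths at parse time is one place to stop the directory from reaching the checksum.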

So, question -- how corner-case'y is this corner-case?