jkitchin / org-ref

org-mode modules for citations, cross-references, bibliographies in org-mode and useful bibtex tools to go with it.
GNU General Public License v3.0

Associate existing PDF files with existing BibTeX entries #692

Closed uliw closed 3 years ago

uliw commented 4 years ago

Hi John,

Is there a semi-automatic way to associate existing PDF files with existing BibTeX entries? I have a large collection of PDFs and decades' worth of citations.

I am thinking of writing a Python script, but I don't want to duplicate a solution that may already exist. The general idea is to extract the DOI from each existing PDF via pdfx and then parse the BibTeX file for entries with a DOI. If both match, rename the PDF according to the bibkey. Or is there an additional step on org-ref's side to create the association?

Let me know what you think

Uli
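The matching step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the DOIs for each PDF are assumed to have been extracted already (for example with pdfx, as mentioned in the post), the BibTeX parsing is a deliberately simple regex that expects one `doi = {...}` or `doi = "..."` field per entry, and the names `normalize_doi`, `doi_to_key`, and `plan_renames` are hypothetical helpers, not part of org-ref or pdfx.

```python
import re
from pathlib import Path

# Matches the start of an entry, e.g. "@article{smith2001," and captures the key.
ENTRY_START = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,')
# Matches a doi field written as doi = {...} or doi = "...".
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r'^(https?://(dx\.)?doi\.org/|doi:)', '', doi)


def doi_to_key(bibtex_text: str) -> dict[str, str]:
    """Map each normalized DOI in the BibTeX text to its entry key."""
    index = {}
    starts = list(ENTRY_START.finditer(bibtex_text))
    for i, start in enumerate(starts):
        # An entry body runs from the end of its header to the next header.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_text)
        field = DOI_FIELD.search(bibtex_text[start.end():end])
        if field:
            index[normalize_doi(field.group(1))] = start.group(2)
    return index


def plan_renames(pdf_dois: dict[str, str], index: dict[str, str]) -> dict[str, str]:
    """Return {old_pdf_path: new_pdf_path} for PDFs whose DOI matches an entry.

    Nothing is renamed here; PDFs without a matching entry are simply omitted.
    """
    plan = {}
    for pdf_path, doi in pdf_dois.items():
        key = index.get(normalize_doi(doi))
        if key is not None:
            old = Path(pdf_path)
            plan[pdf_path] = str(old.with_name(key + old.suffix))
    return plan
```

The sketch only builds a rename plan, so the result can be reviewed before any file is touched; the existing bibkeys are never modified. Applying the plan would then be a loop calling `Path(old).rename(new)` on each pair.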

jkitchin commented 4 years ago

Check out org-ref-pdf-to-bibtex in org-ref-pdf and org-ref-pdf-dir-to-bibtex. They will convert a PDF to a BibTeX entry. That should be one or two steps away from what you want.

I don't use them a lot, and what you want to do is pretty challenging. There are always many corner cases: multiple DOIs, no DOI, etc.

John
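To make the corner cases concrete, here is a minimal sketch that sorts already-extracted PDF text into "no DOI", "exactly one DOI", and "several DOIs". It assumes the text has been pulled out of the PDF by some other tool, and the regular expression is a common permissive approximation of DOI syntax rather than a full grammar. The names `find_dois` and `classify` are hypothetical.

```python
import re

# Permissive approximation of a DOI: "10.", a 4-9 digit registrant code,
# a slash, then any run of characters up to whitespace, quote, or angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Return the distinct DOIs in text, lower-cased, in order of first appearance."""
    seen = []
    for match in DOI_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)]").lower()
        if doi not in seen:
            seen.append(doi)
    return seen


def classify(text: str) -> tuple[str, list[str]]:
    """Label text as 'none', 'unique', or 'ambiguous' by its number of distinct DOIs."""
    dois = find_dois(text)
    if not dois:
        return "none", dois
    if len(dois) == 1:
        return "unique", dois
    return "ambiguous", dois
```

Only the "unique" case is safe to match automatically; the other two are better written to a list for manual review, which fits the goal of covering most cases rather than all of them.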


Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu

uliw commented 4 years ago

hmmh, for me, org-ref-pdf-to-bibtex fails most of the time, typically with a sequence like this:

Contacting host: dx.doi.org:80
uncompressing publicsuffix.txt.gz...done
bibtex-print-help-message: Wrong type argument: stringp, nil
Contacting host: dx.doi.org:80
Title of the article (BibTeX converts it to lowercase)
Journal "Computers \& Geosciences" not found in org-ref-bibtex-journal-abbreviations.
Saving file /home/uliw/user/literatur/new.bib...
Wrote /home/uliw/user/literatur/new.bib
f-exists?: Wrong type argument: stringp, nil

But more importantly, I cannot change my existing keys without breaking 20 years of TeX documents (and the key scheme has changed over time).

The idea was not to provide a 100% solution, but something that would map keys to PDFs in most cases, and that could help when doi-utils-add-bibtex-entry-from-doi creates a BibTeX entry but is unable to download a PDF.

Cheers

Uli

-- Ulrich G. Wortmann http://www.es.utoronto.ca/people/faculty/wortmann-ulrich/ http://webcan.es.utoronto.ca/people/faculty/wortmann-ulrich/ Dept. of Earth Sciences Fax : 416 978 3938 University of Toronto Phone: 416 978 7084 22 Russell Street, Toronto, ON, Canada M5S 3B1

jkitchin commented 4 years ago

Something is weird with those errors. bibtex-print-help-message: Wrong type argument: stringp, nil and f-exists?: Wrong type argument: stringp, nil are probably not coming from org-ref (I think).

I sort of meant for these to be seeds for how to achieve what you want, not that these would do exactly what you need. It is almost impossible to do that for everyone! I have found it pretty difficult to automate this reliably in the past, so these functions are really just reminders for me of how to do some things.

uliw commented 4 years ago

Hi John,

I've played around with this a bit, with pretty disappointing results. You are right, this is a lot more complex than I thought! I will see whether I can adapt my workflow to the existing tools. As for those error messages, is there a good way to debug them? I can look into this once term is over.

Cheers

Uli

jkitchin commented 4 years ago

I would start by using edebug on org-ref-pdf-to-bibtex and step through to see which functions are giving those messages. Maybe they are in a hook, or getting called outside of org-ref.

quickfold commented 4 years ago

@uliw This isn't the solution you are pursuing, but it would probably work faster: use Zotero plus the Better BibTeX for Zotero add-on. Put all PDFs into Zotero, have it automatically find and download reference information, have Better BibTeX assign citekeys, and export the whole library in BibTeX format.

uliw commented 4 years ago

Thanks! I did not know about the Better BibTeX add-on. This is helpful.

Uli
