jkitchin / org-ref

org-mode modules for citations, cross-references, bibliographies in org-mode and useful bibtex tools to go with it.
GNU General Public License v3.0

Unable to get pdf from Nature #704

Closed uliw closed 4 years ago

uliw commented 4 years ago

Hi John,

I have the following doi: "10.1038/nature06588". Running (doi-utils-get-pdf-url "10.1038/nature06588") returns nil. The html version exists at https://www.nature.com/articles/nature06588 and the pdf at https://www.nature.com/articles/nature06588.pdf, so this seems straightforward. Is there anything I can do to debug this?

Thanks

Uli

jkitchin commented 4 years ago

I think the issue is that the url-matching rules were outdated and only matched http. I pushed a fix to also match https, which will probably fix this for you.

uliw commented 4 years ago

Thanks John!

However, the pdf now opens in the browser rather than being downloaded. I have noticed this before with some journals. Does that have something to do with my local settings?

Uli

-- Ulrich G. Wortmann http://www.es.utoronto.ca/people/faculty/wortmann-ulrich/ http://webcan.es.utoronto.ca/people/faculty/wortmann-ulrich/ Dept. of Earth Sciences Fax : 416 978 3938 University of Toronto Phone: 416 978 7084 22 Russell Street, Toronto, ON, Canada M5S 3B1

jkitchin commented 4 years ago

That probably means that the url returns something that is not obviously a pdf. A lot of journals now only show a pdf inside a frame, and the url returns html. That seems to be the issue here. I am not sure why though.

John


Professor John Kitchin Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu
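
One way to make "not obviously a pdf" concrete: a PDF file normally begins with the header %PDF-, whereas a viewer page begins with html markup. The following is a minimal sketch of such a check on already-downloaded content. The function name my/looks-like-pdf-p is hypothetical and is not part of org-ref.

```emacs-lisp
;; Hypothetical helper, not part of org-ref: checks for the PDF header.
(defun my/looks-like-pdf-p (content)
  "Return non-nil if the string CONTENT starts with the PDF header \"%PDF-\"."
  (string-prefix-p "%PDF-" content))

(my/looks-like-pdf-p "%PDF-1.4\n...")          ; non-nil: looks like a pdf
(my/looks-like-pdf-p "<!DOCTYPE html><html>")  ; nil: an html page
```

A check like this only says what the server actually returned. It does not explain why a given url returns html instead of a pdf.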

uliw commented 4 years ago

OK, I've instrumented nature-pdf-url. The redirect step returns https://doi.org/10.1038/nature06588. The subsequent regexp replacement yields https://www.nature.com/articles/nature06588 rather than https://www.nature.com/articles/nature06588.pdf, hence no download. I also looked at the website: the pdf is linked outside of any iframes.

jkitchin commented 4 years ago

That means the rule is no longer working correctly. I have updated it to append .pdf, which I believe is correct now. This is the main problem with these rules: publishers regularly change their tactics to prevent things like this from working.
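
For illustration, a rule of this kind might look like the sketch below. It is not the code that was pushed to org-ref: the function name my/nature-pdf-url-sketch is hypothetical, and the only behaviour it assumes is the one reported in this thread, namely that the pdf lives at the article url plus .pdf.

```emacs-lisp
;; Hypothetical sketch, not org-ref's actual rule.
(defun my/nature-pdf-url-sketch (redirect)
  "Return a guessed pdf url for the Nature article url REDIRECT, or nil."
  ;; Accept both http and https, since matching only http was the first bug.
  (when (string-match "\\`https?://www\\.nature\\.com/" redirect)
    (if (string-suffix-p ".pdf" redirect)
        redirect
      (concat redirect ".pdf"))))

(my/nature-pdf-url-sketch "https://www.nature.com/articles/nature06588")
;; => "https://www.nature.com/articles/nature06588.pdf"
```

Returning nil for urls from other hosts lets a caller try the next publisher rule.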

uliw commented 4 years ago

Thanks John!

I think I'm getting the hang of it. The following recipe will work for all the journals of the American Geophysical Union:

(defun agu-pdf-url (*doi-utils-redirect*)
  "Get url to the pdf from *DOI-UTILS-REDIRECT*."
  (when (string-match "https://agupubs.onlinelibrary.wiley.com"
                      *doi-utils-redirect*)
    (replace-regexp-in-string "/full/" "/pdfdirect/" *doi-utils-redirect*)))

This rule also seems to work for other Wiley publications and is much simpler than your parsing approach. YMMV though.

Cheers

Uli
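
As a rough illustration of the remark about other Wiley publications, the same substitution could be applied to any onlinelibrary.wiley.com host. This is a sketch under assumptions: the function name my/wiley-pdfdirect-url-sketch and the host pattern are hypothetical, the DOI in the example is a placeholder, and whether /pdfdirect/ resolves for a given journal is untested here.

```emacs-lisp
;; Hypothetical generalization of the AGU rule; not code from org-ref.
(defun my/wiley-pdfdirect-url-sketch (redirect)
  "Return a guessed direct-pdf url for the Wiley url REDIRECT, or nil."
  (when (string-match "\\`https?://[a-z0-9.-]*onlinelibrary\\.wiley\\.com/"
                      redirect)
    ;; FIXEDCASE and LITERAL are t so the replacement is inserted verbatim.
    (replace-regexp-in-string "/full/" "/pdfdirect/" redirect t t)))

;; Placeholder DOI, for illustration only.
(my/wiley-pdfdirect-url-sketch
 "https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/EXAMPLE")
;; => "https://agupubs.onlinelibrary.wiley.com/doi/pdfdirect/10.1029/EXAMPLE"

;; Assumption: doi-utils keeps its publisher rules in the list variable
;; `doi-utils-pdf-url-functions'. This form does nothing until doi-utils loads.
(with-eval-after-load 'doi-utils
  (add-to-list 'doi-utils-pdf-url-functions #'my/wiley-pdfdirect-url-sketch))
```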

jkitchin commented 4 years ago

Thanks for the contribution, reporting and debugging help. I added your rule.

uliw commented 4 years ago

You're welcome! Thanks for keeping this alive!
