metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0

DOI traversal / CrossRef API #2

Open metachris opened 8 years ago

metachris commented 8 years ago

"The standard way for getting the actual PDF from a DOI, when it's a Crossref DOI (which it probably is) is to use the full-text link, available in the CrossRef API. For DOI 10.1155/2010/963926 http://api.crossref.org/works/10.1155/2010/963926 From the returned JSON message -> link -> there's the PDF!"

[
  {
    "intended-application": "text-mining",
    "content-version": "vor",
    "content-type": "application/pdf",
    "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.pdf"
  },
  {
    "intended-application": "text-mining",
    "content-version": "vor",
    "content-type": "application/xml",
    "URL": "http://downloads.hindawi.com/journals/jo/2010/963926.xml"
  }
]

via HN: https://news.ycombinator.com/item?id=10452048
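
A rough sketch of how the lookup described above might work in Python, using only the standard library. The names `pdf_links` and `fetch_work` are hypothetical and not part of pdfx. It assumes the CrossRef response wraps the work record in a top-level `message` object whose `link` array looks like the one quoted above. Not every DOI will have a `link` entry, so the result may be empty.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the CrossRef works endpoint mentioned in the quote above.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def pdf_links(message):
    """Return the URLs of all full-text links marked as application/pdf.

    `message` is the "message" object of a CrossRef works response.
    Returns an empty list when the record has no "link" array.
    """
    return [
        link["URL"]
        for link in message.get("link", [])
        if link.get("content-type") == "application/pdf" and "URL" in link
    ]


def fetch_work(doi, timeout=10):
    """Fetch the CrossRef record for a DOI and return its "message" object.

    Hypothetical helper: performs a network request and raises
    urllib.error.HTTPError if CrossRef does not know the DOI.
    """
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["message"]


if __name__ == "__main__":
    # DOI taken from the example in this issue.
    for url in pdf_links(fetch_work("10.1155/2010/963926")):
        print(url)
```

Keeping the filtering in `pdf_links` separate from the network call means the link selection can be checked against a saved response without contacting CrossRef.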

metachris commented 8 years ago

Related: http://www.doi.org/