zotero / translation-server

A Node.js-based server to run Zotero translators

Setting up a translation-server for CSL JSON metadata generation? #51

Closed dhimmel closed 5 years ago

dhimmel commented 5 years ago

Greetings, I'm a developer of the Manubot project for writing scholarly papers on GitHub. Our tool supports citation by persistent identifier, where users write citations directly into their manuscript source like [@doi:10.1098/rsif.2017.0387; @pmid:29424689; @pmcid:PMC5640425; @arxiv:1806.05726]. As such, we're always looking for the most reliable ways to retrieve metadata for various sources of persistent identifiers and convert it into CSL JSON format.

@adam3smith suggested zotero/translation-server to us as per https://github.com/greenelab/manubot/issues/70 and recently @zuphilip mentioned it again in https://github.com/aurimasv/z2csl/issues/19.

Manubot is a Python package combined with a continuous integration workflow for building and deploying manuscripts. So we're looking for a way to use translation-server from within Python. With that in mind, I've got the following questions:

  1. Can translation-server produce CSL JSON for a wide variety of citation sources?
  2. Is there a public API endpoint that provides access to translation-server?
  3. If no to 2, is it easy to set up translation-server locally? Is there a Docker image? Does it require secrets that would make every instance have setup overhead?

Thanks ahead of time for your time!

zuphilip commented 5 years ago

  1. All translators from Zotero are also present in translation-server (and you can load additional ones if needed). The search translators support searching for identifiers such as DOI, ISBN, PMID, and arXiv ID. There is also a translator for PubMed Central, but search by PMCID is not yet supported (maybe we should just add this?). One main advantage of a workflow built on Zotero translation-server is that, besides identifiers, you can also use URLs, e.g. the landing page of a newspaper article.
  2. I am not aware of any public endpoint of the translation-server (besides Citoid, which is based on the old framework). However, there is a "public frontend", ZoteroBib (https://zbib.org/).
  3. Yes, it is easy to set up translation-server locally: git clone --recurse-submodules, npm install and npm start. See also the README. A docker image is work in progress, see #24.
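
For the identifier lookups mentioned in point 1, a request from Python might look roughly like the sketch below. It assumes a locally running instance on the default address http://127.0.0.1:1969 and the /search endpoint described in the project README; neither is spelled out in this comment, and build_search_request is a hypothetical helper name, so treat it as an illustration rather than a reference client.

```python
import urllib.request


def build_search_request(identifier, server="http://127.0.0.1:1969"):
    """Build a POST request that sends an identifier (DOI, ISBN, PMID,
    arXiv ID) as plain text to the assumed /search endpoint."""
    return urllib.request.Request(
        f"{server}/search",
        data=identifier.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Sending the request requires a running translation-server:
# with urllib.request.urlopen(build_search_request("10.1098/rsif.2017.0387")) as r:
#     print(r.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect or log what will be sent before any external site is contacted.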

dstillman commented 5 years ago

Re: 2, we don't run a public endpoint because translation involves requests to external websites and APIs that may rate-limit or block requests, so it makes sense for individual projects to use their own instances (or have them come from their end users, as is the case with translation from the Zotero client).

dhimmel commented 5 years ago

One main advantage with a workflow from zotero translation-server is that besides identifiers you can also use URLs

Generating metadata for URLs is one of the biggest draws for us. Currently, we use Greycite for this, but it's far from perfect and is closed source with frequent downtime.

Manubot already has good support for citing DOIs, PMIDs, PMCIDs, and arXiv IDs... so really ISBN (better than the outdated Citoid) and URL citation is what we're after. Side note regarding PM/PMC citation, the NCBI just released a Literature Citation Exporter service that could come in handy.

I will explore further to determine whether we should host a Manubot translation-server, as zbib does, or whether translation-server is lightweight enough that individual users or continuous integration instances could spin one up locally. For the latter, we'd probably want to move to Docker for managing the environment.

dstillman commented 5 years ago

Now that it's based on Node it's quite lightweight. You do have to do the npm i at some point, so you might want to do that ahead of time and then just distribute the installed version.

Currently it also won't update translators automatically. Fixing that is planned, but until then you'd want to pull changes from the translators submodule regularly.

dhimmel commented 5 years ago

Okay I was able to npm install and then npm start and run the following command:

curl --silent \
  --data 'https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html' \
  --header 'Content-Type: text/plain' \
  http://127.0.0.1:1969/web | \
curl --silent \
  --data @- \
  --header 'Content-Type: application/json' \
  'http://127.0.0.1:1969/export?format=bibtex'

This did output BibTeX!

@article{collins_net_2018,
    chapter = {Technology},
    title = {Net {Neutrality} {Has} {Officially} {Been} {Repealed}. {Here}’s {How} {That} {Could} {Affect} {You}.},
    issn = {0362-4331},
    url = {https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html},
    abstract = {Net Neutrality rules that required internet service providers to offer equal access to all web content are no longer in effect as of Monday.},
    language = {en-US},
    urldate = {2018-11-16TZ},
    journal = {The New York Times},
    author = {Collins, Keith},
    month = jun,
    year = {2018},
    keywords = {Net Neutrality, Pai, Ajit, Federal Communications Commission, Regulation and Deregulation of Industry, Computers and the Internet}
}
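
Since the goal is to call translation-server from Python, the piped curl commands above could be sketched with the standard library as follows. The endpoints, headers, and address are taken from the curl example; the function names are hypothetical, and the sketch assumes /web returns a JSON list of items (it does not handle pages that yield multiple candidate results or server errors).

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:1969"  # address used in the curl example


def build_web_request(url, server=SERVER):
    """Build a POST request sending a page URL as plain text to /web."""
    return urllib.request.Request(
        f"{server}/web",
        data=url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def build_export_request(items, export_format="bibtex", server=SERVER):
    """Build a POST request sending Zotero JSON items to /export."""
    return urllib.request.Request(
        f"{server}/export?format={export_format}",
        data=json.dumps(items).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def url_to_format(url, export_format="bibtex", server=SERVER):
    """Chain /web and /export, mirroring the two piped curl calls."""
    with urllib.request.urlopen(build_web_request(url, server)) as response:
        items = json.load(response)
    export_request = build_export_request(items, export_format, server)
    with urllib.request.urlopen(export_request) as response:
        return response.read().decode("utf-8")
```

With a server running locally, url_to_format("https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html") would be expected to return a BibTeX string like the one above. The two requests are still made separately; the wrapper only hides them behind one call on the client side.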

I got an error when requesting csljson (reported in https://github.com/zotero/translation-server/issues/56).

Is there any way to combine these two queries into a single one? I.e. retrieve metadata exported to a specified format in a single query?

dstillman commented 5 years ago

Is there any way to combine these two queries into a single one?

I've created a separate issue for that.