manubot / rootstock

Clone me to create your Manubot manuscript
https://manubot.github.io/rootstock/

bioRxiv citations are rendered as 'Cold Spring Harbor Laboratory' #376

Open jfy133 opened 4 years ago

jfy133 commented 4 years ago

For example, reference 63 in https://apeltzer.github.io/eager2-paper/:

63. DamageProfiler: Fast damage pattern calculation for ancient DNA
Judith Neukamm, Alexander Peltzer, Kay Nieselt
Cold Spring Harbor Laboratory (2020-10-01) https://doi.org/ghd45j
DOI: 10.1101/2020.10.01.322206

This happens in both the reference list and the in-text citation tooltip.

agitter commented 4 years ago

Thanks for bringing this up @jfy133. This used to be an issue with all bioRxiv citations, but we thought it was resolved in https://github.com/manubot/manubot/issues/16. It may depend on the preprint date: older preprints correctly have bioRxiv as the container-title. It does not seem to depend on whether the preprint has a new- or old-style bioRxiv DOI, because this new-style DOI from 2019 works:

$ manubot cite --render doi:10.1101/2019.12.20.884551
1. MGMM: an R package for fitting Gaussian Mixture Models on Incomplete Genomics Data
Zachary R. McCaw, Hanna Julienne, Hugues Aschard
bioRxiv (2019-12-23) https://doi.org/ghf6tr
DOI: 10.1101/2019.12.20.884551

Until we resolve this, one workaround is to cite the affected preprints by URL instead of by DOI:

$ manubot cite --render https://doi.org/10.1101/2020.10.01.322206
1. DamageProfiler: Fast damage pattern calculation for ancient DNA
Judith Neukamm, Alexander Peltzer, Kay Nieselt
bioRxiv (2020-10-01) https://www.biorxiv.org/content/10.1101/2020.10.01.322206v1
DOI: 10.1101/2020.10.01.322206
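If many citations are affected, the URL workaround could be applied across a manuscript with a small script. The sketch below is a hypothetical helper (`doi_keys_to_url_keys` is not part of Manubot). It assumes citations are written as `@doi:10.1101/...` keys and that Manubot accepts `@url:https://doi.org/...` keys in the same way the CLI accepts the URL above; check both against your Manubot version. It only matches date-style or purely numeric suffixes, so other Cold Spring Harbor DOIs that share the `10.1101` prefix but use a letter code (e.g. `gr.`) are left unchanged.

```python
import re

# Hypothetical helper: matches @doi: citation keys whose DOI looks like a bioRxiv
# DOI, i.e. 10.1101/ followed by an optional YYYY.MM.DD. date and a number.
BIORXIV_DOI_KEY = re.compile(r"@doi:(10\.1101/(?:\d{4}\.\d{2}\.\d{2}\.)?\d+)\b")


def doi_keys_to_url_keys(markdown: str) -> str:
    """Rewrite matching DOI keys as URL keys pointing at https://doi.org/."""
    # Assumption: Manubot resolves @url: keys through its URL citation path.
    return BIORXIV_DOI_KEY.sub(r"@url:https://doi.org/\1", markdown)


text = "Damage patterns were computed with DamageProfiler [@doi:10.1101/2020.10.01.322206]."
print(doi_keys_to_url_keys(text))
# prints: Damage patterns were computed with DamageProfiler [@url:https://doi.org/10.1101/2020.10.01.322206].
```

Note that this also rewrites older bioRxiv DOIs that already render correctly, which should be harmless but changes how those references are resolved.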

Or you can use manual references to correct this if only a small number of citations are affected.
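For the manual-reference route, each entry is a CSL JSON item, so container-title can be set explicitly. A minimal sketch, assuming Manubot reads manual references as a JSON array of CSL items (in rootstock this is typically content/manual-references.json) and matches each entry by its id; treat the file path and id format as assumptions to check against the current docs. The metadata comes from the rendered citation above.

```python
import json

# Hypothetical manual reference for the affected preprint.
# Assumption: "id" must match the citation key used in the manuscript text.
manual_reference = {
    "id": "doi:10.1101/2020.10.01.322206",
    "type": "article",  # CSL type for preprints and other prepublication manuscripts
    "title": "DamageProfiler: Fast damage pattern calculation for ancient DNA",
    "author": [
        {"given": "Judith", "family": "Neukamm"},
        {"given": "Alexander", "family": "Peltzer"},
        {"given": "Kay", "family": "Nieselt"},
    ],
    "container-title": "bioRxiv",  # the variable missing from the Crossref-derived metadata
    "publisher": "Cold Spring Harbor Laboratory",
    "issued": {"date-parts": [[2020, 10, 1]]},
    "DOI": "10.1101/2020.10.01.322206",
    "URL": "https://www.biorxiv.org/content/10.1101/2020.10.01.322206v1",
}

# Assumption: the manual references file holds a JSON array of CSL items.
serialized = json.dumps([manual_reference], indent=2)
print(serialized)
```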

jfy133 commented 4 years ago

Thanks for the tip. The preprint I'm referring to was released two weeks ago, so I wonder if something has changed again. But I will use the workaround you suggest if needed!

dhimmel commented 4 years ago

Okay, I followed up with bioRxiv via a tweet. This issue has never been fixed at the source: bioRxiv doesn't set container-title when depositing Crossref metadata. Therefore, our citation style falls back to showing the publisher, which is "Cold Spring Harbor Laboratory".
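A simplified model of that fallback: when an item has no container-title, the publisher is shown instead. This is a hypothetical Python illustration of the behavior described, not the actual CSL style code Manubot uses; the two items are toy examples.

```python
def venue_label(csl_item: dict) -> str:
    """Return the container-title if present, otherwise fall back to the publisher."""
    container_title = csl_item.get("container-title")
    if container_title:
        return container_title
    # No container-title in the Crossref-derived metadata: use the publisher.
    return csl_item.get("publisher", "")


affected = {"publisher": "Cold Spring Harbor Laboratory", "DOI": "10.1101/2020.10.01.322206"}
working = {"container-title": "bioRxiv", "publisher": "Cold Spring Harbor Laboratory"}
print(venue_label(affected))  # Cold Spring Harbor Laboratory
print(venue_label(working))   # bioRxiv
```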

agitter commented 3 years ago

@cgreene heard back from one of the bioRxiv developers. The relevant parts of their response:

Concerning the use of the container-title field in the metadata our guidance from Crossref is that this field is not appropriate for the preprint server name. Note, however, that the server name is captured in the institution field, for instance:

<institution>
<institution_name>bioRxiv</institution_name>
</institution>

There isn't a CSL JSON variable for institution, so we can't access that field.
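To show where the server name lives in the deposit, here is a sketch that pulls institution_name out of an XML fragment like the one quoted. It wraps the quoted fragment in a hypothetical posted_content parent. Real Crossref deposits declare an XML namespace, so the lookup uses ElementTree's `{*}` wildcard (Python 3.8+), which matches the tag with or without a namespace.

```python
import xml.etree.ElementTree as ET

# The quoted <institution> fragment inside a hypothetical parent element.
DEPOSIT_FRAGMENT = """
<posted_content>
  <institution>
    <institution_name>bioRxiv</institution_name>
  </institution>
</posted_content>
"""


def institution_names(xml_text: str) -> list:
    """Return the text of every institution_name element, in any namespace."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.findall(".//{*}institution_name") if el.text]


print(institution_names(DEPOSIT_FRAGMENT))  # ['bioRxiv']
```

Getting this value into a rendered citation would still require the Crossref-to-CSL JSON conversion to map it to some CSL variable.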

Also, the temporary DataCite fix discussed in https://github.com/manubot/manubot/issues/16#issuecomment-643271145 is no longer active.

If bioRxiv is following guidance from Crossref, following up with Crossref may be the next step.

dhimmel commented 3 years ago

From the Crossref schema docs:

institution: Wrapper element for information about an organization that sponsored or hosted an item but is not the publisher of the item. The institution element should be used to deposit metadata about an organization that sponsored or hosted the research or development of the published material but was not actually the publisher of the information. The institution is distinctly different from the publisher because it may not be a publishing organization. It is typically an organization such as a university, corporation, government agency, NGO or consortia. If the content was published by an organization other than the sponsor, the use of both the publisher and institution elements is encouraged because authors may cite either one in a reference, and the availability of both may allow for more precise matching in queries.

I don't see a container_title field in the Crossref schema, so I think this field only gets created during conversion to CSL JSON.

Preprints are considered posted content by Crossref. The schema page has this image for posted_content:

[Image: Crossref schema diagram for posted_content]

Expanding the group_title description:

group_title: Prepublication content items may be organized into groupings within a given publisher. This element provides for naming the group. It is expected that publishers will have a small number of groups each of which reflect a topic or subject area.

So I think we have the following upstream questions:

  1. To Crossref: does the posted_content data model make it difficult to set the CSL JSON container-title field for preprints? Should container-title always be the institution in the case of posted_content?

  2. To DataCite (whose Crossref-to-CSL JSON conversion we use): can the conversion be changed to set institution as container-title for posted_content, without requiring any changes from Crossref?
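Question 2 amounts to a small change in the mapping step. A hypothetical sketch of such a mapping follows; the simplified input record and its keys (title, publisher, doi, institution_names) are assumptions for illustration and do not reflect DataCite's actual converter.

```python
def posted_content_to_csl(record: dict) -> dict:
    """Map a simplified Crossref posted_content record to a CSL JSON item."""
    csl = {
        "type": "article",
        "title": record.get("title"),
        "publisher": record.get("publisher"),
        "DOI": record.get("doi"),
    }
    # Proposed behavior: use the first institution name (e.g. the preprint
    # server) as container-title when the record has one.
    institutions = record.get("institution_names") or []
    if institutions:
        csl["container-title"] = institutions[0]
    # Drop variables that were missing from the input record.
    return {key: value for key, value in csl.items() if value is not None}


record = {
    "title": "DamageProfiler: Fast damage pattern calculation for ancient DNA",
    "publisher": "Cold Spring Harbor Laboratory",
    "doi": "10.1101/2020.10.01.322206",
    "institution_names": ["bioRxiv"],
}
print(posted_content_to_csl(record)["container-title"])  # bioRxiv
```

As the Crossref definition quoted above notes, institution can also describe a sponsoring organization such as a university, so a real fix would need to confirm that the institution is the hosting server before using it this way.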

agitter commented 3 years ago

We have some updates on the upstream queries to Crossref.

Via Twitter:

Yes, the schema never distinguished preprint server "title" from "institution" but it could do - we will add this to the list. We'll be reviving the preprints working group soon to reassess the entire workflow - I will let you know when. (-GH)

In response to my support ticket:

Thanks again for this feedback - we are definitely planning to revisit the metadata we collect and distribute for preprints soon, including how we manage the preprint server name, later this year, but are also looking into ways we can address the preprint server name in our outputs in the interim. I'll keep you posted on our progress.