dhimmel commented 5 years ago

Currently, citing a Manubot manuscript URL returns CSL JSON like:

    "type": "webpage",
    "title": "Manubot Rootstock: Manuscript Title",
    "URL": "",
    "shortTitle": "Manubot Rootstock",
    "language": "en-US",
    "author": [
        "family": "Doe",
        "given": "John"
        "family": "Roe",
        "given": "Jane"
    "issued": {
      "date-parts": [
    "accessed": {
      "date-parts": [
    "id": "s7XRFgWm"

Command to create:

manubot cite url:

With the current manubot cite command, this metadata is retrieved from Zotero's translation-server. Pandoc encodes most of the information picked up by translation-server. One field that does not get set in the CSL JSON is container-title. For example, we could set container-title to equal "Manubot Preprint" by setting a <meta> field in the HTML head.

We'd probably want to use Pandoc's --include-in-header to insert these <meta> statements. Are there other fields besides CSL JSON's container-title we want to set?
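As a rough sketch of that idea, the snippet below renders `<meta>` lines and builds a Pandoc argument list that uses `--include-in-header`. The helper names, the `citation_journal_title` tag name, and the value "Manubot" are assumptions chosen for illustration, not settled choices from this thread.

```python
import html


def render_meta_tags(fields):
    """Return one <meta> line per (name, content) pair, with values escaped."""
    return "\n".join(
        f'<meta name="{html.escape(name, quote=True)}" '
        f'content="{html.escape(content, quote=True)}" />'
        for name, content in fields.items()
    )


def pandoc_args(header_path):
    """Build Pandoc arguments that splice header_path into the HTML <head>."""
    # --include-in-header only takes effect for standalone (templated) output.
    return ["pandoc", "--standalone", f"--include-in-header={header_path}"]


# Assumed tag name and value, for illustration only.
meta_html = render_meta_tags({"citation_journal_title": "Manubot"})
print(meta_html)
```

The rendered string could be saved to a file such as `meta.html` (a hypothetical name) and handed to Pandoc with the arguments from `pandoc_args("meta.html")`.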

slochower commented 5 years ago

What is the purpose of having "Manubot Preprint" as the container?

agitter commented 5 years ago

@slochower one purpose would be for Manubot manuscripts citing other Manubot manuscripts. For example, in this manuscript the reference looks like:

[image]

"Manubot Preprint" may signal to a reader that this is an HTML manuscript as opposed to a different type of web citation.

@dhimmel I support adding the container-title. I can't think of any other appropriate fields. Volume, issue, page numbers, etc. don't apply.

The issue title implies this would only affect the HTML version of the manuscript, right?

slochower commented 5 years ago

What other things go into container title? For example, does “bioRxiv preprint” ever appear in container title?

dhimmel commented 5 years ago

Here's the output of manubot cite doi:10.1101/515643 (a bioRxiv preprint):

```json
[
  {
    "publisher": "Cold Spring Harbor Laboratory",
    "abstract": "Researchers in the life sciences are posting their work to preprint servers at an unprecedented and increasing rate, sharing papers online before (or instead of) publication in peer-reviewed journals. Though the popularity and practical benefits of preprints are driving policy changes at journals and funding organizations, there is little bibliometric data available to measure trends in their usage. Here, we collected and analyzed data on all 37,648 preprints that were uploaded to, the largest biology-focused preprint server, in its first five years. We find that preprints on bioRxiv are being read more than ever before (1.1 million downloads in October 2018 alone) and that the rate of preprints being posted has increased to a recent high of more than 2,100 per month. We also find that two-thirds of bioRxiv preprints posted in 2016 or earlier were later published in peer-reviewed journals, and that the majority of published preprints appeared in a journal less than six months after being posted. We evaluate which journals have published the most preprints, and find that preprints with more downloads are likely to be published in journals with a higher impact factor. Lastly, we developed, a website for downloading and interacting programmatically with indexed metadata on bioRxiv preprints.",
    "DOI": "10.1101/515643",
    "type": "manuscript",
    "source": "Crossref",
    "title": "Tracking the popularity and outcomes of all bioRxiv preprints",
    "author": [
      {
        "given": "Richard J.",
        "family": "Abdill"
      },
      {
        "given": "Ran",
        "family": "Blekhman"
      }
    ],
    "issued": {
      "date-parts": [
        [
          2019,
          1,
          13
        ]
      ]
    },
    "URL": "",
    "id": "IYwQbTVz"
  }
]
```

Note that container-title is not set, although we think this is probably a bioRxiv bug, see

The CSL docs define container-title as:

> title of the container holding the item (e.g. the book title for a book chapter, the journal title for a journal article)

However, it's important to note that we're not directly setting container-title. Instead, we are setting metadata fields that will get picked up by Zotero and populate certain Zotero metadata fields that will then get exported as container-title in CSL.
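To make that indirection concrete, here is a toy model of the chain from HTML `<meta>` tags to CSL JSON fields. It is not Zotero's actual translator code: the `META_TO_CSL` table is a hypothetical mapping written for illustration, and the real translation-server goes through intermediate Zotero fields before export.

```python
from html.parser import HTMLParser

# Hypothetical mapping for illustration: <meta> name -> CSL JSON field.
META_TO_CSL = {
    "citation_title": "title",
    "citation_journal_title": "container-title",
    "citation_publisher": "publisher",
}


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content:
            self.meta[name] = content


def meta_to_csl(html_text):
    """Translate recognized <meta> tags into a partial CSL JSON item."""
    collector = MetaCollector()
    collector.feed(html_text)
    return {
        META_TO_CSL[name]: content
        for name, content in collector.meta.items()
        if name in META_TO_CSL
    }


page = '<html><head><meta name="citation_journal_title" content="bioRxiv" /></head></html>'
print(meta_to_csl(page))
```

The point of the sketch is only that the page author controls the `<meta>` tags, while the name of the resulting CSL field is decided downstream.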

Perhaps instead or in addition, we want the CSL publisher field to be set to "Manubot"?

agitter commented 5 years ago

I'm not sure that I think of Manubot as a publisher. The journal is closer to my interpretation of what the "Manubot Preprint" should be.

The comparable meta field in a bioRxiv preprint is: <meta name="citation_journal_title" content="bioRxiv" />

This conversation prompted an idea for a workaround for the bioRxiv bug mentioned above. I'll post it in that issue to keep this discussion focused.

dhimmel commented 5 years ago

My only worry is whether all Manubot documents are "preprints". The user can always change the value if not, perhaps to "Manubot Document" or just "Manubot".

I do think Manubot is sort of the publisher. Perhaps "GitHub Pages" is the publisher or the source manuscript's GitHub account. We don't necessarily have to set metadata.

Currently, here is how the Meta Review shows up on Google Scholar:


The BibTeX from Google Scholar is:

  title={Open collaborative writing with Manubot},
  author={Himmelstein, Daniel S and Slochower, David R and Malladi, Venkat S and Greene, Casey S and Gitter, Anthony}

We can also look into getting the publication date set.

agitter commented 5 years ago

You're right that not all Manubot documents are preprints. "Manubot", "Manubot Document", or "Manubot Manuscript" (though not everything is a manuscript either) would be better.

👍 on setting the publication date as well.

slochower commented 5 years ago

I guess I'm confused by the analogy of "Manubot" as a container. I don't really think "Manubot" or even "Manubot Document" performs the same role as a book or a journal. In those cases, the container acts like an index, where you can find similar or related things. I don't have strong feelings on this, though.

dhimmel commented 5 years ago

So bioRxiv is setting both Dublin Core and Google Scholar meta tags.



So there is a lot more to set. However, we should push for as much of this as possible to be done by Pandoc. For example, which name prefix should Pandoc set, DC or DCTERMS? Actually, it looks like the DCQ docs allow both:

<meta name="DC.element" content="Value" />
<meta name="DCTERMS.element" content="Value" />
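If both spellings are acceptable, one option is to emit each element under both prefixes. The helper below is a minimal sketch of that approach; the function name and the example element and value are assumptions, and nothing here reflects what Pandoc itself emits.

```python
import html


def dublin_core_meta(element, value, prefixes=("DC", "DCTERMS")):
    """Return one <meta> line per prefix for a Dublin Core element."""
    safe_value = html.escape(value, quote=True)
    return [
        f'<meta name="{prefix}.{element}" content="{safe_value}" />'
        for prefix in prefixes
    ]


# Example element and value chosen for illustration only.
for line in dublin_core_meta("title", "Manuscript Title"):
    print(line)
```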


The fifteen element "Dublin Core" described in this standard is part of a larger set of metadata vocabularies and technical specifications maintained by the Dublin Core Metadata Initiative (DCMI). The full set of vocabularies, DCMI Metadata Terms [DCMI-TERMS]