relaton / relaton-doi

Relaton-DOI: retrieve bibliographic items using DOI
MIT License

Bibtex export needs to be aligned to XML output #12

Closed (opoudjis closed 1 year ago)

opoudjis commented 1 year ago

I am writing a blog post advertising relaton-doi for metanorma.org. In order to promote relaton-doi, we need to make it usable at least in BibTeX. (I would mention citeproc, but apparently we don't support it.)

The example command I use is `relaton fetch doi:10.1515/9783110889406.257 -f bibtex`

The good news is it works (it was crashing in the released version, on parsing the date).

The bad news is that it is leaving out much of the value add that the XML has:

@inbook{heller-a,
  title = {Gender and public space in a bilingual school},
  author = {Heller, Monica},
  publisher = {DE GRUYTER MOUTON},
  address = {},
  series = {Multilingualism, Second Language Learning, and Gender},
  timestamp = {2022-12-28}
}

The editors of the collected volume are absent, as are the coauthors, and any URIs.

If the issue here is that relaton's BibTeX export is only partial, then we need to beef it up to be complete. We cannot publicise relaton in a blog article if the only format the outside world knows about is BibTeX and our BibTeX mapping is this incomplete.

Please add BibTeX output to all fixtures as well; this is the first time we're committing to output across all bibtypes in a gem, which is why we don't have great coverage.

Bibtex documentation online is not great, but taking https://www.bibtex.com/e/entry-types/ as a starting point:

* article
  * author = /contributor[@role/type = 'author'] # all combined with `and`: two authors as X and Y, three authors as X and Y and Z.
  * title = /title
  * journal = /series/name
  * year = /date[@type = 'published']
  * volume = /extent//locality[@type = 'volume']
  * number = /extent//locality[@type = 'issue']
  * pages = /extent//locality[@type = 'page'] # en-dash realised as `--`, e.g. 12--13

* book, proceedings
  * author = /contributor[@role/type = 'author']
  * editor = /contributor[@role/type = 'editor']
  * title = /title
  * series = /series/name
  * publisher = /contributor[@role/type = 'publisher']
  * year = /date[@type = 'published']
  * address = /place/city + place/region
  * volume = /extent//locality[@type = 'volume']
  * edition = /edition

* inbook, incollection
  * author = /contributor[@role/type = 'author']
  * title = /title
  * editor = /relation[@type = 'includedIn']/bibitem/contributor[@role/type = 'editor']
  * booktitle = /relation[@type = 'includedIn']/bibitem/title
  * series = /series/name
  * publisher = /contributor[@role/type = 'publisher']
  * year = /date[@type = 'published']
  * address = /place/city + place/region
  * volume = /extent//locality[@type = 'volume']
  * chapter = /extent//locality[@type = 'chapter']
  * pages = /extent//locality[@type = 'page'] # en-dash realised as `--`, e.g. 12--13
  * edition = /edition

* inproceedings
  * author = /contributor[@role/type = 'author']
  * title = /title
  * editor = /relation[@type = 'includedIn']/bibitem/contributor[@role/type = 'editor']
  * booktitle = /relation[@type = 'includedIn']/bibitem/title
  * series = /series/name
  * publisher = /contributor[@role/type = 'publisher']
  * year = /date[@type = 'published']
  * address = /place/city + place/region
  * pages = /extent//locality[@type = 'page'] # en-dash realised as `--`, e.g. 12--13
  * organization = /contributor[@role/type = 'enabler']

* manual: as above

* mastersthesis, phdthesis (just use phdthesis by default)
  * author = /contributor[@role/type = 'author']
  * title = /title
  * school = /contributor[@role/type = 'authorizer'] or /contributor[@role/type = 'publisher']
  * year = /date[@type = 'published'] or /date[@type = 'issued'] or /date[@type = 'created']
  * address = /place/city + place/region

* techreport (includes standard), manual 
  * author = /contributor[@role/type = 'author']
  * title = /title
  * institution = /contributor[@role/type = 'authorizer'] or /contributor[@role/type = 'publisher']
  * year = /date[@type = 'published'] or /date[@type = 'issued'] or /date[@type = 'created']
  * address = /place/city + place/region
  * number = /docidentifier
  * edition = /edition
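
To make two details of the mapping above concrete (page ranges written with `--`, and no empty fields such as `address = {}` in the output), here is a minimal sketch. It is in Python rather than the gem's Ruby, and `page_range` and `render_entry` are hypothetical helper names for illustration, not part of relaton-bib.

```python
def page_range(first, last=None):
    """Render a page locality for BibTeX; a range uses '--', e.g. 12--13."""
    if last in (None, "", first):
        return str(first)
    return f"{first}--{last}"


def render_entry(entry_type, key, fields):
    """Serialise a dict of fields as a BibTeX entry, skipping empty values.

    Assumption: values are already plain strings with no BibTeX escaping needed.
    """
    body = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in fields.items() if value
    )
    return f"@{entry_type}{{{key},\n{body}\n}}"


# The empty address is dropped instead of being emitted as `address = {}`.
print(render_entry("inbook", "heller-a", {
    "title": "Gender and public space in a bilingual school",
    "address": "",
    "pages": page_range(12, 13),
}))
```

Dropping empty values at serialisation time keeps the per-type mapping simple: each field can be looked up unconditionally and left blank when the source record has nothing for it.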

BibTeX must include not just the first author or editor, but all of them, in a single prerendered string; e.g. "Lisa A. Urry and Michael L. Cain and Steven A. Wasserman and Peter V. Minorsky and Jane B. Reece"
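
A minimal sketch of that joining rule, again in Python for illustration only; `bibtex_name_list` is a hypothetical name, and the input is assumed to be a list of names that are already rendered as strings.

```python
def bibtex_name_list(names):
    """Join all contributors into one BibTeX name list separated by ' and '."""
    cleaned = [name.strip() for name in names if name and name.strip()]
    return " and ".join(cleaned)


authors = [
    "Lisa A. Urry",
    "Michael L. Cain",
    "Steven A. Wasserman",
    "Peter V. Minorsky",
    "Jane B. Reece",
]
print(bibtex_name_list(authors))
```

The same function would serve for the editor field, since BibTeX uses the same ` and ` separator there.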

If we want cleverer rendering of authors, with i18n and templates (e.g. "Susskind, L. & G. Hrabovsky"), we can use relaton-render, but we don't need it yet.

doi, isbn, and issn are non-standard bibtex fields. For now, do not use them.

url is a non-standard bibtex field, but from https://www.bibtex.com/f/url-field/ , it does seem widely used. I suggest we do use it. The standard-compatible way apparently is howpublished = "\url{https://www.nytimes.com/what-we-learned-2018}", but I'm quite happy for us not to use it.

FWIW, Crossref uses both:

    doi = {10.1515/9783110889406.257},
    url = {https://doi.org/10.1515%2F9783110889406.257},
    publisher = {{DE} {GRUYTER} {MOUTON}},
    author = {Monica Heller},
    title = {Gender and public space in a bilingual school},
    booktitle = {Multilingualism, Second Language Learning, and Gender}

andrew2net commented 1 year ago

@opoudjis relaton renders journal = /series[@type="journal"]/name and series = /series[not(@type)]/name. Should we change the behavior? In the Relaton model, the series attribute is an array. The BibTex format allows only one journal value. Should we use only the first series?

andrew2net commented 1 year ago

url = {https://doi.org/10.1515%2F9783110889406.257},

@opoudjis the bibtex has a doi field https://www.bibtex.com/f/doi-field/, shouldn't we use it for DOI type links instead of url?

andrew2net commented 1 year ago

@opoudjis I've updated the BibTex renderer in the relaton-bib. Please install the GitHub version of the relaton-bib and try the BibTex output.

opoudjis commented 1 year ago

> @opoudjis relaton renders journal = /series[@type="journal"]/name and series = /series[not(@type)]/name. Should we change the behavior?

.... No, if I am understanding you.

> In the Relaton model, the series attribute is an array. The BibTex format allows only one journal value. Should we use only the first series?

Yes.
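
One way to read the two answers above, as a Python sketch: keep the existing split between typed and untyped series, and take only the first match for each BibTeX field. The list-of-dicts input and the name `pick_series` are assumptions made for illustration; they are not the relaton-bib data model.

```python
def pick_series(series_list):
    """Return the first journal-typed series and the first untyped series."""
    journal = next(
        (s["name"] for s in series_list if s.get("type") == "journal"), None
    )
    series = next(
        (s["name"] for s in series_list if s.get("type") is None), None
    )
    return {"journal": journal, "series": series}


# Hypothetical input: only the first untyped series is kept.
print(pick_series([
    {"name": "Multilingualism, Second Language Learning, and Gender"},
    {"name": "Some other series"},
]))
```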

opoudjis commented 1 year ago

> url = {https://doi.org/10.1515%2F9783110889406.257},
>
> @opoudjis the bibtex has a doi field https://www.bibtex.com/f/doi-field/, shouldn't we use it for DOI type links instead of url?

Well, we should not use DOI links in url, because they are not real URLs. OTOH, the doi field should hold the actual DOI (the part of the URL after the resolver host, not URL-encoded), not the URL itself.
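
A sketch of that normalisation, in Python for illustration: strip a resolver prefix if there is one, then undo the URL encoding. `bare_doi` is a hypothetical name, and the set of prefixes handled (`doi.org`, `dx.doi.org`, `doi:`) is an assumption based on the examples in this thread.

```python
import re
from urllib.parse import unquote

# Assumed prefixes: http(s)://doi.org/, http(s)://dx.doi.org/, and doi:
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def bare_doi(value):
    """Reduce a DOI link or doi: reference to the bare, decoded DOI."""
    return unquote(DOI_PREFIX.sub("", value.strip()))


print(bare_doi("https://doi.org/10.1515%2F9783110889406.257"))
```

The result is what would go into the `doi` field; the publisher's landing page, if present in the record, is a separate value that could go into `url`.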

opoudjis commented 1 year ago

In the spec/examples/*.bib files, I notice:

  month = apr,

Surely that's wrong, and it should be month = {apr},

opoudjis commented 1 year ago

`relaton fetch doi:10.1515/9783110889406.257 -f bibtex` is now working better, but:

<uri>https://www.degruyter.com/document/doi/10.1515/9783110889406.257/html</uri>

is missing from the BibTeX record:

@inbook{heller-a,
  title = {Gender and public space in a bilingual school},
  author = {Heller, Monica},
  editor = {Pavlenko, Aneta and Blackledge, Adrian and Piller, Ingrid and Teutsch-Dwyer, Marya},
  booktitle = {Multilingualism, Second Language Learning, and Gender},
  publisher = {DE GRUYTER MOUTON},
  address = {Berlin},
  timestamp = {2023-01-15},
  doi = {http://dx.doi.org/10.1515/9783110889406.257}
}

opoudjis commented 1 year ago

bundle exec relaton fetch doi:10.5962/bhl.title.124254 -f bibtex

This almost cleans up the record, as described in the blog post, but:

@book{kuster1852a,
  title = {Die Gattungen Pupa, Megaspira, Balea und Tornatellina : in Abbildungen nach der Natur mit Beschreibungen},
  author = {Kuster, H. C. and Chemnitz, Johann Hieronymus and Martini, Friedrich Heinrich Wilhelm},
  publisher = {Verlag von Bauer und Raspe (Julius Merz)},
  year = {1852},
  address = {Nürnberg, },
  timestamp = {2023-01-15},
  doi = {http://dx.doi.org/10.5962/bhl.title.124254}
}

The address should drop the trailing comma: `address = {Nürnberg},`
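
The trailing comma is what you get when city and region are always joined with `, ` even if the region is empty. A minimal sketch of the alternative, in Python for illustration (`bibtex_address` is a hypothetical name): join only the parts that are actually present.

```python
def bibtex_address(city=None, region=None):
    """Join city and region with ', ', ignoring missing or blank parts."""
    parts = [p.strip() for p in (city, region) if p and p.strip()]
    return ", ".join(parts)


print(bibtex_address("Nürnberg", ""))
```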

andrew2net commented 1 year ago

@opoudjis updated relaton-bib and relaton-doi, please try the latest commits from GitHub.

opoudjis commented 1 year ago

I am happy with these fixes, thank you!