xlcnd / isbnlib

Python library to validate, clean, transform, and get metadata for ISBN strings (for devs).
Other
220 stars, 29 forks

Support for outputting CSL JSON formatted metadata #48

Closed: dhimmel closed this issue 6 years ago

dhimmel commented 6 years ago

It would be nice to have a bibformatter that exports ISBN metadata to Citation Style Language (CSL) JSON. This would help us add support for ISBN citations in Manubot: see https://github.com/greenelab/manubot/issues/14.

I'm envisioning being able to do the following:

```python
import isbnlib
isbn = '9780262035613'
metadata = isbnlib.meta(isbn, cache=None)
csl = isbnlib.registry.bibformatters['csl'](metadata)
```

csl would presumably be a dict or a collections.OrderedDict. Alternatively, it could already be dumped as a JSON string (although I think that's less preferable).
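To make the dict-versus-string choice concrete, here is a minimal sketch of a hypothetical `meta_to_csl` converter (not an isbnlib function). It assumes the metadata keys 'ISBN-13', 'Title', 'Authors', 'Publisher', and 'Year', reuses the Knuth sample values from later in this thread, and returns a plain dict; the JSON string form is then a single `json.dumps` call.

```python
import json


def meta_to_csl(meta):
    """Hypothetical converter: isbnlib-style metadata dict -> CSL JSON item (a dict).

    Assumes the keys 'ISBN-13', 'Title', 'Authors', 'Publisher', 'Year'.
    """
    item = {
        "type": "book",
        "id": meta["ISBN-13"],
        "title": meta["Title"],
        "author": [{"literal": name} for name in meta["Authors"]],
        "ISBN": meta["ISBN-13"],
        "publisher": meta["Publisher"],
    }
    # CSL JSON dates use "date-parts": a list of [year, month, day] lists.
    # Only emit the date when the year string is purely numeric.
    if meta.get("Year", "").isdigit():
        item["issued"] = {"date-parts": [[int(meta["Year"])]]}
    return item


meta = {
    "ISBN-13": "9780321534965",
    "Title": "The Art Of Computer Programming",
    "Authors": ["Donald Ervin Knuth"],
    "Publisher": "Addison-Wesley",
    "Year": "2008",
}
csl_dict = meta_to_csl(meta)               # the dict form
csl_text = json.dumps(csl_dict, indent=2)  # the string form, if a caller needs it
print(csl_text)
```

Returning the dict leaves serialization (indentation, key sorting) to the caller. On Python 3.7+ a plain dict already preserves insertion order, so an OrderedDict is rarely needed just to keep keys in order.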

CSL JSON is a format for storing bibliographic metadata and a successor to formats like BibTeX. It's commonly used in scholarly publishing. The documentation isn't great, but there is a schema definition as well as some written documentation.

I'm happy to help as needed. In particular, I can help convert the output of isbnlib.meta to CSL JSON. Is there documentation of all the possible keys returned by isbnlib.meta?

xlcnd commented 6 years ago

Thank you for your suggestion.

To answer your question:

isbnlib.meta returns a dictionary with the keys ('ISBN-13', 'Title', 'Authors', 'Publisher', 'Year', 'Language'), whose values are strings (a list of strings for 'Authors').
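As a sketch of the shape described above, the dict below uses those six keys, with sample values taken from the Knuth example later in this thread; the 'Language' value is an assumption added for illustration. The helper `check_meta` is hypothetical, not part of isbnlib: it only verifies that the keys match and that every value is a string, except 'Authors', which must be a list of strings.

```python
# Keys common to all providers, as described in this thread.
META_KEYS = ("ISBN-13", "Title", "Authors", "Publisher", "Year", "Language")

sample = {
    "ISBN-13": "9780321534965",
    "Title": "The Art Of Computer Programming",
    "Authors": ["Donald Ervin Knuth"],
    "Publisher": "Addison-Wesley",
    "Year": "2008",
    "Language": "en",  # illustrative value only (assumed)
}


def check_meta(meta):
    """Hypothetical helper: True if meta has exactly the expected keys and value types."""
    if set(meta) != set(META_KEYS):
        return False
    authors = meta["Authors"]
    if not isinstance(authors, list) or not all(isinstance(a, str) for a in authors):
        return False
    # Every other field is a plain string.
    return all(isinstance(meta[k], str) for k in META_KEYS if k != "Authors")


print(check_meta(sample))
```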

These are the fields common to all providers, and they are fixed in the library. Even so, 'Language' is NOT used by the built-in bibformatters: in a bibliographic citation, 'Language' is the language in which the book is written, but that is NOT its meaning in ISBN registries (there it is usually the main language of the publisher's country)!

I will take a look at this CSL format and see whether it makes sense to add it to the core library as a new block in isbnlib/dev/_fmt.py (probably yes, if it is widely used) or as an add-in.

But please, you are free to have a go!

xlcnd commented 6 years ago

From a quick look at CSL JSON, it seems that implementing CSL formatting only requires a new template in isbnlib/dev/_fmt.py, like:

```python
csl = r'''{"type":"book", "id":"$ISBN", "title":"$Title", "issued": {"raw": "$Year"}, "ISBN":"$ISBN", "publisher":"$Publisher", "author": [$AUTHORS]}'''
```

with post-processing for $AUTHORS:

```python
elif name == 'csl':
    AUTHORS = ', '.join('{"literal": "$"}'.replace("$", a) for a in authors)
```

Is this a correct CSL-JSON data fragment, and is it enough?
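One caveat with filling a JSON template by plain string substitution is that a double quote or backslash inside a title or author name produces invalid JSON. A minimal sketch, assuming Python's `string.Template` (isbnlib's actual `_fmt.py` machinery may work differently), keeps the template idea proposed above but passes every value through `json.dumps` so it is quoted and escaped, then parses the result as a check. `fill_csl_template` is a hypothetical name, and the quoted title is made-up test data.

```python
import json
from string import Template

# Same fields as the proposed template, but the placeholders are NOT wrapped in quotes:
# each value is inserted already JSON-encoded (quoted and escaped) by json.dumps.
CSL_TEMPLATE = Template(
    '{"type": "book", "id": $ISBN, "title": $Title, '
    '"issued": {"raw": $Year}, "ISBN": $ISBN, '
    '"publisher": $Publisher, "author": [$AUTHORS]}'
)


def fill_csl_template(meta):
    """Hypothetical formatter: fill CSL_TEMPLATE from an isbnlib-style metadata dict."""
    authors = ", ".join(json.dumps({"literal": name}) for name in meta["Authors"])
    text = CSL_TEMPLATE.substitute(
        ISBN=json.dumps(meta["ISBN-13"]),
        Title=json.dumps(meta["Title"]),
        Year=json.dumps(meta["Year"]),
        Publisher=json.dumps(meta["Publisher"]),
        AUTHORS=authors,
    )
    json.loads(text)  # raises json.JSONDecodeError if the output is not valid JSON
    return text


meta = {
    "ISBN-13": "9780321534965",
    "Title": 'A "Quoted" Title',  # made-up title containing double quotes
    "Authors": ["Donald Ervin Knuth"],
    "Publisher": "Addison-Wesley",
    "Year": "2008",
}
print(fill_csl_template(meta))
```

With the original `'{"literal": "$"}'.replace("$", a)` approach, an author name containing `"` would break the JSON; encoding each value with `json.dumps` avoids that whole class of bugs.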

dhimmel commented 6 years ago

Agree with the general strategy. A few points / questions:

xlcnd commented 6 years ago

Here is my reply point-by-point:

But maybe it is not a good idea to implement this in the core of isbnlib; a plug-in might be better, because:

xlcnd commented 6 years ago

Anyway, I have already implemented a 'simple' version that supports CSL-JSON! It produces output like this:

```json
{
  "type": "book",
  "id": "9780321534965",
  "title": "The Art Of Computer Programming",
  "author": [{"literal": "Donald Ervin Knuth"}],
  "issued": {"date_parts": [["2008"]]},
  "ISBN": "9780321534965",
  "publisher": "Addison-Wesley"
}
```

Is this a valid CSL document? Is this useful?
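A full answer to "is this valid CSL?" means validating against the CSL JSON schema. As a much lighter sketch, the hypothetical `csl_date_problems` below checks only the `issued` field. The CSL JSON schema spells the key `date-parts` (with a hyphen), whereas the output above uses `date_parts`; the sketch also treats integer year parts as the expected form, which is a design choice here rather than a full schema rule.

```python
import json


def csl_date_problems(item):
    """Hypothetical check: list problems with a CSL item's "issued" date field."""
    problems = []
    issued = item.get("issued")
    if issued is None:
        return problems  # no date to check
    if "date_parts" in issued:
        problems.append('use "date-parts" (hyphen), not "date_parts"')
    # Inspect whichever spelling is present.
    for part_list in issued.get("date-parts", issued.get("date_parts", [])):
        for part in part_list:
            if not isinstance(part, int):
                problems.append(f"date part {part!r} should be an int")
    return problems


item = json.loads("""
{"type": "book", "id": "9780321534965",
 "title": "The Art Of Computer Programming",
 "author": [{"literal": "Donald Ervin Knuth"}],
 "issued": {"date_parts": [["2008"]]},
 "ISBN": "9780321534965", "publisher": "Addison-Wesley"}
""")
for problem in csl_date_problems(item):
    print(problem)
```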

dhimmel commented 6 years ago

@xlcnd that would be useful. If you open a PR, I'd be happy to review it. The only issue I see at present is that 2008 should not be quoted; it should be an int.

xlcnd commented 6 years ago

It's already in the dev branch. Year is now an int.