larsgw / citation.js

Citation.js converts formats like BibTeX, Wikidata JSON and ContentMine JSON to CSL-JSON to convert to other formats like APA, Vancouver and back to BibTeX.
https://citation.js.org/
MIT License

Cache citation data in a file #164

Open: nichtich opened this issue 5 years ago

nichtich commented 5 years ago

I thought about using citation-js and Wikidata for reference management from the command line. Given a list of Wikidata IDs, citation-js can look them up and convert them to CSL-JSON, e.g.:

echo Q163335 > citekeys
echo Q3290152 >> citekeys
citation-js -o references -i citekeys

When I add another key, I don't want citation-js to download the known items again. This can be done with some command-line magic:

echo Q3020388 >> citekeys
{ cat citekeys; jq -r '.[].id' references.json; } | sort | uniq -u | citation-js >> references.json
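`sort | uniq -u` keeps keys that occur exactly once across both lists. That also matches ids that are cached but no longer in citekeys (they would be sent to citation-js and fetched again), and it drops a new key that is listed twice in citekeys. A minimal sketch of a stricter set difference with POSIX `comm`, using hypothetical sample data (the file `known-ids` stands in for the output of `jq -r '.[].id' references.json`, and Q42 is a made-up removed key):

```shell
#!/bin/sh
set -e
export LC_ALL=C   # sort and comm must agree on collation order

# Hypothetical sample data: citekeys lists every wanted QID (Q3020388 is new);
# known-ids stands in for `jq -r '.[].id' references.json`, where Q42 is a
# cached item whose key was since removed from citekeys.
printf '%s\n' Q163335 Q3290152 Q3020388 > citekeys
printf '%s\n' Q163335 Q3290152 Q42 > known-ids

# Approach from the comment: keys seen exactly once across both lists.
# This yields Q3020388 *and* Q42, so the removed key would be fetched again.
sort citekeys known-ids | uniq -u > once

# comm -23 suppresses lines only in the second file (-2) and lines common
# to both (-3), leaving only requested keys that are not cached yet.
sort -u citekeys > citekeys.sorted
sort -u known-ids > known-ids.sorted
comm -23 citekeys.sorted known-ids.sorted > new-keys
cat new-keys
```

Since `comm` accepts `-` for standard input, the cache lookup could in principle be piped in directly, e.g. `jq -r '.[].id' references.json | sort -u | comm -23 citekeys.sorted - | citation-js >> references.json`; that combined pipeline is untested here.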

One more step is then needed to combine the JSON arrays in references.json into a single array (or implement https://github.com/larsgw/citation.js/issues/163):

jq -s '[.[][]]' references.json > tmp; mv tmp references.json
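A variant of that merge step that also drops items cached twice could look like the sketch below. It assumes, as proposed in this thread, that the CSL-JSON `id` field identifies an item (larsgw questions that assumption further down), and the sample titles are placeholders:

```shell
#!/bin/sh
set -e
# Hypothetical sample data: references.json after appending a second
# citation-js run, i.e. two JSON arrays back to back, with Q163335 in both.
cat > references.json <<'EOF'
[{"id": "Q163335", "title": "A"}, {"id": "Q3290152", "title": "B"}]
[{"id": "Q3020388", "title": "C"}, {"id": "Q163335", "title": "A"}]
EOF

# -s slurps both arrays into one array of arrays; add concatenates them
# (// [] guards against an empty file, where add yields null);
# unique_by(.id) keeps one item per id and sorts the result by id.
jq -s 'add // [] | unique_by(.id)' references.json > references.tmp
mv references.tmp references.json

jq -r '.[].id' references.json
```

Note that `unique_by` reorders the cache by id, which is harmless for CSL-JSON input but differs from the append order of the `[.[][]]` version.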

Would it make sense to include this functionality in citation-js, e.g.:

citation-js --cache references.json < citekeys

larsgw commented 5 years ago

Merging two parsed CSL-JSON files is fine, I think (I'd propose using the original input, optionally saved in _graph, for diffing). Merging two inputs without parsing them, when one or both are still unparsed, is less so, at least in the general case. Implementing a special case for URLs or Wikidata IDs is possible, but goes against my idea of keeping things modular. However, perhaps the CLI should be exempt from that modularization...

nichtich commented 5 years ago

I'm only interested in CSL-JSON because that's all needed to create citations and bibliographies. If settings have changed and the original input is needed, it is better to rebuild the full cache. Items should be identified by their citation key which is in the id field for items converted from Wikidata. I have not tried other importers, but heuristics to merge the same publication from different sources should be out of scope for citation-js. Getting the same record from Wikidata via QID and from crossref via DOI would be two records in the cache.

larsgw commented 5 years ago

> Getting the same record from Wikidata via QID and from crossref via DOI would be two records in the cache.

Agreed.

> Items should be identified by their citation key which is in the id field for items converted from Wikidata.

Thing is, not every item with the same id has to have the same origin, and not every item with the same origin has to have the same ID. Sure, the Wikidata ID is in the id field now, but that might change, or someone might have a BibTeX file, maybe even exported from Citation.js, with a Wikidata ID in the label field. Because the original input is already saved in _graph (or should be, see #165), that seems like a better way to distinguish items. That's what I meant, anyway.

> I'm only interested in CSL-JSON because that's all needed to create citations and bibliographies.

But then the CLI magic would still be needed, because otherwise Citation.js would have to parse the entire citekeys file again, right? I'll work on #163 too, btw.