zotero / translators

Zotero Translators
http://www.zotero.org/support/dev/translators

JSON-LD export/import #917

Open adam3smith opened 9 years ago

adam3smith commented 9 years ago

And then also include it in the Embedded Metadata translator. Suggested by Faolan here: https://forums.zotero.org/discussion/50327/is-unapi-deadbecause-their-website-sure-seems-to-be/#Item_8 . Seems like a great idea.

dstillman commented 8 years ago

So we'd like to move forward on this, and we'd be happy to sponsor the work on this if someone is interested. The primary goal would be to be able to generate JSON-LD for items via the Zotero API (via the API's support for export translators) and import that back into Zotero clients.

One thing that's not clear to me is the state of the various bibliographic ontologies — bib.schema.org, BIBO, Zotero RDF… There's a JS library that can generate JSON-LD from RDF, so we might be able to piggyback off the existing RDF translators, but as far as I know there hasn't really been much uptake of BIBO, so I don't know what makes the most sense. I'd add that, while we obviously want a round-trip to be close to lossless, I don't think it's a fixed requirement — meaning that I don't think we need to use API JSON (which we don't really want to be an exchange format) or Zotero RDF (same).

Thoughts?

/cc @rmzelle @aurimasv @fcheslack others?

adam3smith commented 8 years ago

That's wonderful. Very much agree this should be a priority.

This has been on my list of things to do for quite a while (and learning more about JSON-LD is going to be quite useful for me), so I'd be interested, but I do have a regular day job now, so I'm happy to leave this to someone with more time.

I always imagined writing this from scratch rather than draw on RDF, in particular because once we specify the RDF to JSON-LD mappings, I'm not sure how much time is saved.

I agree that BIBO hasn't really caught on at all, so I wouldn't feel great about using that. Zotero RDF doesn't even have a written specification. For an ontology, I think schema.org makes most sense. I'm not even sure we need the bib extension, as the regular schema is very thorough for bibliographic data. So JSON-LD with schema.org support would be my way to go. I'm pretty sure that's what we're most likely to find on the import side, too.

If we want a new, rich RDF format, that should almost certainly be BIBFRAME, but that's very heavily oriented towards linked data, which doesn't jibe well with Zotero's current data model.

Those are my not yet fully coherent thoughts. Making sure @zuphilip sees this, as he knows a ton about metadata formats.

zuphilip commented 8 years ago

Nice to hear the support on this. Which vocabulary to use might also depend on our goal: do we want the most widespread vocabulary or do we want the most detailed vocabulary?

I guess that currently schema.org is the way to go if we are looking for mainstream (and I think we are). To find out which structured web data format is most used, there is http://webdatacommons.org/ . They told me that they will try to extract JSON-LD in their next run, and then there might also be scientific analysis of that. We should consider the bib extension, which adds some special cases and fields, e.g. for theses.

On the other hand, BIBFRAME is very detailed and is discussed as a successor to the MARC format in the library world ("MARC must die"). I have a very preliminary import translator written: https://github.com/UB-Mannheim/zotkat/blob/master/BIBFRAME.js . However, I think BIBFRAME2 is out now. Its goal is also a broad scope, but I read that more as meaning that, besides libraries, museums and other cultural heritage institutions might also use BIBFRAME.

dstillman commented 8 years ago

> I guess that currently schema.org is the way to go if we are looking for mainstream (and I think we are).

Yes, for export, I think we're looking for mainstream. Nothing stopping us from importing more esoteric ontologies like BIBFRAME (we have a MARC translator, after all), but I don't think that's what we need to export (we don't export MARC). So schema.org seems right for that. What's not clear to me is the level of lossiness we have to be comfortable with if we don't augment schema.org ontologies (the way we do in Zotero RDF).

> I always imagined writing this from scratch rather than draw on RDF, in particular because once we specify the RDF to JSON-LD mappings, I'm not sure how much time is saved.

After looking at our existing code, I think I have a bit clearer picture of how this needs to work — apologies if I'm stating the obvious. If our JSON-LD support was only ever going to work with schema.org, a totally separate translator would make sense, but I don't think that's the case here. We may or may not decide to export only schema.org in JSON-LD, but for import it wouldn't make sense to throw away data in all the other ontologies that we already know about. (This functionality will also be a cornerstone of custom type support.)

So I think we're looking at 1) a clean JSON-LD translator with doExport(), using primarily or exclusively schema.org, and 2) JSON-LD support in the Embedded Metadata translator, using our existing RDF functionality. (This is a similar approach to Zotero RDF — we have a Zotero RDF translator with a doExport(), but (if I'm reading things right) Zotero RDF is just imported through RDF.js.)
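
As a rough illustration of point 1), the core of a schema.org export could look something like the sketch below. Everything in it is an assumption for the sake of the example: `TYPE_MAP` and `itemToJsonLd` are hypothetical names, the type and property mapping is not an agreed one, and the item shape only loosely follows what a translator sees (`itemType`, `title`, `creators`, `date`, `url`).

```javascript
// Hypothetical mapping from Zotero item types to schema.org types.
// A real mapping would need discussion; "Thesis" comes from the bib extension.
const TYPE_MAP = {
  book: "Book",
  journalArticle: "ScholarlyArticle",
  thesis: "Thesis",
  webpage: "WebPage"
};

// Convert one Zotero-like item into a JSON-LD object using schema.org terms.
function itemToJsonLd(item) {
  const node = {
    "@context": "https://schema.org",
    // Fall back to the generic CreativeWork for unmapped item types.
    "@type": TYPE_MAP[item.itemType] || "CreativeWork",
    name: item.title
  };
  const authors = (item.creators || [])
    .filter(c => c.creatorType === "author")
    .map(c => ({
      "@type": "Person",
      givenName: c.firstName,
      familyName: c.lastName
    }));
  if (authors.length) node.author = authors;
  if (item.date) node.datePublished = item.date;
  if (item.url) node.url = item.url;
  return node;
}

const exported = itemToJsonLd({
  itemType: "thesis",
  title: "An Example Thesis",
  creators: [{ creatorType: "author", firstName: "Jane", lastName: "Doe" }],
  date: "2015-06-01"
});
console.log(JSON.stringify(exported, null, 2));
```

Fields without an obvious schema.org counterpart are simply dropped in this sketch, which is exactly the lossiness question raised above.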

zuphilip commented 8 years ago

The plan with 1) and 2) looks good to me and I think we might also need to expand the RDF.js translator during 2). For the expressiveness of schema.org etc.: OCLC is also using schema.org and you can look at examples there, e.g. https://www.worldcat.org/oclc/920898066#microdatabox (at the very bottom there is a collapsed section with microdata). This is in Turtle notation and not JSON-LD.

dstillman commented 8 years ago

> and I think we might also need to expand the RDF.js translator during 2)

Yes, exactly. We'd add schema.org support to RDF.js.

adam3smith commented 8 years ago

That sounds great (and yes, you're correct on Zotero RDF import). So if I understand the import part right, that would be:

  1. Detect the JSON-LD
  2. Pass it through a new utility function ZU.jsonldToRDF
  3. Import via RDF.js, which should be enhanced to support schema.org
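
The detection step could be sketched roughly as follows. This is only a sketch under assumptions: `extractJsonLd` is a hypothetical helper, and it scans an HTML string with a regular expression purely to stay self-contained (a translator would more likely query the DOM). The conversion to RDF and the import through RDF.js are not shown.

```javascript
// Hypothetical helper: collect all JSON-LD blocks embedded in an HTML string.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold a single object or an array of objects.
      results.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch (e) {
      // Skip blocks that are not valid JSON instead of failing the whole page.
    }
  }
  return results;
}

const samplePage = `
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book", "name": "Example Book"}
</script>
</head><body></body></html>`;

const foundBlocks = extractJsonLd(samplePage);
console.log(foundBlocks.length, foundBlocks[0]["@type"]);
```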

dstillman commented 8 years ago

Yeah, or just break out HTML/JSON-LD parsing into separate functions within Embedded Metadata.js. (Not sure if JSON-LD parsing would have a use outside of that translator.) Embedded Metadata detectWeb would need to return 'multiple' if more than one result is found per page; right now it's only one per page.

Not sure if there are pages with different kinds of embedded metadata for the same item. If so, the translator could do some quick checks to make sure it's not finding redundant results.
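
One way such a quick check might work, purely as a sketch: treat results that share a DOI, a URL, or a normalized title as the same item, and report 'multiple' only when more than one distinct item remains. The names `itemKey`, `dedupeItems`, and `detectType` are hypothetical, and the choice of comparison keys is an assumption.

```javascript
// Build a comparison key for an item: prefer DOI, then URL, then a normalized title.
function itemKey(item) {
  if (item.DOI) return "doi:" + item.DOI.toLowerCase();
  if (item.url) return "url:" + item.url;
  return "title:" + (item.title || "").toLowerCase().replace(/\s+/g, " ").trim();
}

// Drop items whose key was already seen, e.g. the same item found via two metadata formats.
function dedupeItems(items) {
  const seen = new Set();
  return items.filter(item => {
    const key = itemKey(item);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Return false, a single item type, or "multiple", in the spirit of detectWeb.
function detectType(items) {
  const unique = dedupeItems(items);
  if (unique.length === 0) return false;
  return unique.length === 1 ? unique[0].itemType : "multiple";
}
```

A check like this would miss the case where one metadata format carries a DOI and the other does not, so it is only a first pass.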

adam3smith commented 8 years ago

OK, I'd like to take this if there's no one else. Learning more about the relevant vocabularies will be useful for me anyway. I should be able to have a viable version within a month. @dstillman -- do you want to contact me by e-mail about how sponsorship for this should look?

zuphilip commented 8 years ago

I just found a nicely written article about how to use JSON-LD: http://blog.codeship.com/json-ld-building-meaningful-data-apis/ . They are using a JavaScript library, https://github.com/digitalbazaar/jsonld.js , for that (BSD 3-clause license). I guess it is worth considering this for the issue here.

westurner commented 8 years ago

From "Export to Schema.org RDFa and/or Microdata" https://forums.zotero.org/discussion/35992/export-to-schemaorg-rdfa-andor-microdata/ :

  1. Map from CSL Types and attributes to Schema.org classes and properties
  2. Output RDFa:

(Seems like a lot of work to punctuate triples out of nested JSON form.)

It would be relatively easy to create a JSON-LD context [6] for CSL JSON, but that wouldn't satisfy the output requirements of [CSL Style X] as HTML+RDFa structured data readable by Zotero.

[6] http://www.w3.org/TR/json-ld/#the-context
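
For illustration, a context along these lines might look like the sketch below. The selection of CSL JSON keys and their schema.org targets is an assumption made for this example, not an established mapping, and `cslContext` and `withContext` are hypothetical names. Fields such as `issued` or `container-title` do not map this simply and are left out.

```javascript
// Hypothetical JSON-LD context mapping a few CSL JSON keys to schema.org terms.
// "type" is deliberately left unmapped: CSL types would need their own mapping
// to schema.org classes, and a JSON-LD processor drops unmapped keys on expansion.
const cslContext = {
  schema: "https://schema.org/",
  id: "@id",
  title: "schema:name",
  URL: { "@id": "schema:url", "@type": "@id" },
  author: "schema:author",
  family: "schema:familyName",
  given: "schema:givenName",
  language: "schema:inLanguage",
  ISBN: "schema:isbn"
};

// Attach the context to a CSL JSON item without modifying the original object.
function withContext(cslItem) {
  return Object.assign({ "@context": cslContext }, cslItem);
}

const sampleCslItem = {
  id: "http://example.org/items/1",
  type: "book",
  title: "Example Book",
  author: [{ family: "Doe", given: "Jane" }]
};
console.log(JSON.stringify(withContext(sampleCslItem), null, 2));
```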

StoltHD commented 4 years ago

Will there be a feature to export to and import from JSON-LD in Zotero Standalone?

It would be great, because then it could be used to "sync" between Tropy and Zotero.