umbreak / jsonld-benchmarks

Json-LD benchmarks for Java implementations: Json-LD Java vs Titanium
Apache License 2.0

include real-world JSON-LD, also Framing examples #3

Open VladimirAlexiev opened 1 year ago

VladimirAlexiev commented 1 year ago

@hmottestad in https://github.com/filip26/titanium-json-ld/pull/272 (cc @filip26)

  • I've been using https://github.com/umbreak/jsonld-benchmarks a bit and managed to make it somewhat faster. I don't feel that it's a very representative benchmark, since it's based on the W3C compliance suite. I don't really want to optimize for parsing "a/b/c/d/e/../../../../../../../f" URL path fragments.
  • What I really need are some real world examples so I can optimize for how users would actually be using Titanium JSON-LD.
  • At the moment I also don't have any real-world examples of JSON-LD Framing. I remember talking to someone a few years ago who was using framing to convert JSON-LD into "regular" JSON to make the frontend devs happy. I think that is probably a typical use case for framing. Do you know if Ontotext has any real-world examples of framing?
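For context on the path example in the first bullet: it exercises dot-segment removal during IRI resolution. The following is a minimal sketch of that step, loosely following RFC 3986 section 5.2.4. It is not code from Titanium or JSON-LD Java; the class and method names are hypothetical, and it assumes the relative path has already been merged with a base so that it starts with "/".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration only; not part of any JSON-LD library.
final class DotSegments {

    // Simplified dot-segment removal for absolute paths:
    // "." is dropped, ".." pops the previous segment, and a ".." with
    // nothing left to pop is ignored rather than climbing above the root.
    static String removeDotSegments(String path) {
        Deque<String> out = new ArrayDeque<>();
        String[] segments = path.split("/", -1);
        // segments[0] is the empty string before the leading "/", so start at 1.
        for (int i = 1; i < segments.length; i++) {
            String s = segments[i];
            boolean last = (i == segments.length - 1);
            if (s.equals(".")) {
                if (last) out.addLast("");      // keep the trailing slash
            } else if (s.equals("..")) {
                out.pollLast();                 // returns null (no error) when empty
                if (last) out.addLast("");      // keep the trailing slash
            } else {
                out.addLast(s);
            }
        }
        return "/" + String.join("/", out);
    }

    public static void main(String[] args) {
        // Assumed absolute form of the path quoted above.
        System.out.println(removeDotSegments("/a/b/c/d/e/../../../../../../../f")); // prints "/f"
    }
}
```

Inputs like this are valid and must be handled, but they rarely show up in published JSON-LD, which is the point being made about the compliance suite as a benchmark.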

Here's what I know:

Also asked at https://twitter.com/valexiev1/status/1691084004894593024

fils commented 1 year ago

@VladimirAlexiev I can get you quite a few (a few 100k) for Dataset, Person, Organization and a few more schema.org types.

We (https://github.com/gleanerio/) work with several communities that are publishing JSON-LD documents. One of these is the UNESCO Ocean Info Hub; you can find information about this work at https://github.com/iodepo/odis-arch, with the sources publishing JSON-LD listed at https://github.com/iodepo/odis-arch/tree/master/collection/config

We use the tooling at https://github.com/gleanerio/ to harvest these JSON-LD documents into S3 object stores such as MinIO, among many others. We use Gleaner and Nabu from GleanerIO to do this in production, but we've been working on https://github.com/gleanerio/archetype as a tool for people to index with (https://github.com/gleanerio/archetype/blob/master/docs/quickstart.md). It is a bit of a testing tool/repo that I have been trying to shape up.

We have similar sources for the earth science community at https://github.com/earthcube

The search interfaces for these two are at https://oceaninfohub.org/ and https://geocodes.earthcube.org/#/landing

When you search there, you can also get to the JSON-LD that the partners have published.

If you want JSON-LD that is not always pretty, this is surely it. We provide guidance on publishing, but it gets interpreted in many ways, so there is a lot of variation all over the place.

Again, we don't expose the JSON-LD, but it is easy to use the tools with the sitemaps to get them. The archetype repo could use some love, but if you wanted to use it to harvest the sources in the config files I'd be happy to address any suggestions or needs.
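As a rough illustration of the harvesting step, here is a minimal Java sketch that pulls JSON-LD blocks out of an already-downloaded HTML page. It is not how Gleaner, Nabu, or archetype work internally. It assumes each page listed in a sitemap embeds its JSON-LD in `<script type="application/ld+json">` elements, and the class and method names are hypothetical. A regex is used only to keep the sketch dependency-free; a real harvester would more likely use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the GleanerIO tooling.
final class JsonLdScriptExtractor {

    // Matches <script ... type="application/ld+json" ...>BODY</script>,
    // case-insensitively and across line breaks; group 1 is BODY.
    private static final Pattern LD_JSON_SCRIPT = Pattern.compile(
            "<script\\b[^>]*\\btype\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed, non-empty body of every JSON-LD script element.
    static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = LD_JSON_SCRIPT.matcher(html);
        while (m.find()) {
            String body = m.group(1).trim();
            if (!body.isEmpty()) {
                blocks.add(body);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<script type=\"text/javascript\">var x = 1;</script>"
                + "<script type=\"application/ld+json\">{\"@type\": \"Dataset\"}</script>"
                + "</head></html>";
        // Only the ld+json block is returned; the plain JavaScript block is skipped.
        System.out.println(extract(html));
    }
}
```

The extracted strings are unvalidated text, which suits the "not always pretty" data described above: whether each block parses is itself something a benchmark corpus might want to record.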

I can get you more from other communities we work with as well. For OIH, we also have these compiled into downloadable release graphs, ref: https://github.com/iodepo/odis-arch/tree/master/graphOps/releaseGraphs (this is a work in progress). Those are not JSON-LD themselves, but the JSON-LD documents converted and combined into N-Quads (NQ) via Nabu.

umbreak commented 1 year ago

If you find a more realistic dataset and you'd like to add it to this PR and include the results, I'm happy to merge that. Thanks for working on this.