google / schemarama

Schemarama is a project exploring standards-based validation for structured data, especially Schema.org.
Apache License 2.0

modularize extraction queries #28

Closed VladimirAlexiev closed 2 years ago

VladimirAlexiev commented 3 years ago

A brief inspection of https://github.com/google/schemarama/tree/main/kgx/wikidata shows that those queries have mostly the same structure:

So the distinct part of each query selects props, and optionally maps them to bioschema. I think it makes sense to extract these specific parts and then generate SPARQL from them. This will help to:

BTW, have you considered generating extraction queries from Wikidata ShEx schemas such as https://www.wikidata.org/wiki/EntitySchema:E258?
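To make the proposal concrete, here is a minimal sketch of generating a query from a per-entity property list. Everything in it is an assumption made for illustration: the `PropSpec` and `build_query` names, the `CONSTRUCT` shape, and the example IDs and Schema.org mapping are not taken from the existing files in `kgx/wikidata`, which may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Shared boilerplate that every generated query would repeat.
PREFIXES = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <https://schema.org/>
"""


@dataclass(frozen=True)
class PropSpec:
    """Hypothetical description of one property to extract."""
    wikidata_prop: str               # Wikidata property ID, e.g. "P351"
    mapped_to: Optional[str] = None  # optional target term; None keeps the wdt: property


def build_query(class_qid: str, props: Sequence[PropSpec]) -> str:
    """Generate a CONSTRUCT query for instances of class_qid (assumed query shape)."""
    construct, where = [], []
    for i, p in enumerate(props):
        var = f"?v{i}"
        target = p.mapped_to or f"wdt:{p.wikidata_prop}"
        construct.append(f"  ?item {target} {var} .")
        where.append(f"  OPTIONAL {{ ?item wdt:{p.wikidata_prop} {var} }}")
    return (
        PREFIXES
        + "CONSTRUCT {\n" + "\n".join(construct) + "\n}\n"
        + "WHERE {\n"
        + f"  ?item wdt:P31 wd:{class_qid} .\n"
        + "\n".join(where) + "\n}\n"
    )


# Illustrative spec only: the IDs and the mapping are placeholders for the distinct part.
GENE_PROPS = [
    PropSpec("P351", mapped_to="schema:identifier"),
    PropSpec("P353"),
]

if __name__ == "__main__":
    print(build_query("Q7187", GENE_PROPS))
```

With this split, only the class ID and the `PropSpec` list differ between queries, and the same list could later be derived from a ShEx schema instead of being written by hand.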

danbri commented 2 years ago

Hi! Yes - these queries were part of a collaboration with the Wikidata community working on subsetting. Others were indeed looking at ShEx-to-SPARQL generators. I agree that, for these use cases, SPARQL should ultimately be closer to a compile target than the "source of truth" format. I'll close your issue as we're not actively working on this at the moment, and when it comes back I'll make a dedicated repository. But you make good points!