linked-statistics / eurostat

Back-end code and website for eurostat.linked-statistics.org

Use SDMX-ML as data source #2

Status: Open. Opened by csarven 10 years ago.

csarven commented 10 years ago

Current mapping is TSV->RDF. SDMX-ML->RDF should be used instead. Brief rationale for the switch (from emails):

Aftab: "What is the added value of moving from TSV->RDF to SDMX-ML->RDF assuming that the resulting triples and properties remain the same?"

Sarven: "There is a more precise mapping, with fewer assumptions, from SDMX-ML to RDF using QB than there is via TSV. With the TSV approach, we hard-code a lot of (good) assumptions about what to do with the field names and cell values. Since QB is historically based on the SDMX information model, and the vocabulary terms are available in SDMX-RDF, it is a simpler way forward. SDMX-ML is also considered to be the source format of these agencies, from which they later generate other formats (e.g., TSV), AFAIK! At this time, the resulting triples are not the same.

Going forward, effort put into maintaining TSV->RDF only benefits Eurostat in EU-data-cloud. What we are trying to accomplish with Linked SDMX is to have 'one transformation to rule all statistical data'. That's in contrast to writing a custom mapping for each CSV/TSV file we encounter. At least that's the general direction."

Richard: "In the long run, I’d certainly prefer to see eurostat.linked-statistics.org use Linked SDMX, because that would move us from a one-off solution to something re-usable that has a better chance of being maintained in the long term."
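Sarven's point about hard-coded assumptions can be made concrete. Below is a minimal Python sketch, not the repository's actual scripts, that reads the same observation from two trimmed, hypothetical inputs: a Eurostat-style bulk TSV, assumed to use a first header cell like `unit,geo\time` and cells such as `12.3 p` (value plus flag) or `:` (no value), and an SDMX-ML 2.0 generic data message, assumed to use the `generic` namespace shown in the code. The function names `parse_eurostat_tsv` and `parse_sdmx_generic`, and the `TIME_PERIOD` key, are illustrative choices.

```python
import xml.etree.ElementTree as ET

# --- TSV route: the layout is implicit, so the parser has to encode it. ---
def parse_eurostat_tsv(text):
    """Parse a Eurostat-style bulk TSV (assumed layout) into observation dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    # Assumption 1: the first header cell looks like "unit,geo\time", i.e.
    # comma-separated row dimensions, a backslash, then the column dimension.
    row_dims, col_dim = header[0].split("\\")
    dim_names = row_dims.split(",")
    periods = [h.strip() for h in header[1:]]
    observations = []
    for line in lines[1:]:
        cells = line.split("\t")
        dim_values = cells[0].split(",")
        for period, cell in zip(periods, cells[1:]):
            # Assumption 2: a cell is "<value> <flags>", and ":" means no value.
            parts = cell.strip().split(" ", 1)
            obs = dict(zip(dim_names, dim_values))
            obs[col_dim] = period
            obs["value"] = None if parts[0] == ":" else float(parts[0])
            obs["flags"] = parts[1].strip() if len(parts) > 1 else ""
            observations.append(obs)
    return observations


# --- SDMX-ML route: dimensions and attributes are named in the message. ---
# Assumed SDMX-ML 2.0 generic data namespace.
GENERIC = "{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/generic}"

def parse_sdmx_generic(xml_text):
    """Parse an SDMX-ML 2.0 generic data message into observation dicts."""
    root = ET.fromstring(xml_text)
    observations = []
    for series in root.iter(GENERIC + "Series"):
        # Each SeriesKey Value states its concept explicitly (e.g. GEO=BE).
        key = {v.get("concept"): v.get("value")
               for v in series.find(GENERIC + "SeriesKey").iter(GENERIC + "Value")}
        for obs in series.iter(GENERIC + "Obs"):
            record = dict(key)
            # Assumes every Obs carries a Time element; the key name is our choice.
            record["TIME_PERIOD"] = obs.find(GENERIC + "Time").text.strip()
            obs_value = obs.find(GENERIC + "ObsValue")
            record["OBS_VALUE"] = float(obs_value.get("value")) if obs_value is not None else None
            # Observation-level attributes (e.g. OBS_STATUS) are named as well.
            attrs = obs.find(GENERIC + "Attributes")
            if attrs is not None:
                for v in attrs.iter(GENERIC + "Value"):
                    record[v.get("concept")] = v.get("value")
            observations.append(record)
    return observations
```

The TSV reader only works because the layout conventions are baked into it: the dimension names (`unit`, `geo`, `time`) and the `flags` field exist only by convention. The SDMX-ML reader takes dimension and attribute concepts (`UNIT`, `GEO`, `OBS_STATUS`) from the message itself, which is what makes a generic mapping of each record to a `qb:Observation` plausible.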

csarven commented 10 years ago

Some expectations from Linked SDMX:

cygri commented 10 years ago

A good first step would be to adapt the scripts so that they produce RDF data using both the TSV→RDF and SDMX→RDF approaches in parallel. Then we can alert users, adapt example queries where needed, and so on, before completely switching over.
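For the parallel run suggested here, one cheap check is to compare which predicates each pipeline emits, since those are what example queries depend on. A minimal sketch, assuming both pipelines can write N-Triples; `predicates` and `compare_predicates` are hypothetical helpers, not part of the repository.

```python
def predicates(ntriples_text):
    """Collect the predicate IRIs used in an N-Triples document.

    N-Triples puts one triple per line; blank lines and comments are skipped.
    """
    found = set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subjects (IRIs or blank nodes) and predicates contain no whitespace,
        # so the second whitespace-separated token is always the predicate.
        predicate = line.split(None, 2)[1]
        found.add(predicate.strip("<>"))
    return found


def compare_predicates(tsv_output, sdmx_output):
    """Report which predicates each pipeline uses, to spot queries that need adapting."""
    old, new = predicates(tsv_output), predicates(sdmx_output)
    return {
        "only_tsv": sorted(old - new),
        "only_sdmx": sorted(new - old),
        "shared": sorted(old & new),
    }
```

Predicates that show up only in the TSV output point at example queries that would need adapting before switching over.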