schemaorg / suggestions-questions-brainstorming

Suggestions, questions, and brainstorming

Standard way of storing the actual data within a dataset #181

Open tomwparsons opened 6 years ago

tomwparsons commented 6 years ago

We're looking for the best way to encode tabular data in JSON that would normally be stored in a CSV. I don't think Dataset is designed to store the raw data itself; as I understand it, it describes a data file (e.g. a CSV) instead. Is that right?

If this is correct, can you recommend a format for encoding CSVs and tabular data in JSON or JSON-LD?

Thanks

akuckartz commented 6 years ago

Have a look at these W3C Recommendations: https://www.w3.org/TR/csv2json/ https://www.w3.org/TR/csv2rdf/

I suggest ignoring the first and concentrating on the second.

akuckartz commented 6 years ago

Here is a primer: https://www.w3.org/TR/tabular-data-primer/

akuckartz commented 6 years ago

In particular: https://www.w3.org/TR/tabular-data-primer/#json-ld
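To make the idea behind the CSV-to-JSON direction concrete, here is a minimal sketch that turns each CSV row into one JSON object keyed by the column headers. It is not a CSVW processor: it ignores metadata files and datatype annotations, and the column names `term` and `weight` and the sample rows are assumptions chosen for illustration.

```python
import csv
import io
import json

# Hypothetical sample data; the column names are assumptions for illustration.
CSV_TEXT = """term,weight
oceans,0.043478
climate,0.043478
"""


def csv_to_json_rows(text):
    """Turn each CSV row into one JSON object keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        # Without CSVW metadata every cell is a string, so convert by hand.
        rows.append({"term": record["term"], "weight": float(record["weight"])})
    return rows


rows = csv_to_json_rows(CSV_TEXT)
print(json.dumps(rows, indent=2))
```

A real CSVW toolchain would read the datatypes from a metadata document rather than hard-coding the `float` conversion as done here.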

danbri commented 6 years ago

I looked into this a little while back. I had thought the RDF vocabulary defined in the CSVW specs covered the use case of describing the entire content of tables, but I think it's not quite there. @gkellogg?

tomwparsons commented 6 years ago

Thank you for your help on this. We're aiming to store topic modelling data about a SocialMediaPosting. We had originally thought it could be stored as a separate Dataset and then referenced from the SocialMediaPosting, e.g.

{
  "@type": "Dataset",
  "measurementTechnique": "topic_modelling",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "topic_0",
      "value": "0.095238",
      "valueReference": [
        {
          "@type": "PropertyValue",
          "name": "topic_stdev",
          "value": "0"
        },
        {
          "@type": "PropertyValue",
          "name": "term_weights",
          "valueReference": [
            {
              "@type": "PropertyValue",
              "name": "https://t.co/axzoqoznur",
              "value": "0.478261"
            },
            {
              "@type": "PropertyValue",
              "name": "oceans",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "climate",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "change",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "must",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "key",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "paris",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "talks",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "say",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "scientists",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "threat",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "via",
              "value": "0.043478"
            },
            {
              "@type": "PropertyValue",
              "name": "youtube",
              "value": "0.043478"
            }
          ]
        }
      ]
    }
  ]
}

Is the best approach to convert this to RDF and then back into JSON-LD, so that the data ends up stored in '@graph' instead?
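Whichever serialization is chosen, it may help to see that the nested PropertyValue structure above is really a three-column table of (topic, term, weight). The sketch below flattens it that way; the function name `flatten_terms` is hypothetical, the sample is a shortened copy of the snippet above, and the code assumes term weights always sit under a PropertyValue named `term_weights`.

```python
def flatten_terms(dataset):
    """Return (topic, term, weight) tuples for terms nested under 'term_weights'."""
    rows = []
    for topic in dataset.get("variableMeasured", []):
        for ref in topic.get("valueReference", []):
            # Skip sibling values such as 'topic_stdev'.
            if ref.get("name") != "term_weights":
                continue
            for term in ref.get("valueReference", []):
                rows.append((topic["name"], term["name"], float(term["value"])))
    return rows


# Shortened copy of the Dataset from the comment above.
sample = {
    "@type": "Dataset",
    "variableMeasured": [{
        "@type": "PropertyValue", "name": "topic_0", "value": "0.095238",
        "valueReference": [
            {"@type": "PropertyValue", "name": "topic_stdev", "value": "0"},
            {"@type": "PropertyValue", "name": "term_weights", "valueReference": [
                {"@type": "PropertyValue", "name": "oceans", "value": "0.043478"},
                {"@type": "PropertyValue", "name": "climate", "value": "0.043478"},
            ]},
        ],
    }],
}

for row in flatten_terms(sample):
    print(row)
```

Rows in this shape map directly onto a CSV file, which is the input the CSVW specifications are written for.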

millercl commented 6 years ago

Rel: schemaorg/schemaorg#1652. There is no restriction in the data-model or the schema.org vocabulary definition that prohibits the conjunction of other data. While url may indicate a separate dereferenceable entity such as a .csv or .rdf file, it could also define, for example, a named-graph which holds the data per se within the same <script> element. JSON-LD permits multiple @context values where the CSVW or Data Cube prefixes should be defined. The W3C maintains a list of CSVW implementations that can reformat CSV.
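One possible reading of this, sketched below, is a single JSON-LD document whose `@context` is an array (schema.org plus a `csvw` prefix) and whose Dataset `url` points at a named graph in the same document. The fragment identifier `#topic-data`, the dataset name, and the helper `to_script_tag` are assumptions for illustration, not anything prescribed by schema.org.

```python
import json

doc = {
    # Two context entries: the schema.org vocabulary and a csvw prefix.
    "@context": ["https://schema.org/", {"csvw": "http://www.w3.org/ns/csvw#"}],
    "@graph": [
        # The Dataset describes the data and points at it via url.
        {"@type": "Dataset", "name": "Topic model output", "url": "#topic-data"},
        # A node with both @id and @graph is a named graph holding the data.
        {
            "@id": "#topic-data",
            "@graph": [
                {"@type": "PropertyValue", "name": "oceans", "value": 0.043478},
                {"@type": "PropertyValue", "name": "climate", "value": 0.043478},
            ],
        },
    ],
}


def to_script_tag(document):
    """Wrap a JSON-LD document in the script element used for page markup."""
    body = json.dumps(document, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(doc))
```

Whether consumers such as search engines follow a `url` into a named graph is not established in this thread, so treat this as a data-model illustration only.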

hekl commented 5 years ago

Google's schema.org guidelines say they are exploring support for CSVW. There is also JSON-stat, which is supported by several statistical agencies. Are these overlapping standards or complementary ones?

millercl commented 5 years ago

Tentatively, for now (beta/pilot): use schema:mainEntity (as per schemaorg/schemaorg#1652) with rdf:type csvw:Table. https://plus.google.com/photos/photo/106943062990152739506/6577443689090429010?sqid=103048251221048356778&ssid=2c47ac50-fc87-4f5f-b882-e670787b32b5 Also related: schemaorg/schemaorg#1623.
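A rough sketch of what that pattern could look like follows: a Dataset whose `mainEntity` is typed `csvw:Table`, with rows expressed through `csvw:row`, `csvw:rownum`, and `csvw:describes` as in the CSV-to-RDF output model. The `ex` prefix, its namespace, the dataset name, and the helper `table_entity` are hypothetical, and the exact property layout a consumer expects is not confirmed by this thread.

```python
import json


def table_entity(rows):
    """Build a csvw:Table node with one csvw:Row per (term, weight) pair."""
    return {
        "@type": "csvw:Table",
        "csvw:row": [
            {
                "@type": "csvw:Row",
                "csvw:rownum": number,
                # ex: is a hypothetical namespace for the column properties.
                "csvw:describes": {"ex:term": term, "ex:weight": weight},
            }
            for number, (term, weight) in enumerate(rows, start=1)
        ],
    }


dataset = {
    "@context": [
        "https://schema.org/",
        {"csvw": "http://www.w3.org/ns/csvw#", "ex": "https://example.org/topic#"},
    ],
    "@type": "Dataset",
    "name": "Topic model output",
    "mainEntity": table_entity([("oceans", 0.043478), ("climate", 0.043478)]),
}

print(json.dumps(dataset, indent=2))
```

Keeping the table under `mainEntity` leaves the usual Dataset description properties untouched while the raw values travel in the same document.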

RichardWallis commented 4 years ago

See issue #7 for the context of the move from the main Schema.org issue tracker to this repository.