mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0

consider reusing CSVW and DQV #656

Open VladimirAlexiev opened 4 months ago

VladimirAlexiev commented 4 months ago

It's great that you reuse Schema.org. But please also consider reusing these:

benjelloun commented 4 months ago

Hi Vladimir,

We considered CSVW, but it wasn't appropriate for describing the structure of data in Croissant because it focuses on CSV tables. We needed a construct that could also describe unstructured data (text, images, etc.) and nested data such as JSON, and that allows joining data across these modalities.
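To make the cross-modality requirement concrete, here is a minimal sketch in plain Python. It does not use Croissant's API or vocabulary; the sample data, the `join_modalities` helper, and the file-naming convention are all hypothetical and invented for illustration. It joins a CSV table, a nested JSON document, and a list of image files on a shared id, which is the kind of relationship a CSV-only description cannot express.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
LABELS_CSV = "image_id,label\n1,cat\n2,dog\n"
ANNOTATIONS_JSON = (
    '{"images": [{"id": 1, "boxes": [[0, 0, 10, 10]]}, {"id": 2, "boxes": []}]}'
)
IMAGE_FILES = ["images/1.jpg", "images/2.jpg"]


def join_modalities(labels_csv, annotations_json, image_files):
    """Join tabular labels, nested annotations, and image paths on a shared id."""
    # Tabular modality: one row per image, keyed by image_id.
    labels = {
        int(row["image_id"]): row["label"]
        for row in csv.DictReader(io.StringIO(labels_csv))
    }
    # Nested modality: bounding boxes live inside a JSON array of objects.
    boxes = {
        item["id"]: item["boxes"]
        for item in json.loads(annotations_json)["images"]
    }
    records = []
    for path in image_files:
        # Assumed convention: the id is the file stem, e.g. "images/1.jpg" -> 1.
        image_id = int(path.rsplit("/", 1)[-1].split(".")[0])
        records.append(
            {
                "image": path,
                "label": labels.get(image_id),
                "boxes": boxes.get(image_id, []),
            }
        )
    return records


records = join_modalities(LABELS_CSV, ANNOTATIONS_JSON, IMAGE_FILES)
```

Each resulting record combines a file reference, a value from a table column, and a value extracted from nested JSON.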

Thanks for the pointers to DQV. It is definitely worth considering for RAI, as well as for potential future extensions related to quality (e.g., in the health or geospatial domains).