mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0

Content negotiation and correct redirections for croissant specification #624

Open dgarijo opened 7 months ago

dgarijo commented 7 months ago

According to https://github.com/mlcommons/croissant/blob/main/docs/croissant.ttl, the Croissant namespace is <http://mlcommons.org/croissant/>.

However, http://mlcommons.org/croissant/ redirects to https://mlcommons.org/working-groups/data/croissant/ instead of https://mlcommons.github.io/croissant/docs/croissant-spec.html.

In addition, there is no content negotiation for JSON-LD or Turtle, which makes the vocabulary tricky to import into other vocabularies. Is there a plan to support JSON-LD content negotiation? That is, a request such as:

curl -H "Accept: application/ld+json" -L http://mlcommons.org/croissant/

would return a JSON-LD context with the full specification.

I think this would help clarify which concepts are defined within Croissant without having to read the specification in detail. Other namespaces, such as <http://mlcommons.org/croissant/RAI/>, lead to a 404.
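
For anyone who wants to reproduce the check, here is a minimal Python sketch of the same request using only the standard library. The helper names `media_type_of` and `negotiates` are made up for illustration. The sketch only compares the returned `Content-Type` with the requested media type; it does not validate the body.

```python
import urllib.request


def media_type_of(content_type: str) -> str:
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type.split(";", 1)[0].strip().lower()


def negotiates(url: str, wanted: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers an Accept: `wanted` request with that media type.

    urlopen follows redirects by default (similar to `curl -L`) and raises
    urllib.error.HTTPError for responses such as 404.
    """
    request = urllib.request.Request(url, headers={"Accept": wanted})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        returned = response.headers.get("Content-Type", "")
    return media_type_of(returned) == wanted.lower()


if __name__ == "__main__":
    # Namespace URL taken from the issue; the media types are the registered
    # ones for JSON-LD and Turtle.
    for wanted in ("application/ld+json", "text/turtle"):
        print(wanted, negotiates("http://mlcommons.org/croissant/", wanted))
```

Running it against the RAI namespace would raise an `HTTPError` for as long as that URL returns a 404.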

dgarijo commented 7 months ago

Btw, if there is a need for an open, community-maintained persistent identifier for Croissant, such as https://w3id.org/croissant/, with content negotiation support (the resources themselves can be stored on mlcommons.org, GitHub, or elsewhere), I can help set it up in https://github.com/perma-id/w3id.org?tab=readme-ov-file#permanent-identifiers-for-the-web.
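
To make the intended behaviour concrete, here is a rough Python sketch of the redirect logic such an identifier could implement: pick a target from the `Accept` header and fall back to the HTML specification. This only illustrates the logic and is not how w3id.org itself is configured. The `TARGETS` mapping is an assumption: the HTML URL comes from this issue, the Turtle URL is a guess at a raw view of `docs/croissant.ttl`, and the JSON-LD URL is a placeholder because no published context is referenced here.

```python
from typing import List, Tuple

# Assumed redirect targets (see the note above); only the HTML one is from the issue.
TARGETS = {
    "text/html": "https://mlcommons.github.io/croissant/docs/croissant-spec.html",
    "text/turtle": "https://raw.githubusercontent.com/mlcommons/croissant/main/docs/croissant.ttl",
    "application/ld+json": "https://example.org/croissant/context.jsonld",  # placeholder
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_target(accept_header: str, default: str = "text/html") -> str:
    """Return the redirect target for the best supported media type."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in TARGETS:
            return TARGETS[media_type]
        if media_type == "*/*":
            return TARGETS[default]
    return TARGETS[default]
```

With this mapping, `choose_target("application/ld+json")` resolves to the JSON-LD entry, while a browser-style header that lists `text/html` first resolves to the specification page.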