mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0
452 stars, 41 forks

Fix discrepancies with the specs #742

Closed: ccl-core closed this 2 months ago

ccl-core commented 2 months ago

1) ids and names are the same for Fields and RecordSets (see migration 202409231500.py); metadata and output are updated accordingly.
2) In the get_column method of Source, we return the node's uuid if no extract method is specified.
3) For a RecordSet specifying data, we look at field.id and not field.name to get the expected keys.
4) Also added a filters flag to load.py.
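To illustrate items 2 and 3, here is a minimal sketch of the described behavior. It does not use the mlcroissant classes: `Source`, `Field`, `extract_column`, and `check_inline_data` below are simplified, hypothetical stand-ins, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for a field's source (not the mlcroissant class)."""

    uuid: str  # id of the node this source points to
    extract_column: Optional[str] = None  # assumed name for an "extract" setting

    def get_column(self) -> str:
        # Item 2: fall back to the node's uuid when no extract method is given.
        if self.extract_column is not None:
            return self.extract_column
        return self.uuid


@dataclass
class Field:
    """Hypothetical stand-in for a RecordSet field."""

    id: str
    name: str


def check_inline_data(fields: list[Field], rows: list[dict]) -> None:
    """Item 3: the keys of inline data are compared to field ids, not names."""
    expected = {field.id for field in fields}
    for index, row in enumerate(rows):
        if set(row) != expected:
            raise ValueError(
                f"row {index}: expected keys {sorted(expected)}, got {sorted(row)}"
            )


fields = [Field(id="users/name", name="name"), Field(id="users/age", name="age")]
check_inline_data(fields, [{"users/name": "Ada", "users/age": 36}])  # passes
```

In this sketch, a row keyed by `name` and `age` instead of `users/name` and `users/age` would raise a `ValueError`, which is the kind of mismatch item 3 is meant to surface.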

github-actions[bot] commented 2 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅