mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0

Define and set a unique user-agent #761

Open goeffthomas opened 2 hours ago

goeffthomas commented 2 hours ago

If the Croissant metadata is pulled via URL (done here), we should set a unique user-agent on that request so the package can be identified.

For reference, kagglehub does something similar here.

goeffthomas commented 2 hours ago

We should probably add that same user agent when downloading the files over HTTP: https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/mlcroissant/_src/operation_graph/operations/download.py#L188-L190
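A rough sketch of what this could look like, so both call sites share one helper. It uses only the standard library to stay self-contained. The names `PACKAGE_NAME`, `PACKAGE_VERSION`, `build_user_agent`, and `build_request` are hypothetical, and the user-agent format is an assumption rather than something already agreed on:

```python
import platform
import urllib.request

# Hypothetical values: the real name and version would come from the package itself.
PACKAGE_NAME = "mlcroissant"
PACKAGE_VERSION = "0.0.0"


def build_user_agent(name: str = PACKAGE_NAME, version: str = PACKAGE_VERSION) -> str:
    """Return a product-style user-agent string, e.g. 'mlcroissant/0.0.0 python/3.11.4'."""
    return f"{name}/{version} python/{platform.python_version()}"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries the package user-agent; nothing is sent here."""
    return urllib.request.Request(url, headers={"User-Agent": build_user_agent()})


# The same helper could serve both the metadata fetch and the file download.
request = build_request("https://example.com/metadata.json")
```

If the existing code calls `requests`, the same string could presumably be passed through its `headers=` argument instead, e.g. `headers={"User-Agent": build_user_agent()}`.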