Open goeffthomas opened 2 hours ago
We should probably add that same user agent when downloading the files over HTTP: https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/mlcroissant/_src/operation_graph/operations/download.py#L188-L190
If the metadata for Croissant is pulled via URL (done here), we should set a user-agent that allows the package to be identified.
For reference,
kagglehub
does something similar here