mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

does not have storage.objects.list access to the Google Cloud Storage bucket #673

Open karpenko-p-n opened 1 year ago

karpenko-p-n commented 1 year ago

I am trying to follow https://github.com/mlcommons/training/blob/master/large_language_model/megatron-lm/README.md#data-download to download data from gs://mlperf-llm-public2 with the following command: gsutil cp -r gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json .

It fails with the following error message: "AccessDeniedException: 403 xxx.xxx@gmail.com does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."
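For context, storage.objects.list is the permission needed to list a bucket's contents, which a recursive copy (-r) generally requires, while reading a single object by its full name requires storage.objects.get instead. As a minimal sketch, and assuming (the thread does not confirm this) that the object itself is publicly readable, a gs:// URI can be mapped to the public HTTPS endpoint form and fetched directly without listing the bucket. The helper name gs_to_https is hypothetical and used only for illustration.

```python
from urllib.parse import quote


def gs_to_https(gs_uri: str) -> str:
    """Map gs://BUCKET/OBJECT to https://storage.googleapis.com/BUCKET/OBJECT.

    Hypothetical helper. It only rewrites the URI; it does not check that
    the bucket exists or that anonymous reads are allowed.
    """
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"URI must name both a bucket and an object: {gs_uri!r}")
    # Percent-encode the object name but keep '/' separators readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


if __name__ == "__main__":
    uri = "gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json"
    print(gs_to_https(uri))
```

If the resulting URL also returns 403, the object is not publicly readable (or no longer exists), and a different download path is needed.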

Could anyone suggest how to download gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json?

Thanks a lot

ShriyaPalsamudram commented 3 months ago

All required data can be downloaded using instructions in the S3 artifacts download section of the README.

hiwotadese commented 3 months ago

@karpenko-p-n can you try again with the updated instructions in the README?