mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

does not have storage.objects.list access to the Google Cloud Storage bucket #673

Open karpenko-p-n opened 11 months ago

karpenko-p-n commented 11 months ago

I am following https://github.com/mlcommons/training/blob/master/large_language_model/megatron-lm/README.md#data-download to download data from gs://mlperf-llm-public2 with the following command: gsutil cp -r gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json .

It fails with the following error message: "AccessDeniedException: 403 xxx.xxx@gmail.com does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."

Could anyone suggest how to download gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json?

Thanks a lot.