mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Is the GPT tokenizer model open-source? #653

Open xyyintel opened 1 year ago

xyyintel commented 1 year ago

Hi,

I'm trying to download the tokenizer model from gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model with this command:

    ./google-cloud-sdk/bin/gsutil cp -R gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model ./

It fails with the following error:

    AccessDeniedException: 403 myname@gmail.com does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

Any suggestions?

xyyintel commented 1 year ago

PS. I created a new project, and on the 'IAM' page I have already granted my account the following roles:

    Environment and Storage Object Administrator
    Environment and Storage Object User
    Owner
    Storage Admin
    Storage Object Admin
    Storage Object Viewer

I assume these include 'storage.objects.list'.

Is this because the dataset owner has not made the bucket publicly readable by all users, without any restriction?
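
One way to test that last question directly is to request the object anonymously over HTTPS. IAM roles granted in your own project apply to resources in that project, not to a bucket owned by someone else, so an anonymous request shows what the bucket owner actually allows. The sketch below is a hedged illustration, not an official MLCommons procedure: the helper names gs_to_public_url and is_publicly_readable are hypothetical, and it assumes the standard https://storage.googleapis.com/BUCKET/OBJECT endpoint for publicly readable objects.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlparse


def gs_to_public_url(gs_uri: str) -> str:
    """Map gs://bucket/object to its anonymous HTTPS endpoint (hypothetical helper)."""
    parsed = urlparse(gs_uri)
    object_name = parsed.path.lstrip("/")
    if parsed.scheme != "gs" or not parsed.netloc or not object_name:
        raise ValueError(f"not a gs://bucket/object URI: {gs_uri!r}")
    # quote() keeps "/" separators and escapes characters unsafe in a URL path.
    return f"https://storage.googleapis.com/{parsed.netloc}/{quote(object_name)}"


def is_publicly_readable(gs_uri: str, timeout: float = 10.0) -> bool:
    """Send an unauthenticated HEAD request; True only if the server answers 200."""
    request = urllib.request.Request(gs_to_public_url(gs_uri), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 401/403/404 and similar: the object is not anonymously readable (or absent).
        return False


if __name__ == "__main__":
    uri = "gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model"
    print(gs_to_public_url(uri))
    print("publicly readable:", is_publicly_readable(uri))
```

If the check returns False, the object is either not shared with anonymous users or does not exist at that path, and only the bucket owner can change that; no role in your own project will help.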