mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0
1.61k stars 556 forks

Paxml c4 resplit dataset permission issues #764

Closed gramesh-amd closed 2 months ago

gramesh-amd commented 2 months ago

The Paxml training instructions link to a GCS bucket for the 3.0.4 resplit of C4 used by MLPerf, but I don't think it's publicly accessible.

gsutil -u 'gcp_project_name' -m cp 'gs://mlperf-llm-public2/c4/en/3.0.4' gives me a permission error.

gramesh-amd commented 2 months ago

cc: @ShriyaPalsamudram

ShriyaPalsamudram commented 2 months ago

@sgpyc could you please fix the instructions so the paths point to the S3 bucket instead? This PR does the same for the megatron-lm reference.

@gramesh-amd in the meantime, can you use these instructions, which should also have the paxml versions of the data/ckpts?

gramesh-amd commented 2 months ago

@ShriyaPalsamudram Thanks for the quick reply.

I did read through the instructions page.

They seem to point to the same GCS bucket for the paxml checkpoint (gs://mlperf-llm-public2/gpt3_spmd1x64x24_tpuv4-3072_v84_20221101/checkpoints/checkpoint_00004000) and the training dataset (gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json).

Both of these paths give me permission errors.

ShriyaPalsamudram commented 2 months ago

@gramesh-amd Can you specifically follow the S3 artifacts download section, which does not point to the GCS bucket?

Once you set up rclone, you can explore the mlc-training:mlcommons-training-wg-public/gpt3/ path, which should have both the paxml and megatron-lm dataset and ckpt artifacts. Everything needed to run the references should be available in the S3 bucket.
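For anyone following along, here is a minimal sketch of how the rclone path above could be explored. It assumes an rclone remote named mlc-training has already been configured as described in the S3 artifacts download section (the credentials and endpoint are not repeated here). The local destination directory and the two helper function names are hypothetical, chosen only for illustration.

```shell
# Assumption: a remote named "mlc-training" already exists in your rclone config.
REMOTE="mlc-training:mlcommons-training-wg-public/gpt3/"
# Hypothetical local destination; pick any directory with enough free space.
DEST="./gpt3_artifacts"

# List the top-level directories under the gpt3/ path to see what is available.
list_gpt3_artifacts() {
    rclone lsd "$REMOTE"
}

# Copy everything under gpt3/ to the local destination, showing progress.
# This may be a very large download; consider copying a subdirectory instead.
copy_gpt3_artifacts() {
    rclone copy "$REMOTE" "$DEST" -P
}
```

Running list_gpt3_artifacts first shows which subdirectories exist before anything is downloaded; copy_gpt3_artifacts then pulls the whole tree.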

gramesh-amd commented 2 months ago

Thanks. Let me go through it, and I'll reopen this if there is any trouble.

gramesh-amd commented 2 months ago

@ShriyaPalsamudram @sgpyc the above steps let me download the gpt3 paxml ckpt, but I can't access the 3.0.4 train/validation splits of C4 for MLPerf. The links mentioned on the pax page don't work.

Could you please let me know the updated links?