Closed gramesh-amd closed 2 months ago
cc: @ShriyaPalsamudram
@sgpyc could you please fix the instructions so the paths now point to the S3 bucket instead? This PR does the same for the megatron-lm reference
@gramesh-amd in the meantime, can you use these instructions, which should also have the paxml versions of the data/ckpts?
@ShriyaPalsamudram Thanks for the quick reply
I did read through the instructions page.
They seem to point to the same gs bucket for the paxml checkpoint (gs://mlperf-llm-public2/gpt3_spmd1x64x24_tpuv4-3072_v84_20221101/checkpoints/checkpoint_00004000) and the training dataset (gs://mlperf-llm-public2/c4/en_val_subset_json/c4-validation_24567exp.json).
Both of these paths result in permission errors for me.
@gramesh-amd Can you specifically follow the S3 artifacts download section which does not point to the gs bucket?
Once you set up rclone, you can inspect the mlc-training:mlcommons-training-wg-public/gpt3/ path, which should have both the paxml and megatron-lm dataset and ckpt artifacts. So everything needed to run the references should be available in the S3 bucket.
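For anyone following along, browsing and pulling from that path could look roughly like the sketch below. It assumes an rclone remote named mlc-training has already been configured as described in the S3 artifacts download section; the local destination directory, the DRY_RUN switch, and the helper function names are illustrative assumptions, not part of the official instructions.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the "mlc-training" rclone remote is already configured.
REMOTE_PATH="mlc-training:mlcommons-training-wg-public/gpt3"
DEST_DIR="${DEST_DIR:-./gpt3_artifacts}"  # hypothetical local destination

# With DRY_RUN=1 (the default here) commands are printed instead of executed,
# so the script can be reviewed before any large download starts.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the directories under the gpt3/ prefix to see which artifacts exist.
list_artifacts() {
  run rclone lsd "$REMOTE_PATH/"
}

# Copy one subdirectory (first argument) into DEST_DIR, showing progress.
fetch_artifacts() {
  run rclone copy "$REMOTE_PATH/$1" "$DEST_DIR/$1" -P
}

list_artifacts
```

Running it with DRY_RUN=0 performs the listing for real; fetch_artifacts can then be called with whichever subdirectory name the listing shows, since the exact layout under gpt3/ is not spelled out in this thread.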
Thanks. Let me go through it, and I'll reopen this if there is any trouble.
@ShriyaPalsamudram @sgpyc the above steps let me download the gpt3 paxml ckpt, but I can't access the 3.0.4 train/validation splits of c4 mlperf. The links mentioned on the pax page don't work.
Could you please let me know the updated links?
The Paxml training instructions provide a link to a GCS bucket to get the 3.0.4 resplit for mlperf, but I don't think it's publicly accessible.
gsutil -u 'gcp_project_name' -m cp 'gs://mlperf-llm-public2/c4/en/3.0.4' gives me a permission error.
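To separate a genuine permission problem from a command-line problem, the check can be split into a listing step and a copy step, roughly as sketched below. Note that gsutil cp also expects a destination argument and needs -r to copy a directory prefix. The GCP_PROJECT default, the local destination, the DRY_RUN switch, and the helper names are placeholders assumed for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: GCP_PROJECT is a placeholder for your own billing project.
GCP_PROJECT="${GCP_PROJECT:-gcp_project_name}"
C4_PATH="gs://mlperf-llm-public2/c4/en/3.0.4"
DEST_DIR="${DEST_DIR:-./c4_en_3.0.4}"  # hypothetical local destination

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: try to list the prefix. If this fails, the bucket itself is not
# readable with the current credentials, whatever the cp arguments are.
check_access() {
  run gsutil -u "$GCP_PROJECT" ls "$C4_PATH"
}

# Step 2: recursive, parallel copy with an explicit destination.
fetch_c4() {
  run gsutil -u "$GCP_PROJECT" -m cp -r "$C4_PATH" "$DEST_DIR"
}

check_access
```

If the listing step with DRY_RUN=0 already fails, that points to the bucket's access settings rather than the copy command.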