Open itaynvn-runai opened 5 days ago
tested with this image: kubeflow/pytorch-dist-mnist:latest
(latest tag, pushed at 22/11/2024)
https://hub.docker.com/r/kubeflow/pytorch-dist-mnist/tags
the dataset links in this image were switched to a public S3 bucket, and the download now completes:
Using distributed PyTorch with gloo backend
World Size: 2. Rank: 1
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ../data/FashionMNIST/raw/train-images-idx3-ubyte.gz
0%| | 0/26421880 [00:00<?, ?it/s]
0%| | 65536/26421880 [00:00<01:12, 365219.76it/s]
1%| | 229376/26421880 [00:00<00:38, 685094.04it/s]
3%|▎ | 917504/26421880 [00:00<00:09, 2610886.88it/s]
7%|▋ | 1933312/26421880 [00:00<00:05, 4111033.66it/s]
26%|██▌ | 6848512/26421880 [00:00<00:01, 16200010.18it/s]
38%|███▊ | 10059776/26421880 [00:00<00:00, 20608644.80it/s]
47%|████▋ | 12517376/26421880 [00:01<00:00, 17876773.56it/s]
64%|██████▍ | 16973824/26421880 [00:01<00:00, 24547329.01it/s]
84%|████████▍ | 22315008/26421880 [00:01<00:00, 26412748.88it/s]
98%|█████████▊| 25985024/26421880 [00:01<00:00, 24075278.44it/s]
100%|██████████| 26421880/26421880 [00:01<00:00, 16889476.36it/s]
Extracting ../data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ../data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
0%| | 0/29515 [00:00<?, ?it/s]
100%|██████████| 29515/29515 [00:00<00:00, 325193.23it/s]
Extracting ../data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ../data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
0%| | 0/4422102 [00:00<?, ?it/s]
1%|▏ | 65536/4422102 [00:00<00:12, 361558.72it/s]
5%|▌ | 229376/4422102 [00:00<00:06, 681986.84it/s]
21%|██ | 917504/4422102 [00:00<00:01, 2593771.27it/s]
44%|████▎ | 1933312/4422102 [00:00<00:00, 4090096.69it/s]
100%|██████████| 4422102/4422102 [00:00<00:00, 6085832.68it/s]
Extracting ../data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ../data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ../data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
FYI, this new image should replace these 2 old images, which are currently used in a lot of the examples across the repo:
gcr.io/kubeflow-ci/pytorch-dist-mnist_test:1.0
(latest tag, pushed at 07/03/2019)
https://console.cloud.google.com/gcr/images/kubeflow-ci/global/pytorch-dist-mnist_test
gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0
(latest tag, pushed at 03/03/2019)
https://console.cloud.google.com/gcr/images/kubeflow-ci/global/pytorch-dist-mnist-test
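One way to apply the replacement across a local checkout could be a small script like the one below. It is only a sketch: the function name `replace_old_images` and the choice of file suffixes are assumptions for illustration, and the image names are the ones listed above.

```python
from pathlib import Path

# Old image references named in this issue, and the image proposed to replace them.
OLD_IMAGES = (
    "gcr.io/kubeflow-ci/pytorch-dist-mnist_test:1.0",
    "gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0",
)
NEW_IMAGE = "kubeflow/pytorch-dist-mnist:latest"


def replace_old_images(root, suffixes=(".yaml", ".yml", ".md")):
    """Rewrite old image references under root; return the list of changed files.

    Hypothetical helper: the suffix list is a guess at where examples live.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        updated = text
        for old in OLD_IMAGES:
            updated = updated.replace(old, NEW_IMAGE)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_old_images(".")` from the repo root would edit files in place, so it is worth reviewing the result with `git diff` before committing.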
issue:
i was following this guide: https://www.kubeflow.org/docs/components/training/user-guides/pytorch/
which uses this image:
which attempts to download this file:
but as of today, requesting this link returns a 403 status.
here you can see the proper output for this image: https://developer-qa.nvidia.com/blog/gpu-containers-runtime/#:~:text=Try%20running%20the%20MNIST%20training%20example%20included%20with%20the%20container%3A
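The 403 can be confirmed without running the training job by requesting the dataset files directly. The sketch below is an assumption-laden example: it uses the S3 URLs that appear in the log output above (the failing link itself is not shown here, so pass it to `http_status` yourself), and the `opener` parameter is a hypothetical hook added only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

# Base URL and file names taken from the download log in this thread.
BASE = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/"
FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, including error codes such as 403."""
    # HEAD avoids downloading the archive body just to read the status.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    for name in FILES:
        print(name, http_status(BASE + name))
```

Connection failures (DNS, timeouts) raise `urllib.error.URLError` and are deliberately not caught, so they are not mistaken for an HTTP status.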
suggestions:
notes: i assume this link is hardcoded in a script that is copied into the image by its Dockerfile. i found several references to this link across the kubeflow GitHub org: https://github.com/search?q=org%3Akubeflow%20%22train-images-idx3-ubyte.gz%22&type=code but couldn't trace the Dockerfile used to build this image, or determine which of these scripts it uses.
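To narrow down which script ends up in the image, the same search can be run against a local clone, which also covers Dockerfiles and files the web search may not index. This is a minimal sketch; `find_references` is a hypothetical helper, and the default search string is the file name from the search link above.

```python
from pathlib import Path


def find_references(root, needle="train-images-idx3-ubyte.gz"):
    """Yield (path, line_number, line) for each line under root containing needle."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(lines, start=1):
            if needle in line:
                yield path, number, line.strip()


if __name__ == "__main__":
    for path, number, line in find_references("."):
        print(f"{path}:{number}: {line}")
```

Cross-checking the hits against the `COPY`/`ADD` lines of any Dockerfile in the same directory may show which script is actually baked into the image.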