mila-iqia / training


Dataset download slowed to 240 bytes per second #10

Closed: aurotripathy closed this issue 5 years ago

aurotripathy commented 5 years ago

Hello:

I'm setting up to run the training benchmarks on ROCm/PyTorch.

The ./download_datasets.sh script ran at several MB/s for a few hours, but has now slowed to about 240 bytes per second.
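To check whether the download has truly stalled rather than briefly slowed, you can sample the total size of the data directory twice and divide the growth by the interval. Below is a minimal Python sketch. It assumes the script writes partially downloaded files into the data directory (here, `~/data/data`). The helper names are hypothetical and are not part of the benchmark's tooling.

```python
import os
import time


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A file may be renamed or removed mid-download; skip it.
                pass
    return total


def measure_throughput(root: str, interval_s: float = 10.0) -> float:
    """Return the average growth rate of root, in bytes per second, over interval_s."""
    before = dir_size_bytes(root)
    time.sleep(interval_s)
    after = dir_size_bytes(root)
    return (after - before) / interval_s


def seconds_to_download(num_bytes: int, rate_bytes_per_s: float) -> float:
    """Time needed to fetch num_bytes at a constant rate."""
    return num_bytes / rate_bytes_per_s
```

For example, `measure_throughput(os.path.expanduser("~/data/data"), 30.0)` reports the average rate over 30 seconds. The arithmetic helper shows why 240 B/s amounts to a stall: `seconds_to_download(10**9, 240)` comes to about 4.17 million seconds, or roughly 48 days per gigabyte.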

The machine has 128 cores (64 per socket). All processes show 0% CPU usage except one, which is using less than 1%.

Should I use the sequential download option?

Thank you.

P.S. I'm looking into whether this is caused by internal throttling on our side.

 root@h2-hq-01:~/data/data# ll ~/data/data/
total 36
drwxr-xr-x 9 root root 4096 Aug 23 02:53 ./
drwxr-xr-x 5 root root 4096 Aug 23 01:09 ../
drwxr-xr-x 3 root root 4096 Aug 23 02:04 ImageNet/
-rw-r--r-- 1 root root    0 Aug 23 02:04 ImageNet.cache
drwxr-xr-x 3 root root 4096 Aug 23 01:09 bsds500/
-rw-r--r-- 1 root root    0 Aug 23 01:09 bsds500.cache
drwxr-xr-x 2 root root 4096 Aug 23 01:09 coco/
-rw-r--r-- 1 root root    0 Aug 23 01:09 coco2017.cache
drwxr-xr-x 2 root root 4096 Oct 17  2016 ml-20m/
-rw-r--r-- 1 root root    0 Aug 23 01:09 ml-20m.cache
drwxr-xr-x 3 root root 4096 Aug 23 01:09 mnist/
drwxr-xr-x 2 root root 4096 Aug 23 01:09 time_series_prediction/
drwxr-xr-x 5 root root 4096 Aug 23 02:50 wmt16/
-rw-r--r-- 1 root root    0 Aug 23 02:53 wmt16.cache
aurotripathy commented 5 years ago

This is likely an issue on our end. Closing.