Open arturandre opened 2 years ago
I just checked the issue, and the line you linked doesn't seem to be the cause. The problem is at line 234:
Line 234 downloads the file as train_2021,
but line 237 expects it to be train.
I think one solution would be to replace the filename argument on line 234 with:
filename=os.path.basename(DATASET_URLS[self.version]).rstrip(".tar.gz")
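One caveat on the suggestion above: `str.rstrip` takes a set of characters to remove, not a suffix. It happens to return `train` for `train.tar.gz`, but only because `n` is not one of the characters in `.tar.gz`. A minimal sketch of a suffix-safe alternative follows. `strip_archive_suffix` is a hypothetical helper, not part of torchvision, and it uses slicing rather than `str.removesuffix` because that method requires Python 3.9+.

```python
import os


def strip_archive_suffix(name, suffix=".tar.gz"):
    """Return name without the trailing suffix, if present (hypothetical helper)."""
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


url = "https://ml-inat-competition-datasets.s3.amazonaws.com/2021/train.tar.gz"
basename = os.path.basename(url)  # "train.tar.gz"

# rstrip removes any trailing characters drawn from the set {., t, a, r, g, z}.
print(basename.rstrip(".tar.gz"))        # "train", but only because "train" ends in "n"
print("data.tar.gz".rstrip(".tar.gz"))   # "d": the stem is eaten as well

# Removing the suffix explicitly is safe for any stem.
print(strip_archive_suffix(basename))        # "train"
print(strip_archive_suffix("data.tar.gz"))   # "data"
```

For the two URLs discussed in this thread (`train.tar.gz` and `val.tar.gz`) both approaches give the same result, so this is a robustness point rather than an explanation of the crash.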
I think this can be labelled as a "good first issue" as well, to attract new developers :)
Is the issue still open? I would like to work on it.
Hi, it is still open, as the code at the current main branch for this file hasn't changed yet.
Go ahead @ito-hiroki!
I have looked into the cause of the crash. I don't think there is a problem with lines 234 and 237. When we extract the file downloaded from "https://ml-inat-competition-datasets.s3.amazonaws.com/2021/train.tar.gz", we always get a folder named "train/", regardless of the downloaded filename. I ran the following sample code on the master branch, and it did not crash. (I wanted to try '2021_train', but the file is too large.)
$ cat tmp.py
from torchvision import datasets
ds = datasets.INaturalist("./2021_val/", version="2021_valid", download=True)
$ python tmp.py
Downloading https://ml-inat-competition-datasets.s3.amazonaws.com/2021/val.tar.gz to ./2021_val/2021_valid.tgz
100.0%
Extracting ./2021_val/2021_valid.tgz to ./2021_val
Dataset version '2021_valid' has been downloaded and prepared for use
$ ls 2021_val/
2021_valid 2021_valid.tgz
Is this crash reproducible, and does it only occur under version='2021_train'?
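The point that the extracted folder name comes from the archive's contents, not from the name the archive was saved under, can be checked without downloading the full dataset. The following is a minimal sketch using a tiny stand-in archive built locally; the file names (`sample.txt`, `2021_train.tgz`) are placeholders chosen for illustration, and nothing here calls torchvision.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in archive whose only top-level entry is "train/".
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, "train"))
    with open(os.path.join(src, "train", "sample.txt"), "w") as f:
        f.write("placeholder")

    # Save it under a filename that differs from its contents.
    archive = os.path.join(tmp, "2021_train.tgz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(os.path.join(src, "train"), arcname="train")

    # Extract into a fresh directory and look at what appears.
    out = os.path.join(tmp, "out")
    os.makedirs(out)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out)

    extracted = sorted(os.listdir(out))
    print(extracted)  # ['train'], independent of the archive's filename
```

So any code that looks for the extracted data has to use the name stored inside the archive (here `train`), whatever the `.tgz` file itself is called.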
Is the issue still open? I would like to work on it.
The issue is not reproducible.
Was the entire dataset downloaded? @arturandre
🐛 Describe the bug
When trying to use the iNaturalist 2021_train version, the program crashes (after downloading) with the message:
Unable to find downloaded files at ...
The md5sum of the downloaded file 2021_train.tar.tgz is correct, but the extracted folder is named train.
Checking the code at this line, it seems that the extracted folder should be named 2021_train. So the issue seems to be with the compressed file 2021_train.tar.tgz.
Versions
Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:57:06) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.6.112
GPU models and configuration:
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA RTX A5000
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.11.0
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.11.0
[pip3] torchsample==0.1.3
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.19.5 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch 1.11.0 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.6.0 pypi_0 pypi
[conda] torch-tb-profiler 0.4.0 pypi_0 pypi
[conda] torchaudio 0.11.0 py38_cu113 pytorch
[conda] torchsample 0.1.3 dev_0
[conda] torchvision 0.7.0 pypi_0 pypi
cc @pmeier