Open nilsleh opened 2 hours ago
Nevermind, I just need to learn how to use aws-cli properly.
I'm not able to reproduce the exact error message (the download "succeeds" for me), but the downloaded file is corrupted, and tar crashes instead:
> python3
>>> from torchgeo.datasets import SpaceNet8
>>> ds = SpaceNet8(root="data", split="train", download=True)
download: s3://spacenet-dataset/spacenet/SN8_floods/tarballs/Germany_Training_Public.tar.gz to data/SN8_floods/train/Germany_Training_Public.tar.gz
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Adam/torchgeo/torchgeo/datasets/spacenet.py", line 146, in __init__
self._verify()
File "/Users/Adam/torchgeo/torchgeo/datasets/spacenet.py", line 336, in _verify
extract_archive(os.path.join(root, tarball), root)
File "/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 374, in extract_archive
extractor(from_path, to_path, compression)
File "/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 220, in _extract_tar
tar.extractall(to_path)
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2265, in extractall
self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2328, in _extract_one
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2411, in _extract_member
self.makefile(tarinfo, targetpath)
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2465, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 252, in copyfileobj
buf = src.read(bufsize)
^^^^^^^^^^^^^^^^^
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/gzip.py", line 301, in read
return self._buffer.read(size)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/gzip.py", line 518, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
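That EOFError usually points to a truncated gzip stream rather than a problem with tar itself. A minimal sketch for checking this without extracting anything, assuming only the standard library (the helper name `gzip_is_complete` is hypothetical, not part of TorchGeo or torchvision):

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` can be read to its end.

    Hypothetical helper for illustration only. It decompresses the whole
    file in chunks and discards the output, so nothing is written to disk.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Read until EOF; a truncated stream raises EOFError on the way.
            while f.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        # EOFError: stream ended before the end-of-stream marker.
        # BadGzipFile / zlib.error: header or payload is corrupted.
        return False
    return True
```

Running it on data/SN8_floods/train/Germany_Training_Public.tar.gz should return False if the download was cut short, which would match the traceback above.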
The checksum is indeed different. However, when I download the file outside of TorchGeo, I don't see this issue:
> aws s3 cp s3://spacenet-dataset/spacenet/SN8_floods/tarballs/Germany_Training_Public.tar.gz .
> md5 Germany_Training_Public.tar.gz
MD5 (Germany_Training_Public.tar.gz) = 5f1c9ac3ea94f2909da593d894680ea2
> tar xzf Germany_Training_Public.tar.gz
Unclear if this is a transient issue or something else.
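To compare the two downloads from Python rather than with md5, something like the following could work. It is a sketch under assumptions: `md5sum` is a hypothetical helper, the path is the one printed in the download log above, and the reference hash is simply the MD5 of my manual `aws s3 cp` download (which extracted cleanly), not an officially published checksum.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumption: MD5 of the manually downloaded tarball, used here as the reference.
REFERENCE_MD5 = "5f1c9ac3ea94f2909da593d894680ea2"
# Path taken from the download log above.
TARBALL = "data/SN8_floods/train/Germany_Training_Public.tar.gz"

if os.path.exists(TARBALL):
    actual = md5sum(TARBALL)
    status = "matches" if actual == REFERENCE_MD5 else "differs from"
    print(f"{TARBALL}: {actual} {status} the reference")
```

A mismatch together with a smaller file size would suggest the transfer was interrupted rather than the object on S3 having changed.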
P.S. I think I still have SN8 (and all other versions) downloaded on our AI4EO server if you need it immediately.
Also, the lead on SN8 was Ronny Haensch from DLR. I have an email thread with him asking about the SN8 AOIs if you want me to ping him on this. But I think we need to get to the bottom of why it isn't working inside TorchGeo first.
You are right, the corrupted download also happens for the "test" split. I wanted to download the dataset so I can add a datamodule for SpaceNet 6 and 8. SpaceNet 6 downloads fine with no errors.
Description
Steps to reproduce
Or potentially, I also need to configure something else? I do have aws-cli installed.
Version
0.7.0.dev0