microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
https://www.osgeo.org/projects/torchgeo/
MIT License
2.31k stars 298 forks source link

README.md benchmark dataset code #2069

Open douglasmacdonald opened 1 month ago

douglasmacdonald commented 1 month ago

Issue

I need help getting the example code on the README.md to work. I am now concentrating on the Benchmark datasets (https://github.com/microsoft/torchgeo?tab=readme-ov-file#benchmark-datasets).

I am running on the Planetary Computer platform.

I did not have any luck with the platform's default torchgeo and so run

!pip install torchgeo --upgrade

And this gives me version '0.5.2'.

However, I am still having problems....

1)

dataset = VHR10('data', download=True, checksum=True)

RuntimeError: The MD5 checksum of the download file data[/NWPU](https://pccompute.westeurope.cloudapp.azure.com/NWPU) VHR-10 dataset.rar does not match the one on record.

2)

from torchgeo.datamodules.utils import collate_fn_detection

ImportError: cannot import name 'collate_fn_detection' from 'torchgeo.datamodules.utils' ([/srv/conda/envs/notebook/lib/python3.11/site-packages/torchgeo/datamodules/utils.py](https://pccompute.westeurope.cloudapp.azure.com/srv/conda/envs/notebook/lib/python3.11/site-packages/torchgeo/datamodules/utils.py))

Fix

I assume version problems.

adamjstewart commented 1 month ago

Hi @douglasmacdonald, sorry you ran into these issues!

I did not have any luck with the platform's default torchgeo

@calebrob6 do we have any contacts we can use to upgrade the default torchgeo version on PC?

RuntimeError: The MD5 checksum of the download file data/NWPU VHR-10 dataset.rar does not match the one on record.

I am not able to reproduce this. What version of torchvision are you using? TorchGeo uses torchvision download utils, and torchvision 0.17.1+ switched from requests to gdown for all Google Drive downloads. It may resolve the issue if you delete the file, upgrade to torchvision 0.17.1+, and install gdown.

ImportError: cannot import name 'collate_fn_detection' from 'torchgeo.datamodules.utils'

This is indeed a version issue. The feature you are trying to use was added in #1082 and will be included in the 0.6.0 release.

My personal recommendation would be to pick a different dataset, VHR-10 is actually one of the more complicated ones. If you're completely new to PyTorch, you're actually better off starting with a torchvision tutorial. All TorchGeo NonGeoDatasets are designed to be functionally identical to torchvision datasets. So if you know how to use torchvision, you know how to use torchgeo. If you still want to use VHR-10, either install the development version (0.6.0.dev0) or wait for the 0.6.0 release (maybe in 1 month?).

douglasmacdonald commented 1 month ago

Hello,

One moment!

Could it have anything to do with using:

pip install torchgeo==0.5.2

Where I maybe should be using

pip install torchgeo[all]==0.5.2

?

Best, Douglas

adamjstewart commented 1 month ago

VHR-10 requires 3 optional dependencies:

  1. gdown: to download the dataset if using torchvision 0.17.1+
  2. rarfile: to extract the .rar file
  3. pycocotools: to read the labels for the 'positive' set

Running pip install torchgeo will not install any of these. Running pip install torchgeo[datasets] will install 2 and 3. Torchvision does not currently automatically install 1 for you, so you have to install it yourself.

We may want to add 1 to [datasets]. If you want to submit a PR to do this, I would be happy to review it.

EDIT: I opened https://github.com/pytorch/vision/pull/8430 to help better document this. With this, we could use torchvision in our required deps and torchvision[gdown] in our optional deps.

isaaccorley commented 1 month ago

Should we also change the example to use a different dataset like InriaAIL or EuroSAT?

adamjstewart commented 1 month ago

The example is fixed (it should work now on main), but I would be happy to change to a different dataset too. I only used that example because VHR-10 was the first dataset I wrote and it had a cool prediction plot.

robmarkcole commented 1 month ago

I've just attempted with torchgeo.__version__ == '0.6.0.dev0' and get a different error:

from torch.utils.data import DataLoader

from torchgeo.datamodules.utils import collate_fn_detection
from torchgeo.datasets import VHR10

# Initialize the dataset
dataset = VHR10(download=True)

---------------------------------------------------------------------------
NotRarFile                                Traceback (most recent call last)
Cell In[2], [line 7](vscode-notebook-cell:?execution_count=2&line=7)
      [4](vscode-notebook-cell:?execution_count=2&line=4) from torchgeo.datasets import VHR10
      [6](vscode-notebook-cell:?execution_count=2&line=6) # Initialize the dataset
----> [7](vscode-notebook-cell:?execution_count=2&line=7) dataset = VHR10(download=True)

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:218](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:218), in VHR10.__init__(self, root, split, transforms, download, checksum)
    [215](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:215) self.checksum = checksum
    [217](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:217) if download:
--> [218](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:218)     self._download()
    [220](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:220) if not self._check_integrity():
    [221](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:221)     raise DatasetNotFoundError(self)

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:343](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:343), in VHR10._download(self)
    [340](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:340)     return
    [342](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:342) # Download images
--> [343](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:343) download_and_extract_archive(
    [344](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:344)     self.image_meta['url'],
    [345](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:345)     self.root,
    [346](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:346)     filename=self.image_meta['filename'],
    [347](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:347)     md5=self.image_meta['md5'] if self.checksum else None,
    [348](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:348) )
    [350](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:350) # Annotations only needed for "positive" image set
    [351](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:351) if self.split == 'positive':
    [352](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/vhr10.py:352)     # Download annotations

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:145](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:145), in download_and_extract_archive(url, download_root, extract_root, filename, md5)
    [143](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:143) archive = os.path.join(download_root, filename)
    [144](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:144) print(f'Extracting {archive} to {extract_root}')
--> [145](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:145) extract_archive(archive, extract_root)

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:99](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:99), in extract_archive(src, dst)
     [97](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:97) for suffix, extractor in suffix_and_extractor:
     [98](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:98)     if src.endswith(suffix):
---> [99](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:99)         with extractor(src, 'r') as f:
    [100](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:100)             f.extractall(dst)
    [101](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:101)         return

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:48](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:48), in _rarfile.RarFile.__enter__(self)
     [45](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:45) rarfile = lazy_import('rarfile')
     [46](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:46) # TODO: catch exception for when rarfile is installed but not
     [47](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:47) # unrar/unar/bsdtar
---> [48](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/datasets/utils.py:48) return rarfile.RarFile(*self.args, **self.kwargs)

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:711](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:711), in RarFile.__init__(self, file, mode, charset, info_callback, crc_check, errors, part_only)
    [708](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:708) if mode != "r":
    [709](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:709)     raise NotImplementedError("RarFile supports only mode=r")
--> [711](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:711) self._parse()

File [/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:930](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:930), in RarFile._parse(self)
    [928](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:928)     self._file_parser = p5  # noqa
    [929](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:929) else:
--> [930](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:930)     raise NotRarFile("Not a RAR file")
    [932](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:932) self._file_parser.parse()
    [933](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/rarfile.py:933) self.comment = self._file_parser.comment

NotRarFile: Not a RAR file

I get the same error from the command line:

⚡ ~/data unrar e NWPU\ VHR-10\ dataset.rar 

UNRAR 5.61 beta 1 freeware      Copyright (c) 1993-2018 Alexander Roshal

NWPU VHR-10 dataset.rar is not RAR archive
No files to extract

OK it appears the file downloaded by torchgeo was somehow corrupted - if I manually download via the browser there is no issue.

However I then get Dataset not found in root='data', resolved by commenting out self._check_integrity.

Appears to be because there is a check: Checking integrity of data/NWPU VHR-10 dataset.rar which I deleted.

I just comment out that check and the dataset loads fine, however the image is not contrast stretched:

image

That is resolved using percentile_normalization and I am now ready to go:

image
isaaccorley commented 1 month ago

@robmarkcole what version of torchvision are you using? You can turn off the integrity check by passing checksum=False to the dataset/datamodule.

robmarkcole commented 1 month ago

'0.6.0.dev0' via pip install git+https://github.com/microsoft/torchgeo.git@main#egg=torchgeo

Can confirm no issues using checksum=False so long as the data is downloaded without corruption and the rar is present.

There appears to be another issue, which I believe is due to collate_fn_detection not being applied - on using the dataset I get error:

[252](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:252) """Compute the validation metrics.
    [253](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:253) 
    [254](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:254) Args:
   (...)
    [257](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:257)     dataloader_idx: Index of the current dataloader.
    [258](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:258) """
    [259](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:259) x = batch['image']
--> [260](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:260) batch_size = x.shape[0]
    [261](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:261) y = [
    [262](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:262)     {'boxes': batch['boxes'][i], 'labels': batch['labels'][i]}
    [263](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:263)     for i in range(batch_size)
    [264](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:264) ]
    [265](https://vscode-remote+vscode-002d01hkcwhn6mva5nchdpjsd1bsb3-002estudio-002elightning-002eai.vscode-resource.vscode-cdn.net/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchgeo/trainers/detection.py:265) y_hat = self(x)

AttributeError: 'list' object has no attribute 'shape'

Usage:

class VHR10DataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "", batch_size: int = 4, num_workers: int = 0,):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: str):
        return 

    def train_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def test_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

datamodule = VHR10DataModule(data_dir="data", batch_size=4, num_workers=0)
datamodule.setup("fit")
isaaccorley commented 1 month ago

The collate fn is being applied but the trainer doesn't accept a list of images but expects it to be a tensor only which definitely is a bug when each image is a different size in the VHR-10 dataset they can't be stacked properly.

robmarkcole commented 1 month ago

In this batch, the images all had different shapes - presume I just need to add a cropping augmentation?

torch.Size([3, 808, 958])
torch.Size([3, 806, 950])
torch.Size([3, 803, 889])
torch.Size([3, 732, 946])
isaaccorley commented 1 month ago

Yep that's correct. I know in the past that Kornia had some bugs with the augmentations not properly being applied to the boxes but that appears to have been fixed.

robmarkcole commented 1 month ago

OK just noticed from torchgeo.datamodules import VHR10DataModule which handles this :-)

So in summary: 1) The rar file being corrupted was the cause of my issue 2) The plottling is off without percentile_normalization

isaaccorley commented 1 month ago

Oof I thought you were already using it or I would have suggested the datamodule. I'll take a look at fixes for this. Thanks for being an A+ test engineer!

adamjstewart commented 1 month ago

Trying to catch up on this thread...

@calebrob6 do we have any contacts we can use to upgrade the default torchgeo version on PC?

Well, that didn't age well. Looks like PC will be shutting down, so no need to worry about this anymore.

The rar file being corrupted was the cause of my issue

If we're seeing intermittent issues with GDrive, we could rehost the dataset on HF. It appears to be released under an MIT license.

The plottling is off without percentile_normalization

I'm happy to submit a PR to fix this, but then no one will review it... Does anyone else want to submit a PR?

@ashnair1 was the last person to touch this dataset.

calebrob6 commented 1 month ago

PC hub (the free compute) is shutting down, the 50+ PB of data hosting and APIs that let you index into it, explorer for visualizing it, and catalog are all unchanged AFAIK

ashnair1 commented 4 weeks ago

Good catch regarding normalization. By default the images are uint8 and are loaded as floats. During training the images are normalized (in the datamodule) to a range of 0-1 before plotting which is why the training plots look normal. However while plotting samples via the method directly, the tensor has values that range from 0-255 and is in float dtype making the plot incorrect.

adamjstewart commented 3 weeks ago

@burakekim is going to inquire about redistributing VHR-10 on Hugging Face, which will allow us to get rid of the Google Drive issues and remove dependencies on rarfile and gdown. Hopefully that will solve some of the issues you encountered!

I think we should also replace the README example with a simpler dataset like EuroSAT, which will finally close this issue.