microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
https://www.osgeo.org/projects/torchgeo/
MIT License
2.32k stars 298 forks source link

Update VHR-10 dataset plotting #2092

Closed robmarkcole closed 1 month ago

robmarkcole commented 1 month ago

Fix plotting using percentiles

Note: I have uploaded the dataset at https://huggingface.co/datasets/satellite-image-deep-learning/VHR-10 - can transfer to torchgeo org if you want to use it here

ashnair1 commented 1 month ago

I get the normalization fix but why is the README code snippet being changed? Once 0.6.0 is released, the snippet should work. Also as I understand it, the goal of the snippet is to show how the dataloader works not how to train using datamodules

robmarkcole commented 1 month ago

I just assumed the docs predate the datamodules but we would like to showcase this feature - personally I always use datamodules now but perhaps am in the minority there. Suggest a second opinion before I change the PR

robmarkcole commented 1 month ago

There is some fishy behaviour still - in the load_target the image id has one subtracted here - why not use as below?

annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id_))
robmarkcole commented 1 month ago

Applied black as per the guide, and this has changed ALL of the strings

ashnair1 commented 1 month ago

Applied black as per the guide, and this has changed ALL of the strings

torchgeo no longer uses black, isort etc and has recently switched to ruff for linting and code formatting.

ashnair1 commented 1 month ago

There is some fishy behaviour still - in the load_target the image id has one subtracted here - why not use as below?

annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id_))

Refer to https://github.com/microsoft/torchgeo/pull/847#discussion_r1044127463

robmarkcole commented 1 month ago

@ashnair1 I checked the approach I suggested (imgIds=id_)) and this results in erroneous masks - strange as the API appears to support this approach, suggesting the implementation has some quirks

image

The approach I am using elsewhere: 1) Get list of image ids: self.ids = list(sorted(self.coco.imgs.keys())) and use in getitem: id_ = self.ids[index] 2) Access filename using id: `self.coco.imgs[id]['filename'] 3) Access annotations using API:annot = self.coco.loadAnns(self.coco.getAnnIds(imgIds=id))`

adamjstewart commented 1 month ago

Regarding the README

The README has a few different sections. The section you're editing is designed to show off our benchmark datasets. There is a separate section below specifically for PyTorch Lightning, including data modules. I would prefer to keep using a dataset instead of a data module here.

Actually, I think we're better off choosing a different dataset (maybe EuroSAT?). VHR-10 is too complicated and makes it look like TorchGeo is hard to use. Maybe we could move VHR-10 to the Lightning section of the README since it has a pretty visualization.

Regarding plotting

Yes, this is a common issue with plotting via dataset vs. plotting via data module. I agree with your solution, and proposed basically the same thing for all datasets in #1263. This issue is not specific to VHR-10, but affects almost all datasets/data modules. I haven't had the time/energy to fix this for all datasets, but contributions are definitely welcome here, even if it's only 1 dataset at a time.

Can we separate the README and plotting changes into two separate PRs?

adamjstewart commented 1 month ago

Note: I have uploaded the dataset at https://huggingface.co/datasets/satellite-image-deep-learning/VHR-10 - can transfer to torchgeo org if you want to use it here

It's easier to download tar/zip/rar files instead of raw data. But I'm happy to change the download to your org or ours. Not sure if we should rehost all rar datasets as tar/zip to make them easier to extract?

robmarkcole commented 1 month ago

@adamjstewart this PR now just addresses the plotting.

RE dataset choice, I agree EuroSAT would be a nice choice - it also supports selecting subsets of bands so shows immediately what is possible with torchgeo