Inconsistencies between handling of bands in datasets

microsoft / torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

https://www.osgeo.org/projects/torchgeo/

MIT License

2.57k stars 317 forks source link

Inconsistencies between handling of bands in datasets #289

Open nilsleh opened 2 years ago

nilsleh commented 2 years ago

I am currently working on adding plotting methods based on #253. For datasets that contain Sentinel multispectral imagery, I have two questions.

In the CV4AKenyaCropType, the band names can be chosen, when creating a dataset by passing a tuple of strings. In the SEN12MS dataset, the bands can be chose with integers in a list. In ZueriCrop dataset it is not possible to specify bands. Should there not be a standardized way across all datasets?
Given that people can choose different bands, how should the plot method be handled. For example, when RGB bands are not chosen? Or another way of handling plotting for multispectral datasets?

adamjstewart commented 2 years ago

Yes, this should be standardized some way
We could either have plotting fail, or have it still work as long as there are at least 3 bands. @calebrob6 thoughts?

calebrob6 commented 2 years ago

Great questions!

Agree with Adam
I've been raising a ValueError when the RGB bands aren't present (https://github.com/microsoft/torchgeo/blob/main/torchgeo/datasets/seco.py#L260, https://github.com/microsoft/torchgeo/blob/plotting/torchgeo/datasets/benin_cashews.py#L440). I wanted the plot methods to be an easy way for someone to see what the data looks like / gently introduce people who aren't familiar with satellite imagery to the fact that it doesn't come nicely normalized (i.e. I don't want users to get stuck trying to visualize S2 patches). I don't want the plot methods to try to solve every case of visualizing the data.

nilsleh commented 2 years ago

If you have a preference for a standardized way of doing this, I can change all the datasets accordingly.
Given a standardized way of choosing bands, I can implement the plotting methods for those datasets like the examples you gave.

calebrob6 commented 2 years ago

I like list or tuple of string band names
Great!

I think both of these can be done at the same time

isaaccorley commented 2 years ago

An alternative could be to plot a false color image from a user defined subset of the bands in the image (basically a user defines indices like [0, 1, 2]) as an argument to plot. This is how it's done in torchgeo.datasets.IDTReeS when plotting the hyperspectral image.

adamjstewart commented 2 years ago

Another issue I noticed is that So2Sat doesn't allow you to select bands but So2SatDataModule does. The issue with this is that the So2Sat.plot method assumes that the sample will always have all bands. We should copy the So2SatDataModule band selection to So2Sat and update So2Sat.plot to handle this.

remtav commented 2 years ago

Relating to band selection: as more datasets arise in torchgeo, could band selection become standardized to the STAC specification common_names? This could also become a standard way of doing band selection from STAC items when some datasets point to STAC items.

FYI, I'm currently trying this "band selection interface" out with a dataset implemented torchgeo's way. I think it has some potential.

adamjstewart commented 2 years ago

Related to, and partially addressed by #687.

As I mentioned in #394, I'm starting to think that it doesn't make much sense to allow band selection for curated benchmark datasets, only for uncurated raster datasets. What do you think?

calebrob6 commented 2 years ago

Why not? E.g. experiments that compare how multispectral information compare to only rgb information on benchmark datasets are quite interesting, I think.

adamjstewart commented 2 years ago

Hmm, that's fair, someone might want to do a more in-depth ablation study on which bands actually matter. I guess I'm not opposed if someone wants it, but I wouldn't go changing everything if we don't need it, especially if it doesn't speed up data loading times.

I think I want to rename the So2Sat bands anyway, will submit a PR to do that.