mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
MIT License

Bringing own data #222

Open manapshymyr-OB opened 2 years ago

manapshymyr-OB commented 2 years ago

Hello! @daniel-j-h Thank you for your project.

I have some questions.

  1. Can we bring our own dataset (with labels), for example https://github.com/phelber/EuroSAT?
  2. If we can, should we convert the data to PNG, or will robosat work with multi-spectral data?
  3. How should we select the zoom level? I intend to use Sentinel-2 data for building detection and am not sure how to work out an appropriate zoom level.

Thanks!
daniel-j-h commented 2 years ago

You can bring your own dataset; for multi-spectral data you will need something like https://github.com/mapbox/robosat/pull/138

Sentinel-2 resolution should be native at z14 if I'm not mistaken, so use that or lower.
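To see where the z14 figure comes from, here is a minimal sketch that compares the source pixel size with the ground resolution of Web Mercator tiles. It assumes 256-pixel tiles and the 10 m Sentinel-2 bands; the function names are made up for this illustration and are not part of robosat.

```python
import math

# Equatorial circumference of the Web Mercator sphere (WGS84 radius), in meters.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def ground_resolution(zoom, lat_deg=0.0, tile_size=256):
    """Meters per pixel of a Web Mercator tile at the given zoom and latitude."""
    scale = math.cos(math.radians(lat_deg))
    return EARTH_CIRCUMFERENCE_M * scale / (tile_size * 2 ** zoom)


def closest_zoom(pixel_size_m, lat_deg=0.0, tile_size=256):
    """Zoom level whose ground resolution is closest (on a log scale) to the source pixel size."""
    scale = math.cos(math.radians(lat_deg))
    exact = math.log2(EARTH_CIRCUMFERENCE_M * scale / (tile_size * pixel_size_m))
    return round(exact)


if __name__ == "__main__":
    for zoom in (13, 14, 15):
        print(zoom, round(ground_resolution(zoom), 2), "m/px at the equator")
    print("closest zoom for 10 m pixels at the equator:", closest_zoom(10.0))
```

Tile pixels cover less ground at higher latitudes, so passing an approximate latitude for the area of interest via `lat_deg` can shift the result by one level, which fits the "use that or lower" advice.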

Please read the note in the readme, though: this project is no longer maintained, developed, or active in any other form.

https://github.com/mapbox/robosat/blob/cbb1c73328183afd2d6351b7bfa3f430b73103ea/README.md

manapshymyr-OB commented 2 years ago

@daniel-j-h Thanks a lot for your response. One more question: I want to run the training only on the Bayern region, and for this I would like to use multiple Sentinel-2 scenes with different acquisition dates. As far as I understand, tiling a scene produces images named by their x and y coordinates: one folder per x coordinate, containing one image per y coordinate. How can I add multiple images of the same area (taken at different times) to the training dataset? I assume I cannot change the tile names, am I right?

daniel-j-h commented 2 years ago

For training I don't think we actually use the z/x/y tile coordinates; I think they only matter for prediction and merging. If that's true, you could just work around it by changing your tile z/x/y for training, e.g. add an offset or random ints to z/x/y, should work ™️
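A minimal sketch of the offset idea, assuming each scene was tiled into its own z/x/y directory and that training really does ignore the coordinates, as guessed above. `merge_scenes` and `x_stride` are hypothetical names for this illustration, not robosat functionality.

```python
import shutil
from pathlib import Path


def merge_scenes(scene_dirs, out_dir, x_stride=1_000_000):
    """Copy z/x/y tiles from several scenes into one directory.

    The x coordinate of every tile from scene number i is shifted by
    i * x_stride, so tiles covering the same area no longer collide.
    Returns the number of files copied.
    """
    out_dir = Path(out_dir)
    copied = 0
    for index, scene_dir in enumerate(scene_dirs):
        # Match <scene>/<z>/<x>/<y>.<ext>
        for tile in sorted(Path(scene_dir).glob("*/*/*.*")):
            z, x = tile.parts[-3], tile.parts[-2]
            shifted_x = int(x) + index * x_stride
            target = out_dir / z / str(shifted_x) / tile.name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(tile, target)
            copied += 1
    return copied
```

The default stride is far larger than the biggest x index at z14 (16383), so shifted names cannot overlap real ones. The images and the masks would need the same scene order so that every image keeps its matching mask, and the shifted tiles are no longer valid map coordinates, so they are only useful for training.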

manapshymyr-OB commented 2 years ago

@daniel-j-h if I use multiple images with different dates of the same scene (for example Bayern region with 2019 and 2020 data), will this change anything?

daniel-j-h commented 2 years ago

Should work, too! Ideally you should strive for a balanced dataset and also make sure to first shuffle your dataset, and then split into train / validate / test.
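The shuffle-then-split advice can be sketched as follows. This is only one possible reading: it treats "balanced" as an equal number of samples per group (for example per acquisition year), which is an assumption of this sketch, and `balanced_split` is a hypothetical helper rather than a robosat tool.

```python
import random
from collections import defaultdict


def balanced_split(samples, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Balance samples across groups, shuffle, then split into train / validate / test.

    `key` maps a sample to its group (e.g. the acquisition year). Every group is
    downsampled to the size of the smallest group before shuffling.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)
    if not groups:
        return [], [], []

    smallest = min(len(group) for group in groups.values())
    pool = []
    for group in groups.values():
        pool.extend(rng.sample(group, smallest))

    # Shuffle first, then split, so every subset mixes all groups.
    rng.shuffle(pool)
    n_train = int(len(pool) * fractions[0])
    n_validate = int(len(pool) * fractions[1])
    train = pool[:n_train]
    validate = pool[n_train:n_train + n_validate]
    test = pool[n_train + n_validate:]
    return train, validate, test
```

A call could look like `balanced_split(tiles, key=lambda tile: tile.year)`, where `tiles` and `year` stand in for however the samples and their acquisition dates are tracked. Downsampling throws data away; weighting or oversampling would be alternatives.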

manapshymyr-OB commented 2 years ago

@daniel-j-h Last question for now: can my data be multi-channel? Currently I am converting the TIFF images to PNG, and I am not sure whether I really need this step...

manapshymyr-OB commented 2 years ago

@daniel-j-h You wrote:

"Should work, too! Ideally you should strive for a balanced dataset and also make sure to first shuffle your dataset, and then split into train / validate / test."

What do you mean by balanced and shuffle?

daniel-j-h commented 2 years ago

For multi-spectral data you will need something like #138

By balanced & shuffle I mean

  • don't have one month from 2019 and one year from 2020
  • don't train on 2019 and validate on 2020

try to have e.g. 50% from 2019 and 50% from 2020, then shuffle those, and train / validate / test on subsets of those.

Good luck!

manapshymyr-OB commented 2 years ago

Thanks for your quick responses. I am a bit confused about the multi-channel processing. Currently my steps are: I take the Sentinel data and create a TIFF image with bands B4, B3, B2, and B8. Then I tile it at zoom level 14, but the result is .png tiles. Are these steps correct?

Could you please outline the first steps for creating a robosat dataset from multi-channel data?

daniel-j-h commented 2 years ago

Multi-channel is not supported, you will need pull request #138 or something similar code-wise. This project is unmaintained and we never got multi-channel support properly in.

manapshymyr-OB commented 2 years ago

@daniel-j-h Hello again. I now have 100 Sentinel scenes (50 from 2019 and 50 from 2020) as GeoTIFFs. The next step is to tile them using gdal2tiles or rio-tiler, right?

daniel-j-h commented 2 years ago

Yes.

Again, this project is not maintained or supported anymore.

manapshymyr-OB commented 2 years ago

@daniel-j-h Thanks for your reply. I know that this project is not supported anymore, but I still want to try it and see the results.

Tiling the GeoTIFF images produces PNG images in a z/x/y structure, where z is the zoom level and x and y are the tile numbers. rs subset relies on this directory structure to filter: as far as I understand, it filters out tiles that are not listed in building.tiles, matching on z/x/y.*, right?

How should I proceed if I have 4 repeated scenes, so that the images have repeating z/x/y? What should the dataset creation steps look like? Could you please give me some guidance?

You previously suggested offsetting x and y, but then the tiles will no longer match the entries in building.tiles....
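One way around the mismatch might be to filter each scene while its tiles still carry their true coordinates, and to rename anything only afterwards. The sketch below illustrates that filtering step. It is not robosat's `rs subset` implementation, and it assumes the tile list is a CSV with one `x,y,z` row per tile; `subset_tiles` is a hypothetical name.

```python
import csv
import shutil
from pathlib import Path


def subset_tiles(images_dir, tiles_csv, out_dir):
    """Copy only the tiles listed in a CSV (assumed rows: x,y,z) from a z/x/y directory.

    Returns the number of files copied.
    """
    images_dir, out_dir = Path(images_dir), Path(out_dir)
    copied = 0
    with open(tiles_csv, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            x, y, z = (int(value) for value in row[:3])
            # Match any file extension, mirroring the z/x/y.* lookup.
            for source in images_dir.glob(f"{z}/{x}/{y}.*"):
                target = out_dir / str(z) / str(x) / source.name
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(source, target)
                copied += 1
    return copied
```

Run once per scene against the same building.tiles, this keeps the lookup valid for all four repeated scenes, since every scene is matched before its tile names change.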

daniel-j-h commented 2 years ago

@daniel-j-h Thanks for your reply. I know that this project is not supported anymore. Anyway, [.. wall of text here]

no-support