mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
MIT License

Failure to download a tile, rs train won't work. #125

Closed amandasaurus closed 6 years ago

amandasaurus commented 6 years ago

I recently tried robosat again, and during the rs download phase about 115 tiles "failed". Inserting print statements showed that the URL https://api.mapbox.com/v4/mapbox.satellite/18/126553/85280.png?access_token=XXX returns a 404. However, this tile covers a building, as the OSM tile shows: https://tile.openstreetmap.org/18/126553/85280.png. I get a 404 with both the png and the webp URL, with and without the @2x suffix. This appears to be a problem with the Mapbox satellite tiles.
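For anyone who wants to reproduce the check without adding print statements, here is a minimal sketch using curl. The URL pattern is the one quoted above; the function names tile_url and tile_status are hypothetical helpers and not part of robosat.

```shell
# Hypothetical helpers, not part of robosat.
# Build the satellite tile URL in the form quoted in this report.
tile_url() {
    # $1 = zoom, $2 = x, $3 = y, $4 = access token
    printf 'https://api.mapbox.com/v4/mapbox.satellite/%s/%s/%s.png?access_token=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Print only the HTTP status code for a tile; the response body is discarded.
tile_status() {
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        "$(tile_url "$1" "$2" "$3" "$4")"
}
```

Calling tile_status with a zoom, x, y and a valid access token prints the status code that curl received for that tile.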

rs mask will produce a mask for that tile (with the correct building) since it's using the geojson.

But then rs train doesn't work because the number of images doesn't match the number of labels:

./rs train --model ./config/model-unet.toml --dataset ./config/dataset-buildings.toml
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rory/osm/robosat/robosat/tools/__main__.py", line 60, in <module>
    args.func(args)
  File "/home/rory/osm/robosat/robosat/tools/train.py", line 111, in main
    train_loader, val_loader = get_dataset_loaders(model, dataset, args.workers)
  File "/home/rory/osm/robosat/robosat/tools/train.py", line 271, in get_dataset_loaders
    [os.path.join(path, "training", "images")], os.path.join(path, "training", "labels"), transform
  File "/home/rory/osm/robosat/robosat/datasets.py", line 58, in __init__
    assert len(self.target) == len(self.inputs[0]), "same number of tiles in images and label"
AssertionError: same number of tiles in images and label

And it's not wrong:

$ find ./ie-buildings/dataset/ -type f | cut -d/ -f-5 | uniq -c
  18596 ./ie-buildings/dataset/validation/images
  18610 ./ie-buildings/dataset/validation/labels
 148786 ./ie-buildings/dataset/training/images
 148880 ./ie-buildings/dataset/training/labels
  18608 ./ie-buildings/dataset/evaluation/images
  18615 ./ie-buildings/dataset/evaluation/labels

How can I fix this? If all the Mapbox satellite tiles had downloaded, this would work. Could rs train look only at matching images/labels (i.e. ignore labels that don't have an image)? Alternatively, I could remove the labels that lack a matching image.

daniel-j-h commented 6 years ago

You are correct - it looks like the tile does not exist.

The reason why we don't add magic image/mask filtering to the dataset is we want users to be explicit about the dataset they created. Some users will only use the rs tools to create a dataset, other users will bring their own (e.g. cloud tiles and masks, rasterized traces, etc.).

The solution to your problem is to make sure your images and masks are in sync. For example, loop through z, x, y in a bash loop and clean up the samples where either the image or the mask is missing.
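Before deleting anything, it can help to list which samples are out of sync. The following is a minimal sketch, assuming both directories use the same z/x/y.<ext> layout; tile_ids and labels_without_images are hypothetical helper names, not robosat tools.

```shell
# Hypothetical helpers, not part of robosat.
# List the files below directory $1 as sorted z/x/y paths without extension.
tile_ids() {
    (cd "$1" && find . -type f | sed -e 's|^\./||' -e 's|\.[^./]*$||' | sort)
}

# Print every tile that has a label but no image.
# $1 = images directory, $2 = labels directory
labels_without_images() {
    images_list=$(mktemp)
    labels_list=$(mktemp)
    tile_ids "$1" > "$images_list"
    tile_ids "$2" > "$labels_list"
    # comm -13 keeps only the lines unique to the second file (the labels).
    comm -13 "$images_list" "$labels_list"
    rm -f "$images_list" "$labels_list"
}
```

Swapping the two arguments lists the images that have no label. Nothing is removed, so the output can be reviewed first.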

Not sure if we should have a tool for that? There's a similar ticket in https://github.com/mapbox/robosat/issues/93.

amandasaurus commented 6 years ago

Thanks, that makes sense. I've written this little shell script, which removes tiles and masks that don't have a corresponding item. This makes everything work:

#!/bin/bash
# Keeps the masks and the images in sync. Removes any mask that doesn't have
# a corresponding image (which can happen if the tile failed to download),
# and vice versa.
# Arguments: the images directory, then the masks directory.

set -o errexit
set -o nounset

# Quote the arguments so that paths containing spaces work.
IMAGES=$(realpath "$1")
MASKS=$(realpath "$2")

# Remove masks (.png) that have no matching image (.webp).
find "${MASKS}" -type f -printf "%P\n" | cut -d. -f1 | while read -r FILE; do
    if [[ ! -f "${IMAGES}/${FILE}.webp" ]]; then
        echo "${IMAGES}/${FILE}.webp does not exist"
        rm "${MASKS}/${FILE}.png"
    fi
done

# Remove images (.webp) that have no matching mask (.png).
find "${IMAGES}" -type f -printf "%P\n" | cut -d. -f1 | while read -r FILE; do
    if [[ ! -f "${MASKS}/${FILE}.png" ]]; then
        echo "${MASKS}/${FILE}.png does not exist"
        rm "${IMAGES}/${FILE}.webp"
    fi
done

It sounds like rs train is working as expected. A tile 404'ing from the Mapbox satellite tile server isn't an appropriate issue for this project, so I'll close this issue.