mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as buildings, parking lots, roads, water, and clouds.
MIT License

Tool to write slippy map directory to a csv file #79

Closed bkowshik closed 6 years ago

bkowshik commented 6 years ago

We have a slippy map directory with X items (1000 in this example) and want a subset of Y items (5 in this example). The proposed tool also supports two options that are common in data workflows: shuffling the tiles (--shuffle) and limiting the output to a given number of rows (--count).

Workflow

$ find "images/" -type f | wc -l
1000

# Get a csv file for the slippy map directory.
$ ./rs csv "images/" "images.csv" --shuffle true --count 5

# Five rows, since we asked for --count 5.
$ cat images.csv
41483,95309,18
41484,95309,18
41505,94409,18
41487,94309,18
41504,94409,18
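For comparison, the proposed ./rs csv step can be approximated in plain bash. The sketch below is a hypothetical helper (slippy_csv is not part of robosat). It assumes the DIR/z/x/y.ext layout shown in this thread and GNU coreutils' shuf. It prints x,y,z rows, and when given a count it prints a random sample of that many rows, roughly matching --shuffle true --count N.

```shell
#!/usr/bin/env bash
# Hypothetical slippy_csv DIR [COUNT], approximating the proposed
# ./rs csv DIR OUT --shuffle true --count COUNT (not robosat code).
# Assumes tiles stored as DIR/z/x/y.ext and GNU coreutils shuf.
set -euo pipefail

slippy_csv() {
  local dir="${1%/}" count="${2:-}"
  local path rel z rest x file
  # Every tile sits exactly three levels below DIR: z/x/y.ext
  find "$dir" -mindepth 3 -maxdepth 3 -type f |
    while IFS= read -r path; do
      rel="${path#"$dir"/}"   # z/x/y.ext
      z="${rel%%/*}"          # z
      rest="${rel#*/}"        # x/y.ext
      x="${rest%%/*}"         # x
      file="${rest#*/}"       # y.ext
      printf '%s,%s,%s\n' "$x" "${file%.*}" "$z"   # x,y,z like images.csv
    done |
    # With a count: random sample of COUNT rows; without: stable sorted output
    if [ -n "$count" ]; then shuf -n "$count"; else sort; fi
}

# Demo: 100 fake tiles in a throwaway directory, then a random sample of 5 rows.
demo="$(mktemp -d)"
for i in $(seq 1 100); do
  mkdir -p "$demo/18/$((41480 + i % 10))"
  touch "$demo/18/$((41480 + i % 10))/$((95300 + i)).webp"
done
slippy_csv "$demo" 5
rm -r "$demo"
```

On the directory from the thread this would be run as slippy_csv images 5 > images.csv.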

The logical next step is to use the rs subset tool to build a subset directory containing only the images listed in the csv file.

# Use csv to prepare a subset slippy map directory.
$ ./rs subset "images/" "images.csv" "subset/"

# Slippy map directory now has only the files in the csv.
$ find "subset/" -type f
subset/18/41505/94409.webp
subset/18/41484/95309.webp
subset/18/41504/94409.webp
subset/18/41487/94309.webp
subset/18/41483/95309.webp
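The subset step can likewise be sketched in bash. This hypothetical slippy_subset function reproduces only the behaviour shown above: it copies the tiles listed in the csv into a new z/x/y tree. It is not robosat's implementation, and it assumes one file per tile with any extension.

```shell
#!/usr/bin/env bash
# Hypothetical slippy_subset SRC CSV DST (not robosat code): copies the tiles
# listed as x,y,z rows in CSV from SRC/z/x/y.* to DST/z/x/.
set -euo pipefail

slippy_subset() {
  local src="${1%/}" csv="$2" dst="${3%/}"
  local x y z f
  # The '|| [ -n ... ]' also processes a last row that has no trailing newline
  while IFS=, read -r x y z || [ -n "${x:-}" ]; do
    for f in "$src/$z/$x/$y".*; do
      if [ ! -e "$f" ]; then   # glob did not match: tile missing from SRC
        echo "missing tile: $z/$x/$y" >&2
        continue
      fi
      mkdir -p "$dst/$z/$x"
      cp "$f" "$dst/$z/$x/"
    done
  done < "$csv"
}

# Usage, mirroring the thread:
#   slippy_subset images/ images.csv subset/
```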
daniel-j-h commented 6 years ago

Not sure we need a tool for this. How is this different from, e.g., a quick bash snippet like the following:

for z in /data/*; do
  for x in $z/*; do
    for y in $x/*; do
      echo "$(basename $x),$(basename $y),$(basename $z)" >> tiles.csv;
    done;
  done;
done
sort -R tiles.csv | head -n 5
bkowshik commented 6 years ago

The bash script does exactly what the Python tool in this PR is doing.

Converting a slippy map directory to a csv file is quite a common operation, especially during data cleaning and multiple rounds of hard-negative mining. Keeping scripts (bash or Python) that do this in the repository documents this missing piece. Also, staying with the familiar ./rs interface could be easier for our users than switching context between ./rs and this bash script.

daniel-j-h commented 6 years ago

I'm just worried about adding even more sub-commands to our already extensive list of rs tools. I think we should reserve tools for substantial tasks and keep these small data transformations out of the standard set of rs tools.

bkowshik commented 6 years ago

> keep these small data transformations out of the standard set of rs tools.

Would a scripts folder inside robosat be a good place for these one-off data-transformation scripts? Any other ideas?

bkowshik commented 6 years ago

> The bash script does exactly what the Python tool in this PR is doing.

I hadn't tried the bash script. It looks like every line it produces has an extra .png extension:

41351,95200.png,18
41352,95200.png,18
41353,95200.png,18
41354,95200.png,18
41374,95250.png,18
daniel-j-h commented 6 years ago

You can use this intuitive bash syntax to strip the extension from a filename.extension pair: ${y%.*} (it removes the shortest suffix matching .*).
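To see what the expansion does on a tile path (the path below is an illustrative example in the same /data/z/x/y.png shape as the thread): ${y%.*} removes the shortest suffix that starts with a dot. That works for y.png files, but it needs care when a file has no extension and a directory name contains a dot.

```shell
#!/usr/bin/env bash
# Illustrative tile path, same shape as the thread's /data/z/x/y.png files.
y="/data/18/41351/95200.png"

echo "${y%.*}"       # /data/18/41351/95200  (shortest '.*' suffix removed)
basename "${y%.*}"   # 95200
stem="${y##*/}"      # 95200.png  (longest '*/' prefix removed)
echo "${stem%.*}"    # 95200, same result without calling basename

# Caveat: with no extension, '%.*' matches a dot in a directory name instead.
z="/data.v2/18/41351/95200"
echo "${z%.*}"       # /data
```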

bkowshik commented 6 years ago

Thank you for the bash snippet, @daniel-j-h:

for z in /data/*; do
  for x in $z/*; do
    for y in $x/*; do
      echo "$(basename $x),$(basename ${y%.*}),$(basename $z)" >> tiles.csv;
    done;
  done;
done
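One detail to watch in the loop above: >> appends, so running it twice leaves every tile in tiles.csv twice. Below is a minimal sketch of the same loop with quoted paths and a single output redirect, wrapped in a hypothetical tiles_csv function. It strips the extension from the file name only, so a dot elsewhere in the path cannot be mistaken for the extension. /data and tiles.csv are the placeholders from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical tiles_csv DIR: the thread's loop with quoted paths.
# Expects DIR/z/x/y.ext and prints one x,y,z row per tile to stdout.
set -euo pipefail
shopt -s nullglob  # empty directories yield no iterations instead of a literal '*'

tiles_csv() {
  local z x y f
  for z in "${1%/}"/*/; do
    for x in "$z"*/; do
      for y in "$x"*; do
        [ -f "$y" ] || continue   # skip anything that is not a regular file
        f="${y##*/}"              # y.ext (file name only)
        printf '%s,%s,%s\n' "$(basename "$x")" "${f%.*}" "$(basename "$z")"
      done
    done
  done
}

# '>' truncates, so re-running replaces tiles.csv instead of appending duplicates:
#   tiles_csv /data > tiles.csv
#   shuf -n 5 tiles.csv   # GNU coreutils; random sample of 5 rows
```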

Closing here!