bkowshik closed 6 years ago
Not sure we need a tool for this. How is this different from, e.g., a quick bash snippet like the following?
for z in /data/*; do
  for x in "$z"/*; do
    for y in "$x"/*; do
      echo "$(basename "$x"),$(basename "$y"),$(basename "$z")" >> tiles.csv
    done
  done
done
sort -R tiles.csv | head -n 5
The bash script does exactly what the Python tool in this PR is doing.
Converting a slippy-map directory to a CSV file is quite a common operation, especially when doing data cleaning and multiple rounds of hard-negative mining. Keeping scripts (bash or Python) that do this as part of the repository documents this missing piece. Also, continuing with the familiar ./rs interface could be easier for our users than context switching between ./rs and this bash script.
I'm just worried about adding even more sub-commands to our already extensive list of rs tools. I think we should use tools for substantial tasks but keep these small data transformations out of the standard set of rs tools.
"keep these small data transformations out of the standard set of rs tools."
Would a scripts folder inside robosat be a good place to put these one-off data-transformation scripts? Any other ideas?
robosat/scripts/csv.py
"The bash script does exactly what the Python tool in this PR is doing."
Had not tried the bash script. Looks like there is an extra .png in all the lines with bash:
41351,95200.png,18
41352,95200.png,18
41353,95200.png,18
41354,95200.png,18
41374,95250.png,18
You can use this intuitive bash syntax to split off the filename from the filename.extension pair: ${y%.*}
Thank you for the bash snippet @daniel-j-h
for z in /data/*; do
  for x in "$z"/*; do
    for y in "$x"/*; do
      echo "$(basename "$x"),$(basename "${y%.*}"),$(basename "$z")" >> tiles.csv
    done
  done
done
Closing here!
We have a slippy map directory with X items (1000 in this example) and we want a subset of Y items (5 in this example). There are a couple of additional options that are common to data workflows:
--shuffle
--count
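For reference, the behaviour described above could be sketched in Python roughly as follows. This is not the code from this PR: the function names (list_tiles, sample_tiles, write_csv) and the seed parameter are made up for illustration, and it assumes that --shuffle randomizes the tile order and --count keeps only the first N rows. Like the bash snippet, it writes rows as x,y,z with the file extension stripped.

```python
import csv
import random
import sys
from pathlib import Path


def list_tiles(root):
    """Yield (x, y, z) string triples for every file stored as root/z/x/y.ext."""
    root = Path(root)
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            z, x = path.parts[-3], path.parts[-2]
            # path.stem drops the last extension, e.g. "95200.png" -> "95200"
            yield x, path.stem, z


def sample_tiles(root, count=None, shuffle=False, seed=None):
    """Return the tile list, optionally shuffled and cut to `count` rows.

    Assumption: this mirrors what --shuffle and --count are meant to do.
    """
    tiles = list(list_tiles(root))
    if shuffle:
        random.Random(seed).shuffle(tiles)
    return tiles if count is None else tiles[:count]


def write_csv(tiles, out=sys.stdout):
    """Write one x,y,z row per tile."""
    csv.writer(out, lineterminator="\n").writerows(tiles)
```

With this sketch, write_csv(sample_tiles("/data", count=5, shuffle=True)) would play the role of the bash loop followed by sort -R tiles.csv | head -n 5.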
Workflow
The logical next step here is to use the rs subset tool to get a subset of images per CSV file.
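To make that step concrete, here is a rough sketch of what selecting a subset from a CSV file amounts to. It is not the rs subset implementation: copy_subset is a hypothetical helper, and it assumes the CSV rows are x,y,z (as produced above) and that tiles live under images/z/x/y.ext.

```python
import csv
import shutil
from pathlib import Path


def copy_subset(images, tiles_csv, out):
    """Copy every tile listed as x,y,z in tiles_csv from images/ to out/.

    The z/x/y directory layout is preserved. Returns the number of files copied.
    """
    images, out = Path(images), Path(out)
    copied = 0
    with open(tiles_csv, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 3:
                continue  # skip blank or malformed rows
            x, y, z = row
            # The extension is not stored in the CSV, so match any file named y.*
            for src in (images / z / x).glob(y + ".*"):
                dst = out / z / x / src.name
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(src, dst)
                copied += 1
    return copied
```

For example, copy_subset("/data", "tiles.csv", "/tmp/subset") would copy only the tiles named in tiles.csv into /tmp/subset; the output path here is just a placeholder.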