Closed: andrewljohnson closed this issue 8 years ago
...we provide a function where you could say "give me 40,000 random tiles from within these lon/lat bounding boxes, and label them using these labeller functions I want to try", and that would be fast.
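A rough sketch of what that sampling call could look like (function and parameter names here are hypothetical, not existing DeepOSM code; real tiles would be snapped to NAIP pixels rather than picked at arbitrary offsets):

```python
import random

def random_tiles(bboxes, n, tile_deg=0.001):
    """Sample n random tile bounding boxes from within the given lon/lat boxes.

    bboxes: list of (min_lon, min_lat, max_lon, max_lat) tuples
    tile_deg: tile edge length in degrees (illustrative placeholder)
    Returns a list of (min_lon, min_lat, max_lon, max_lat) tile boxes.
    """
    tiles = []
    for _ in range(n):
        min_lon, min_lat, max_lon, max_lat = random.choice(bboxes)
        # pick a random tile origin so the whole tile stays inside the box
        lon = random.uniform(min_lon, max_lon - tile_deg)
        lat = random.uniform(min_lat, max_lat - tile_deg)
        tiles.append((lon, lat, lon + tile_deg, lat + tile_deg))
    return tiles
```

Each sampled tile box would then be handed to whatever labeller functions the experiment requests.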
(A "labeller" function is something that takes a lon/lat bounding box and returns some numpy array. Simple ones return 1:1 arrays of one of the RGBI bands; more complex ones are like has_center_road and its various permutations, or ones that map a 64x64 tile to a 4x4 binary has-road or has-tennis grid, etc.)
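Two toy labellers in that spirit (hypothetical sketches, not DeepOSM code; for a self-contained example they take already-fetched pixel arrays rather than bounding boxes):

```python
import numpy as np

def band_labeller(band_index):
    """Return a simple labeller extracting one RGBI band as a 1:1 array."""
    def labeller(tile_pixels):
        # tile_pixels: (height, width, 4) array of RGBI values
        return tile_pixels[:, :, band_index]
    return labeller

def downsampled_road_labeller(road_mask, out_size=4):
    """Map a 64x64 binary has-road mask down to an out_size x out_size grid."""
    h, w = road_mask.shape
    bh, bw = h // out_size, w // out_size
    blocks = road_mask.reshape(out_size, bh, out_size, bw)
    # a coarse cell is "has road" if any pixel in its block is road
    return blocks.any(axis=(1, 3)).astype(np.uint8)
```

The point is just that every labeller shares one signature, so the tile-sampling function can apply any mix of them to the same tiles.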
We could still cache it if we want (as is being discussed in https://github.com/trailbehind/DeepOSM/issues/30), though I think we can ditch all the NAIP-specific details and just save to NetCDF the arrays that go straight into TensorFlow, plus some metadata about the experiment if we want.
merging with other infrastructure issues
Putting the data in Postgres seems like a good mid-game/end-game move. Do this after we put up deeposm.org, want to scale, and/or want to provide a place for researchers to run arbitrary experiments.
Benefits include: