Simplified initial version of Training Data STAC

lossyrob commented 6 years ago

I had a fruitful discussion with @dlindenbaum that I want to record here as a suggestion and use case description as a starting point for a minimal implementation of TD STAC.

Raster Vision can consume labeled data that has the following items:

Image(s) of the scene
GeoJSON
An optional AOI polygon or set of polygons that describe the area of the image that is fully labeled.

For the "Image(s) of a scene" part, it's good to have the image size scoped such that downloading and loading up the corresponding GeoJSON in QGIS with the Raster Vision Plugin won't put an end to my machine.

Currently the Rio dataset in SpaceNet is set up with a set of images, one large label GeoJSON, and a total AOI. This notebook splits up the labels to fit the above scheme.

Other cities in SpaceNet have COGs, a large GeoJSON in a tarball, and a tar of the images chipped out to various smaller sizes with corresponding label GeoJSONs. For working with those datasets, we'll have to do similar preprocessing, sometimes requiring the user to download all the imagery to get at the labels - something we'd like to avoid.

Dave talked about STAC-ifying all of SpaceNet, and TD-STAC-ifying it as well. He also mentioned that a good first step is just un-tarring some of those files and exposing them so that they can be read directly off of S3 without requiring a bulk download. I mentioned that, because all I really need is that (Image(s), GeoJSON, Optional[AOI]) triplet, that triplet is really all I would want for now out of a TD STAC of SpaceNet, or anything else for that matter.
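To make the triplet concrete, here is a rough sketch of how a consumer might model it in Python. The class name `LabeledScene`, its field names, and the example URIs are all hypothetical; nothing here is taken from Raster Vision or from the TD STAC README.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledScene:
    """Hypothetical container for the (Image(s), GeoJSON, Optional[AOI]) triplet."""

    image_uris: List[str]          # one or more images of the scene
    label_uri: str                 # GeoJSON file holding the labels
    aoi_uri: Optional[str] = None  # polygon(s) covering the fully labeled area, if provided

    def __post_init__(self) -> None:
        # A scene without any imagery is not usable as training data.
        if not self.image_uris:
            raise ValueError("a labeled scene needs at least one image")


# Example with made-up URIs; the AOI is simply left out when there is none.
scene = LabeledScene(
    image_uris=["s3://example-bucket/scene-1/image.tif"],
    label_uri="s3://example-bucket/scene-1/labels.geojson",
)
```

Keeping each field as a URI rather than loaded data fits the goal of reading files directly off of S3 instead of bulk downloading them.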

I'd like to propose we figure out a simplified version of the TD STAC that just tries to get us to that point - not necessarily containing everything in the table currently in the README, but just getting to an indexable set of labeled data that people putting training data out there can aim at, and consumers like Raster Vision can utilize.

This issue can serve as a place for discussing ideas about this "TD STAC 0.0.0.1" implementation before making the PRs to add info about it to the repository.

HamedAlemo commented 6 years ago

Thanks Rob. This is a great starting point. I agree on getting a very simple version out to start testing it. The optional[AOI] is a good idea too.

The elements that we listed in the table in the README should be easy to include in the first version, right? They are very similar to the STAC ones, and if the image is STAC it should be easy to generate all of these.

One thing about the GeoJSON labels: if one generates labels using segmentation, my preference is that they should be able to include them as a raster file, not necessarily GeoJSON. Does that make sense?

lossyrob commented 6 years ago

Hey Hamed - I think that all makes a lot of sense re: the raster labels and the elements of the TD STAC. For my own use case, to start with, I'd just need GeoJSON and the class name property to look for in the GeoJSON. As far as implementing other fields, I'd say that'd be up to @dlindenbaum if he's going to be the first to take a crack at creating a TD 0.0.1 out of SpaceNet.
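As an illustration of the "class name property" idea, a minimal sketch using only the Python standard library could look like the following. The function name `count_classes` and the property name passed to it are assumptions for the example; the actual property name would be whatever the dataset publisher declares.

```python
import json
from collections import Counter


def count_classes(geojson_text: str, class_property: str) -> Counter:
    """Count label features per class, reading the class from `class_property`."""
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection.get("features", []):
        # "properties" may be null in valid GeoJSON, so fall back to an empty dict.
        props = feature.get("properties") or {}
        if class_property in props:
            counts[props[class_property]] += 1
    return counts
```

Features that lack the property are skipped rather than treated as an error, which is one possible choice; a stricter consumer might prefer to reject such files.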

radiantearth / training-data-stac-spec
