hotosm / oam-dynamic-tiler

Dynamic tiling of raster data for OpenAerialMap + others

Planning next phase of development #43


smit1678 commented 7 years ago

@mojodna @tombh Let's plan out the next phase of development. I think we have lessons learned and work completed from related efforts (like marblecutter), but we still need to finalize the direction for further improvements and optimizations.

Based on previous discussions and past challenges, a good direction seems to be to separate some of the functions: split processing (moving towards COG) and metadata generation out of the tile-serving app. Then there are other features and ideas to add as improvements.

Can we set a good direction and create a roadmap?

tombh commented 7 years ago

My main motivation for this refactor is to simplify debugging and development of OAM. For example, testing the upload process in OAM currently requires having a running Docker instance.

So I would like to see the dynamic tiler split into 2 distinct parts:

  1. A processing client that converts any GeoTIFF into a Cloud Optimized GeoTIFF (see the sketch after this list). Theoretically this should be possible to write in pure Bash, though it might be nice to wrap that in something more friendly for dealing with all the command-line options. This should have its own test suite. What's more, this tool will have a much wider audience than just the dynamic tiler, assuming the COG revolution is a thing.
  2. The dynamic tiler is a more complicated beast, but essentially could it just be a black box that accepted an S3 account set up through ENV credentials and simply returned a TMS endpoint for any S3 URL passed to it through its endpoint, e.g. bucket-name.dynamictiler.com/154766456/cog.gtiff/x/y/z? Again, such a black box would have broader relevance to the GIS community than just OAM and would subsequently benefit from the wider adoption.
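
A minimal sketch of what item 1's conversion might wrap, shown here as a thin Python wrapper around the GDAL CLI rather than pure Bash (the creation options and overview levels are assumptions; whether the output is a fully conforming COG depends on the GDAL version):

```python
import subprocess

def make_cog(src, dst):
    """Rough GeoTIFF -> Cloud Optimized GeoTIFF conversion via the GDAL CLI."""
    tmp = dst + ".tmp.tif"
    # 1. Internally tiled, compressed copy of the source.
    subprocess.run([
        "gdal_translate", src, tmp,
        "-co", "TILED=YES",
        "-co", "COMPRESS=DEFLATE",
        "-co", "BLOCKXSIZE=512",
        "-co", "BLOCKYSIZE=512",
    ], check=True)
    # 2. Build internal overviews on the copy.
    subprocess.run(["gdaladdo", "-r", "average", tmp, "2", "4", "8", "16"], check=True)
    # 3. Rewrite so overviews land in a cloud-friendly order.
    subprocess.run([
        "gdal_translate", tmp, dst,
        "-co", "TILED=YES",
        "-co", "COMPRESS=DEFLATE",
        "-co", "COPY_SRC_OVERVIEWS=YES",
    ], check=True)
```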

The final piece of the puzzle is the OAM-specific stuff like generating metadata, thumbnails, etc. These should explicitly be the responsibility of OAM, specifically the oam-catalog (formerly the oam-uploader-api).

mojodna commented 7 years ago

1) This is largely done as process.sh / transcode.sh. Shipping it as a Docker container means that appropriate versions of GDAL, etc. are available. I haven't checked RGB(A) sources recently, but recent updates should produce conforming COGs.

2) With recent changes that are part of marblecutter, it's no longer tied to S3 (it can work with local + remote files). Single-file S3 support needs to be added back as part of the mechanism introduced in https://github.com/mojodna/marblecutter/pull/9.

Agreed that metadata generation should be separated out from transcoding. Ideally, outputs from transcoding will only be .tif, .tif.ovr (if necessary), _footprint.json (including resolution in meters + other core metadata).
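
For the footprint output, something along these lines could produce a _footprint.json-style file with rasterio (the property names and the bounding-box-as-footprint simplification are assumptions; the real schema is up to OAM):

```python
import json

import rasterio
from rasterio.warp import transform_bounds

def write_footprint(tif_path, out_path):
    """Write a GeoJSON footprint plus basic metadata for a transcoded GeoTIFF."""
    with rasterio.open(tif_path) as src:
        # Approximate the footprint as the bounding box, reprojected to WGS84.
        left, bottom, right, top = transform_bounds(src.crs, "EPSG:4326", *src.bounds)
        # Ground resolution in CRS units; only metres if the source is projected.
        resolution = max(abs(src.transform.a), abs(src.transform.e))
        footprint = {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[
                    [left, bottom], [right, bottom], [right, top],
                    [left, top], [left, bottom],
                ]],
            },
            "properties": {"resolution": resolution},
        }
    with open(out_path, "w") as f:
        json.dump(footprint, f)
```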

mojodna commented 7 years ago

OAM metadata generation can occur by post-processing the bucket, so maybe there's a need for a webhook endpoint after each image is processed. (@iandees has currently wired in insertion of footprints at run-time to avoid needing to post-process, so I think a webhook (and a corresponding Heroku app / Lambda function) would fit that well.)
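
A minimal sketch of what such a Lambda-backed webhook handler might look like (the payload shape and the catalog-update step are entirely assumptions):

```python
import json

def handler(event, context):
    """Hypothetical post-processing webhook, invoked once per processed image."""
    body = json.loads(event.get("body") or "{}")
    footprint = body.get("footprint")  # e.g. the _footprint.json produced by transcoding
    # ...insert or update the footprint in the OAM catalog here...
    return {"statusCode": 204, "body": ""}
```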

tombh commented 7 years ago

Oh you might have missed my main idea there. Is this possible:

The Lambda/Heroku server receives this request: https://dynamictiler.com/www.remote.com/COG.tiff/12/13/2. Internally it parses out https://www.remote.com/COG.tiff, proxies a GDAL vsicurl request to get the tile at 12/13/2, and returns that as the response to the original request?

If so, then the dynamic tiler can be completely separated from the processing/transcoding step, and there is no need for post-processing hooks. I'm imagining that the metadata will be generated on the CLI after the transcoder (let's call it cog_creator.sh) exits with success; there's no need for any communication at all with the dynamic tiler during processing.
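
If it is possible, a rough sketch of the per-tile read using rasterio and mercantile (the z/x/y ordering, the 256px tile size, and bilinear resampling are assumptions; encoding the array to PNG is left out):

```python
import mercantile
import rasterio
from rasterio.enums import Resampling
from rasterio.vrt import WarpedVRT

def read_tile(url, z, x, y, tilesize=256):
    """Read one web-mercator tile's worth of pixels from a remote COG over HTTP."""
    bounds = mercantile.xy_bounds(x, y, z)  # tile bounds in EPSG:3857 metres
    with rasterio.open(f"/vsicurl/{url}") as src:
        with WarpedVRT(src, crs="EPSG:3857", resampling=Resampling.bilinear) as vrt:
            window = vrt.window(*bounds)
            return vrt.read(window=window, out_shape=(vrt.count, tilesize, tilesize))
```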

mojodna commented 7 years ago

One takeaway from the call: split transcoding from tiling (as separate repos), since they already don't depend on one another.

mojodna commented 7 years ago

@tombh: that describes the current S3 workflow. The post-processing steps are to facilitate mosaicking and browsing.

mojodna commented 7 years ago

Tools:

Components:

tombh commented 7 years ago

I just want to make sure we don't lose sight of the main goal, which is to decouple OAM from the dynamic tiler; that should be our primary goal, and any mosaicking should be a bonus. So basically you can leave any of the indexing and metadata generation to me; we'll just rip that stuff out of the tiler. I can probably help with the transcoding tool. So most of the work for you will be making the tiler accept these externally processed COGs, like https://dynamictiler.com/www.remote.com/COG.tiff/12/13/2.

mojodna commented 7 years ago

Yup, that should be pretty straightforward. It's basically how the tiler in this repo works: metadata + files get generated somehow (by process.sh, which can/should be its own repo) and put on S3 (for now) with an identifier of some sort. The tiler then translates identifiers in the URL into remote sources (using S3_BUCKET and S3_PREFIX), reads the metadata, and proceeds to tile them.
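
Roughly the kind of translation involved (the key layout below is a guess, not the actual bucket structure):

```python
import os

def remote_source(identifier):
    """Map a URL identifier to a remote raster source using S3_BUCKET / S3_PREFIX.

    The key layout (<prefix>/<identifier>.tif) is only illustrative; the real
    layout depends on how the transcoder writes its outputs.
    """
    bucket = os.environ["S3_BUCKET"]
    prefix = os.environ.get("S3_PREFIX", "").strip("/")
    key = f"{prefix}/{identifier}.tif" if prefix else f"{identifier}.tif"
    return f"s3://{bucket}/{key}"
```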

Ideas for what (but especially how) to test the transcoder would be super helpful. I haven't seen good examples of tests for bash scripts.

tombh commented 7 years ago

Exactly. Great, so we're on the same page.

I've always thought Dokku has an impressive example of testing in Bash; it looks very sane. However, I don't really see why the tests have to be written in Bash (other than saving on installing yet more dependencies). It's reasonable to just have tests that wrap the CLI tool, so they could be written in any language; I suspect Python would be a good fit, seeing as that is already a dependency of GDAL.
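
For example, a minimal pytest-style test that wraps the CLI (the cog_creator.sh name is borrowed from earlier in the thread; the fixture path and the output checks are assumptions):

```python
import subprocess
from pathlib import Path

import rasterio

def test_produces_tiled_geotiff_with_overviews(tmp_path):
    src = Path("tests/fixtures/small_rgb.tif")  # hypothetical fixture
    out = tmp_path / "out_cog.tif"
    result = subprocess.run(
        ["./cog_creator.sh", str(src), str(out)],
        capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr
    with rasterio.open(out) as ds:
        assert ds.profile.get("tiled"), "expected internal tiling"
        assert ds.overviews(1), "expected internal overviews"
```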