trailbehind / DeepOSM

Train a deep learning net with OpenStreetMap features and satellite imagery.
MIT License

automate/improve infrastructure #59

Open andrewljohnson opened 8 years ago

andrewljohnson commented 8 years ago

This issue describes how DeepOSM works now, then the changes needed to improve the infrastructure. Also see the notes and scripts on issues #8, #23, #30, and #39 (these were closed to merge them into this issue, not because they were completed).

Test Data and Training

anandthakker commented 8 years ago

use Overpass or other approach to getting OSM data, instead of hacking up PBF extracts with Osmium

@andrewljohnson have you considered OSM QA Tiles?

andrewljohnson commented 8 years ago

@anandthakker I'm probably wrong, but I decided that using "tiled" data doesn't really help me, because my source imagery isn't projected to Mercator or tiled into a TMS pyramid?
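For context on what "tiled" data implies: XYZ and TMS tiles index the world in Web Mercator, so a WGS84 point maps to a tile column and row through the standard slippy-map formula sketched below. Imagery in another projection (NAIP is distributed in UTM) does not line up with those tile boundaries without reprojecting and resampling. The function names here are illustrative, not taken from DeepOSM.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the XYZ ("slippy map") (column, row) of the tile containing a WGS84 point."""
    n = 2 ** zoom
    # Web Mercator is only defined up to about +/-85.0511 degrees latitude.
    lat = max(min(lat, 85.05112878), -85.05112878)
    # Longitude maps linearly onto tile columns.
    x = int((lon + 180.0) / 360.0 * n)
    # Latitude is projected with Web Mercator before mapping onto tile rows.
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Points on the east or south edge belong to the last column or row.
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


def xyz_row_to_tms(y: int, zoom: int) -> int:
    """TMS numbers rows from the south; XYZ numbers them from the north."""
    return (2 ** zoom - 1) - y
```

Every zoom level doubles the grid in each direction, which is the "pyramid"; a tiled source only aligns with imagery cut on this same grid.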

When I did my first run at this, I used Mapzen vector tiles plus TMS imagery tiles (since they align). But when I switched to NAIPs, it seemed to make more sense (less code) to clip line strings to the bounds of my NAIP images or of arbitrary tiles.
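The clipping step described above can be sketched without dependencies using Liang-Barsky segment clipping against a rectangle. This is an illustrative sketch, not DeepOSM's code: the function names are hypothetical, and it assumes the line coordinates and the image bounds are already in the same coordinate system (for NAIP, typically the image's UTM zone). A geometry library such as Shapely (`LineString.intersection(box(...))`) does the same job in practice.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def clip_segment(p0: Point, p1: Point, bbox: BBox) -> Optional[Tuple[Point, Point]]:
    """Liang-Barsky: clip segment p0->p1 to an axis-aligned box, or None if it misses."""
    min_x, min_y, max_x, max_y = bbox
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    t0, t1 = 0.0, 1.0
    # One (p, q) pair per box edge: left, right, bottom, top.
    for p, q in ((-dx, p0[0] - min_x), (dx, max_x - p0[0]),
                 (-dy, p0[1] - min_y), (dy, max_y - p0[1])):
        if p == 0:
            if q < 0:  # parallel to this edge and outside it
                return None
            continue
        t = q / p
        if p < 0:  # segment enters across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:  # segment leaves across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    # Reuse the exact endpoints when unclipped so pieces can be chained by equality.
    start = p0 if t0 == 0.0 else (p0[0] + t0 * dx, p0[1] + t0 * dy)
    end = p1 if t1 == 1.0 else (p0[0] + t1 * dx, p0[1] + t1 * dy)
    return start, end


def clip_linestring(coords: List[Point], bbox: BBox) -> List[List[Point]]:
    """Clip a line string to bbox; a line that exits and re-enters yields several pieces."""
    pieces: List[List[Point]] = []
    current: List[Point] = []

    def flush() -> None:
        nonlocal current
        if len(current) >= 2:  # drop single-point corner touches
            pieces.append(current)
        current = []

    for a, b in zip(coords, coords[1:]):
        seg = clip_segment(a, b, bbox)
        if seg is None:
            flush()
            continue
        start, end = seg
        if not current or current[-1] != start:
            flush()
            current = [start]
        if end != current[-1]:
            current.append(end)
        if end != b:  # the segment exits the box, so this piece ends here
            flush()
    flush()
    return pieces
```

For example, `clip_linestring([(-5, 5), (5, 5), (15, 5)], (0, 0, 10, 10))` keeps only the part of the line inside the 10 x 10 box.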

So that's also why I think my end solution is a planet DB with an API, and maybe that API is Overpass, or something simple I just cook up for this use case?
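If that API were Overpass, fetching the line strings for one image's bounds could look roughly like the sketch below. The public endpoint URL, the `highway` tag filter, and the helper names are assumptions for illustration. The Overpass QL itself is standard: the bounding box is given as (south, west, north, east) in WGS84, and `out geom` returns each way's coordinates inline.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; a self-hosted Overpass instance accepts the same request.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(south: float, west: float, north: float, east: float,
                key: str = "highway", timeout: int = 180) -> str:
    """Overpass QL for every way carrying `key` inside the bbox, with geometry."""
    return (
        f"[out:json][timeout:{timeout}];"
        f'way["{key}"]({south},{west},{north},{east});'
        "out geom;"
    )


def fetch(query: str, endpoint: str = OVERPASS_URL) -> dict:
    """POST the query as the `data` form field and parse the JSON response."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(endpoint, data=body, timeout=200) as resp:
        return json.load(resp)


def to_linestrings(result: dict) -> list:
    """Turn `out geom` ways into lists of (lon, lat) tuples."""
    return [
        [(pt["lon"], pt["lat"]) for pt in el["geometry"]]
        for el in result.get("elements", [])
        if el.get("type") == "way" and "geometry" in el
    ]
```

Usage would be `to_linestrings(fetch(build_query(38.9, -77.1, 39.0, -77.0)))`; the results are in WGS84, so they still need reprojecting to the imagery's CRS before clipping.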

anandthakker commented 8 years ago

Ah, yeah, I see what you mean. I went the tiled route for skynet-data mainly because, between existing tiled satellite sources (Mapbox Satellite, DG, etc.) and OSM QA Tiles, I figured I could skip a lot of the work. Since you've already handled chopping up the NAIP images, I agree that using tiles offers less of a benefit. One upside of the tiled route, though, is that it might make it easier in the future to swap in a different imagery source without your having to host, maintain, and process it as much. That might not be worth it if you don't see yourself going that route.

brandongalbraith commented 7 years ago

Have you considered running the analysis on DigitalOcean and pushing the results to S3 from there? Compute at DO is significantly cheaper than at AWS.

andrewljohnson commented 7 years ago

@brandongalbraith The analysis runs at home on a Linux box I have.

Also, I might try out some Google ML infrastructure; I just got my invite yesterday.

brandongalbraith commented 7 years ago

@andrewljohnson

Sorry about that! I inferred from above:

Actual work includes: move the analysis to AWS, run on a cycle

that the analysis was planned to run on AWS. Happy to help break everything apart to scale it out; my day gig is devops/infrastructure.

andrewljohnson commented 7 years ago

@brandongalbraith I realized that right after I answered too fast :)

I guess the right answer is that I hadn't thought much about it.