hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform
MIT License

Select and document a Docker registry for centralizing Docker images #6

Closed: MikeTheCanuck closed this issue 6 years ago

MikeTheCanuck commented 6 years ago

@znmeb in issue #3 said:

The other artifact we will have and need to back up / manage is Docker images. The correct way to manage those is with a Docker registry. It would need to be private to Hack Oregon but that's something we can buy and it's not too hard to build if that's something the DevOps people want to do.

@znmeb, can you tell me more about what problem a Docker registry is solving for you or your team? I know how registries work in general, and we're using them in our productionization of last season's work - I'd like to understand the current problem you're trying to solve by publishing pre-baked images to a registry. Thanks!

znmeb commented 6 years ago

At the very least we have one image that is the official PostgreSQL image (obtained from the Docker Store, which is a registry!) augmented with the PostGIS packages. I'm assuming we'll also end up with a GeoDjango image, and probably an image to serve up whatever the front end needs that doesn't come from a CDN.
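
For reference, a minimal sketch of that kind of augmentation - the base image, package name, and tag below are illustrative assumptions, not the actual Hack Oregon build:

cat > Dockerfile <<'EOF'
# Official PostgreSQL base image, with the PostGIS packages layered on top.
FROM postgres:10
RUN apt-get update \
 && apt-get install -y --no-install-recommends postgresql-10-postgis-2.4 \
 && rm -rf /var/lib/apt/lists/*
EOF
docker build -t hackoregon/postgis:latest .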

In general, images are a way to standardize software - make sure all the developers have the same setup, rather than having to spend time troubleshooting environments, versions, etc.

MikeTheCanuck commented 6 years ago

Thanks Ed. Happy you’ve found something that is working for the current local development needs. I’m not seeing specific needs on the cloud side for which Docker/containerisation of the data layer is an obvious solution. It sounds like an interesting experiment, and one particular implementation approach - I can’t say whether containerising the data layer is necessarily the best approach for CI/CD to AWS.

In last year’s experience of containerising just the API layer, we heard significant pushback from developers: the download latency of grabbing the whole image didn’t present an advantage over just running the API layer directly.

So long as we focus on automating the schema creation/evolution and data import steps, I believe we will make great strides in improving the developers’ experience this season.
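
As a purely illustrative sketch of that kind of automation for a Django-backed API - the commands, environment variable, and file path are assumptions, not an existing Hack Oregon script:

#!/bin/sh
# Hypothetical developer-setup script: create/evolve the schema, then import data.
set -e
python manage.py migrate --noinput            # apply Django schema migrations
psql "$DATABASE_URL" -f data/seed_data.sql    # load seed data (path is illustrative)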

znmeb commented 6 years ago

I wasn't saying we should put the data in the images - just that we need a way to manage the images as images, either with an automated build (pulled from GitHub) like Docker Hub, or by pushing to a registry via Travis CI. Do we have that now? Do we have a way to track changes in the underlying base images?
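
For the Travis CI route, the push step could look roughly like this (the registry, repository name, tag, and environment variables are assumptions):

# Hypothetical deploy/after_success step in a Travis CI job:
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
docker build -t hackoregon/postgis:"$TRAVIS_COMMIT" .
docker push hackoregon/postgis:"$TRAVIS_COMMIT"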

znmeb commented 6 years ago

I have three images that are more or less stable in my opinion. There's no reason I can't post them to Docker Hub except that I don't want to deal with users outside of Hack Oregon. ;-)

znmeb commented 6 years ago

I have some sizing info on the current crop of Data Science Pet Containers images and it's not particularly good news:

REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
postgis                  latest              1a16dad00d02        20 hours ago        1.82GB
amazon                   latest              656af589d0f3        20 hours ago        1.2GB
rstats                   latest              74d8e0dce816        20 hours ago        1.83GB
jupyter                  latest              f0271f199922        21 hours ago        3.82GB

That is a total of 8.67 GB. I took a look at the Amazon container registry pricing and it looks like it's $0.09 per gigabyte downloaded. So every time someone downloads the whole stack it costs about 78 cents.
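
A quick back-of-the-envelope check of that figure, using the sizes listed above and the assumed $0.09/GB transfer rate:

echo '1.82 + 1.2 + 1.83 + 3.82' | bc    # 8.67 GB for the full stack
echo '8.67 * 0.09' | bc                 # .7803, i.e. about 78 cents per full pull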

That's not a lot, but I can't predict how often people will be doing this. There's no reason we can't host these in free public repositories for open-source projects, except for having to deal with users outside of Hack Oregon.