asitemade4u commented 2 years ago

Hi Ellen,

My goal is to manage several cities using Headway, notably NYC and Paris.
Of course I have tried and succeeded in installing several different "stacks" in different folders.
Now I would like to share the "application" part of each stack and have only the data specific to each city.
Since we have a local Docker registry here, I decided to rewrite the docker-compose.yml file so that each stack fetches a common image from the registry instead of fetching it from the local folder, which is the way Docker stacks are supposed to work.

It did not work because Headway saved dependencies within the containers instead of externalizing them as data. So here is what happens:

I saved each container in NewYork as an image to our Registry
I then stopped the NYC stack
I docker-pruned the system on the server to be sure the slate would be cleaned for the test
I modified the docker-compose.yml file of Philadelphia so that the image underlying each container is fetched from our Registry
then launched it using docker-compose up -d as usual
Photon failed because it could not find the /data/Philadelphia.photon.tar.bz2 file, and for a good reason: the image had been created for NewYork
As a result Headway did not work at all.

Here is the trace from the console:

Extracting photon index
+ echo 'Extracting photon index'
+ cd /
+ pbzip2 -d
+ cat /data/Philadelphia.photon.tar.bz2
+ tar x
cat: /data/Philadelphia.photon.tar.bz2: No such file or directory
pbzip2: producer_decompress: *ERROR: when reading bzip2 input stream
Terminator thread: premature exit requested - quitting...

Here is a simple fix: load the file from a mounted volume instead, e.g. from a local volume mounted at /data.

asitemade4u commented 2 years ago

I checked, and the exact same failure occurs with the Postgres database within the Nominatim container:

CREATE DATABASE
+ echo 'Beginning nominatim restore'
Beginning nominatim restore
+ cat /data/Philadelphia.nominatim.sql.bz2
+ pbzip2 -d
cat: /data/Philadelphia.nominatim.sql.bz2: No such file or directory
+ sudo -E -u postgres psql nominatim
pbzip2: producer_decompress: *ERROR: when reading bzip2 input stream
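
Both traces fail on a missing per-city archive under /data, so a pre-flight check before starting the stack would surface the problem earlier. A minimal sketch, assuming the two file name patterns seen in the traces above; the function name check_city_data is hypothetical and not part of Headway:

```shell
# Report which per-city archives are missing from a data directory.
# The two suffixes are taken from the traces above; everything else is illustrative.
check_city_data() {
    data_dir=$1   # directory that gets mounted at /data
    city=$2       # e.g. Philadelphia
    missing=0
    for suffix in photon.tar.bz2 nominatim.sql.bz2; do
        file="$data_dir/$city.$suffix"
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            missing=1
        fi
    done
    # Returns 0 when both archives exist, 1 otherwise.
    return "$missing"
}
```

Running something like `check_city_data ./data Philadelphia && docker-compose up -d` would then refuse to start the stack when either archive is absent.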

asitemade4u commented 2 years ago

And on a more general note, why do you build your own images instead of relying on official images from each app developer? If it is to harmonize versions, Docker provides a very handy way to achieve that: tags.

ellenhp commented 2 years ago

Here is a simple fix: load the file from a mounted volume instead, e.g. from a local volume mounted at /data.

edit: assuming you mean as a docker volume, not a directory mount as we currently do it

I don't want to do that for the docker-compose system because it makes everything more complicated (how/when do you update data in the volume?). When I work on Headway I run dozens of builds a day and if there are additional steps involved I will lose my marbles. :)

edit: directory mounts good though, I like directory mounts (though I'll need to figure something else out once I build the k8s config)

#81 will help with this a lot because it paves the way for doing cool things with kubernetes. Quite frankly the system as it exists in `main` was not designed (even slightly) for production use.

And on a more general note, why do you build your own images instead of relying on official images from each app developer?

I'd love help reducing the reliance on custom build steps.

ellenhp commented 2 years ago

I hope to have working kubernetes configs in the next week or so, assuming I can get #81 working with full-planet imports. I have hardware coming in on Sunday that should enable my desktop to perform a full-planet import (before I was using a borrowed server) so once I have that all set up I think I'll be able to iterate much more quickly.

ellenhp commented 2 years ago

It did not work because Headway saved dependencies within the containers instead of externalizing them as data.

Where do we do this? I don't think Headway does this anymore. That's how I originally designed it before 3nprob got involved and convinced me it was a bad idea. We do absolutely mount the ./data directory as a volume though.

edit: Just realized you probably ran docker images and saw all the headway build images. Those are mostly used as a build cache. The images that serve traffic should all be generic (configured by the directory mount and the .env file)

asitemade4u commented 2 years ago

So I am trying NYC again with the very latest version.
As for your question about directory mounts, Docker is great at synchronizing inner and outer volumes -- whatever you write in the inner is synced with the outer and vice versa.
I am not too enthused with your switch to Kubernetes, which I do not use and do not want to deploy, for lack of time and knowledge. Please consider improving the existing setup before jumping to yet another platform...

ellenhp commented 2 years ago

I am not too enthused with your switch to Kubernetes, which I do not use and do not want to deploy, for lack of time and knowledge. Please consider improving the existing setup before jumping to yet another platform...

I'm not switching to k8s, I'm adding it as an option and focusing productionization efforts there because docker-compose isn't designed for production use afaik.

As for your question about directory mounts, Docker is great at synchronizing inner and outer volumes -- whatever you write in the inner is synced with the outer and vice versa.

We do use directory mounts in Headway, check the docker-compose.yaml for how.

So I am trying NYC again with the very latest version.

It should work. The Makefile and docker-compose.yaml files are really useful resources for the time being because until I get actual production documentation written the code is all we have. Don't push the NewYork, Paris, etc. containers to your registry. They're just there for build caching purposes until we switch to something that handles build caches in a reasonable way. You'll want to push the images to your registry that exist verbatim in the docker-compose.yaml file, then create a new docker-compose.yaml that pulls from the registry. Copy that docker-compose.yaml to a new directory for each metro area you want to host, configure it appropriately with a .env, adjust port numbers, etc. Then symlink the ./data directory in each of those new metro area directories to your main headway data directory so that each metro area's volume mount works correctly.

That should(?) let you set up multiple headway instances side-by-side. But I really do think kubernetes is going to make this much easier, and it will allow zero-downtime deploys and high availability among other things.
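
The per-metro-area layout described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name setup_city and all paths are hypothetical, and the .env file is created as an empty stub because the variables it needs depend on the compose file in use.

```shell
# Create one directory per metro area containing a copy of the registry-based
# compose file, a .env stub, and a ./data symlink to the shared data directory.
setup_city() {
    compose_file=$1   # docker-compose.yaml that pulls from the registry
    data_dir=$2       # main headway data directory (absolute path)
    target_dir=$3     # new directory for this metro area
    city=$4           # e.g. Paris
    mkdir -p "$target_dir"
    cp "$compose_file" "$target_dir/docker-compose.yaml"
    # Link ./data only if it is not already linked.
    [ -L "$target_dir/data" ] || ln -s "$data_dir" "$target_dir/data"
    # Leave an existing .env untouched; otherwise create a stub to fill in by hand.
    [ -f "$target_dir/.env" ] || printf '# settings for %s\n' "$city" > "$target_dir/.env"
}
```

Port numbers and the .env contents still have to be adjusted by hand for each area before running docker-compose up -d in the new directory.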

asitemade4u commented 2 years ago

Let's agree to disagree on docker-compose not being fit for production ;o))
What do you mean by "the images to your registry that exist verbatim in the docker-compose.yaml file"? I pushed each container separately, using docker commit and then:

docker tag hwy g10.qtpl.net:5000/hwy
docker push g10.qtpl.net:5000/hwy
docker tag hwy-ngx g10.qtpl.net:5000/hwy-ngx
docker push g10.qtpl.net:5000/hwy-ngx
docker tag hwy-otp g10.qtpl.net:5000/hwy-otp
docker push g10.qtpl.net:5000/hwy-otp
docker tag hwy-vlh g10.qtpl.net:5000/hwy-vlh
docker push g10.qtpl.net:5000/hwy-vlh
docker tag hwy-nmn g10.qtpl.net:5000/hwy-nmn
docker push g10.qtpl.net:5000/hwy-nmn
docker tag hwy-phn g10.qtpl.net:5000/hwy-phn
docker push g10.qtpl.net:5000/hwy-phn

Note that I have given each container a shorter name. Then, in the docker-compose.yml file, I replaced the image reference for each container with a reference to our Registry.
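
The repeated tag/push pairs above can be generated by a small loop. A sketch, assuming the same registry host and image names as in the commands above; the function name registry_commands is hypothetical, and it only prints the commands so they can be reviewed first:

```shell
# Registry host as used in the commands above.
REGISTRY="g10.qtpl.net:5000"

# Print one "docker tag" and one "docker push" line per local image name.
registry_commands() {
    for image in "$@"; do
        echo "docker tag $image $REGISTRY/$image"
        echo "docker push $REGISTRY/$image"
    done
}
```

For example, `registry_commands hwy hwy-ngx hwy-otp hwy-vlh hwy-nmn hwy-phn | sh` would run the same twelve commands listed above.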

ellenhp commented 2 years ago

Yeah, that should work fine. The main thing is that you grab the images designed to serve traffic because the current goofy docker-based preprocessing steps create a lot of images that do literally nothing.

Let's agree to disagree on docker-compose not being fit for production ;o))

It's what I'm using right now for https://maps.ellenhp.me but I'm not the biggest fan because botched deploys tend to cause problems. The restart: always trick did help a lot when I learned about it though. It's fine for most things, I tend to just be a perfectionist.

If you do come up with some good docker-compose changes that help for easy production-grade deploys I'd love to include them in headway btw, as long as they don't interfere with quick iteration during development. I'm trying to make this project as batteries-included as possible.

asitemade4u commented 2 years ago

Here are some ideas for a better deployment experience:

I have the feeling you want the data to be as up to date as possible, hence the length (and therefore the possible failure) of the build.
But why (even if you are a perfectionist) be so radical? Why not save an intermediate, slightly outdated version of each data set and allow for updating it later?
That approach would also free you from building each stack image from scratch, as you would rely on official, pre-existing images. Besides, those are often available on much lighter base environments such as Alpine.
And if the image does not exist already, just create a Docker Hub account and add your image there.

So:

a makefile (which is currently a weird offspring of a Dockerfile) to import and expand the data
an .env file for all specific settings
a docker-compose.yml file relying on existing images

For us, this would also have the advantage of being very much how Portainer works, which would allow us to version control the docker-compose.yml file...

ellenhp commented 2 years ago

That's one of the conditions that Wolfram Schneider gave me in exchange for his blessing for using the BBBike extracts in an automated way in a published project (the other being that I set the user-agent string in the request so he can monitor and potentially throttle us if headway got too big).
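
For anyone fetching extracts by hand, setting a user-agent on the request might look like the sketch below. It is not Headway's actual download step: the function name extract_download_command is hypothetical, the user-agent string, URL and output path are placeholders supplied by the caller, and the function only prints the curl command.

```shell
# Build a curl command that identifies the client through its User-Agent header.
# All three arguments are caller-supplied placeholders.
extract_download_command() {
    user_agent=$1   # string the extract provider can use to identify the client
    url=$2          # location of the .osm.pbf extract
    output=$3       # where to store it, e.g. ./data/CityName.osm.pbf
    printf 'curl --fail --location --user-agent "%s" --output "%s" "%s"\n' \
        "$user_agent" "$output" "$url"
}
```

Piping the printed line to `sh` would perform the download; `--fail` makes curl exit with an error on an HTTP error response instead of saving the error page.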

I might eventually mirror a lot of this data which would let people run Headway instances without even needing to pre-process it, but that would be a lot of work and I'm trying to focus on the basics first.

asitemade4u commented 2 years ago

Ah I get it. But you could use another .pbf file if that is too much of a constraint.

ellenhp commented 2 years ago

Correct, headway now supports custom extracts so this is no longer a limiting factor. You can also just rm ./data/CityName.osm.pbf which will trigger a re-download.

asitemade4u commented 2 years ago

Yes, because there is another catch in how Headway is deployed now: there is no way (e.g. using Watchtower) to update each stack automatically and individually. One has to:

delete or deactivate the existing full stack (e.g. by renaming it)
download the new source code
rerun make

So basically, what you gain in automation (or a semblance of it, as the build is often interrupted; I am experiencing that right now), you lose in granularity.

ellenhp commented 2 years ago

Kubernetes will allow zero-downtime deploys and high-availability, and is what I plan on focusing productionization efforts on. There are a lot of different things mentioned in this issue and I'm not really sure what it's tracking anymore, but the objections are noted. I'm just not convinced that there are many good reasons not to use kubernetes. Almost the entire devops industry has standardized on it as the orchestration system of choice, as far as I know.

I'm going to close this because it's really broad and I'm not sure what it's tracking specifically, but I did spin off #95 which is absolutely a thing I want to try and do. The rest of the things reported in this issue should already be addressed or are not currently in scope.

headwaymaps / headway

Make the stack more generic #82
