Closed: langbein-daniel closed this issue 1 year ago
Regarding the workaround: According to here, the long syntax for `depends_on` was removed with Compose file format v3 and reintroduced with v3.9. So it is probably too early to use it in all examples. But I could test whether a simple `depends_on: elasticsearch` at the API service is enough.
This is related to the `auto_discover` flag: the `pelias/api` instance(s) will query Elasticsearch upon startup for a list of sources/layers, so that any custom data can be detected.
In your case there is a race condition: once auto-detect has run, the result is cached so that features such as negative filters can be performant (i.e. all layers except one).
It doesn't indicate that the data is missing from the index, just that the warmup wasn't complete at the time it was queried.
I don't recall the specifics, but you should be able to search for it in the `pelias/api` codebase; there is a config flag to disable it.
https://github.com/pelias/config/pull/127
Now all that being said, the `pelias elastic wait` command exists for this purpose: it blocks until Elasticsearch is running and ready to accept traffic.
IIRC docker compose doesn't have an HTTP 'healthcheck' feature, so I'm not sure it can be solved in a way that `docker compose up` 'just works'.
Simply checking that the process spun up and is running isn't sufficient to determine that it can accept traffic, so be careful that `depends_on` does what you expect it to.
Oh sorry I just read your comment again and apparently it is possible now, so that's cool.
One of the difficulties of maintaining a ~9yo project used by lots of different systems is maintaining backwards compatibility.
Can we improve this without breaking things for existing users?
I guess that question mainly asks whether we can stay on Compose format v2, and whether `depends_on` and `healthcheck` are available in all versions greater than the minimum versions we publish in the README.
Thanks for the detailed information. The `pelias` CLI is definitely helpful for the import and startup of Pelias services.
But there are still two things that come to mind:
1) In the README of this project, there is a section (https://github.com/pelias/docker#quickstart-build-script) showing the commands to build and start it. When I first read it, I thought that `pelias elastic wait` is only needed for the subsequent `pelias import all` (as this executes scripts in Docker containers that import into the DB). But if one stops all services (e.g. during a server restart) and then starts them with `pelias compose up`, the same race problem occurs.
So I think it would be good to either add a short section to the README on how to start the project again (without reimporting data) with `pelias elastic start && pelias elastic wait && pelias compose up`, or to adjust the `pelias compose up` command: it could check whether Elasticsearch is already running and, if not, start it with `pelias elastic start && pelias elastic wait` before running `docker-compose up`.
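A hypothetical sketch of such a check inside an enhanced `pelias compose up` (the health-probe URL and port are assumptions on my side, not part of the pelias CLI):

```sh
#!/bin/sh
# Sketch only: probe Elasticsearch; if it is not reachable yet,
# start it and block until it accepts traffic, then bring everything up.
if ! curl --silent --fail http://localhost:9200/_cluster/health >/dev/null; then
  pelias elastic start
  pelias elastic wait
fi
pelias compose up
```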
2) I was thinking about a setup using Docker "data-images". On a build machine, one runs the download/prepare/import steps and then creates "data-images" for each Pelias service, containing just the data required during normal operation. These can be uploaded to a container registry. On the production machine, only a docker-compose installation and a `docker-compose.yml` file are required. The compose file is similar to those of the examples but shorter, as only the services used during normal operation are included (not those used just for download/prepare/import/DB-setup), and the `image` tags are replaced with those of the data-images.
This way a simple `docker compose up --pull always` on the production server is enough to pull the latest data-images and restart the services.
... If there wasn't the race problem. A simple solution is to restart the API service after some seconds, so it's not that big a problem. But having the `healthcheck` for Elasticsearch and the `depends_on: elasticsearch` at the API service sounds cleaner to me.
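As an illustration of the data-image idea, a minimal sketch of a Dockerfile for an Elasticsearch data-image (the base image tag and the host-side data directory are assumptions; `/usr/share/elasticsearch/data` is the default data path of the official Elasticsearch images):

```dockerfile
# Hypothetical data-image: same Elasticsearch image as the compose file,
# with the already-imported index data baked in.
FROM pelias/elasticsearch:7.16.1
COPY --chown=elasticsearch:elasticsearch ./data/elasticsearch /usr/share/elasticsearch/data
```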
Do you think that an example that builds Docker "data-images" (after running the usual `pelias` CLI commands), with an additional compose file for deployment (e.g. on a different server), would be a good addition to this project? I will try to implement this for a small region and would be happy to share.
> I guess that question mainly asks whether we can stay on compose format v2 and whether `depends_on` and `healthcheck` are available for all versions greater than the min versions we publish on the readme.
I will have a look into this.
> Do you think that an example that builds Docker "data-images" (after running the usual `pelias` CLI commands) with an additional compose file for deployment (e.g. on a different server) would be a good addition to this project?
It sounds like a good idea, but I fear that this would encourage developers to think that running docker compose in a production environment is a good idea. This workflow is simply provided for development.
For a production deployment I would recommend kubernetes, we have the scripts available in another repo, and for elasticsearch snapshots people's preferences differ, but we use the s3 elasticsearch plugin to save/load to Amazon s3.
I'm also not sure how well suited a container registry is for storing very large data volumes, I guess people do this but I don't have any first-hand experience with it, so can't comment on how good/bad an idea that is in practice.
@langbein-daniel you might be interested in the `api.targets.auto_discover_required` config option added in https://github.com/pelias/api/pull/1624.
For backwards compatibility, it defaults to disabled. However, when turned on the API will quit if it can't connect to Elasticsearch and fetch the type mapping on startup. Because docker-compose generally restarts services that quit, eventually Elasticsearch will come up, and then the API will successfully load the type mapping, and all will be well.
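For reference, a sketch of how enabling this could look in `pelias.json` (the key path follows the option name above; the exact surrounding structure is an assumption):

```json
{
  "api": {
    "targets": {
      "auto_discover": true,
      "auto_discover_required": true
    }
  }
}
```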
Thank you @missinglink for the further feedback. I agree that Kubernetes is better for production than a docker-compose file. But by creating data containers one gets closer to that setup without the requirement of a Kubernetes cluster. So I'll still give that setup a try ;)
Since the examples explain a Pelias setup in a practical way, it might be better to restrict them to the essentials as much as possible. But the offer to share my setup as a separate example still stands.
And thanks @orangejulius for pointing me to that config option. After enabling it, a `pelias compose up` brought everything up and running. (The API service restarted about 4 times until Elasticsearch was up, but it works!)
I'll close this as completed as multiple ways to solve it have been discussed. Many thanks again for the help!
Does anything speak against setting `api.targets.auto_discover_required` to true in the examples that already use a recent version of the API service? The versions from e.g. the Germany example are already recent enough.
Or what do you think about adding some notes about the startup to the README (`pelias elastic start && pelias elastic wait && pelias compose up`)?
Describe the bug
I have created a project based on the Germany Docker Compose project.
The error occurs when starting the services, or is related to it.
If I start the project with `docker compose up`, all services start at the same time, and e.g. the API container might be running before the Elasticsearch container. When I then query the Pelias API, not all (previously imported) data is there: in my example, two layers are missing, no matter how long I keep the services running.
If I first start Elasticsearch and then run `docker compose up`, all data is there.
I do not know where the error comes from. It could be that the API container, when it starts before Elasticsearch, doesn't see all the layers and assumes that just some default layers exist.
Steps to Reproduce
This should give a more detailed explanation of the discovered bug.
I have created a git repository containing the modified `pelias.json`, a `docker-compose.yml` with fewer services (to narrow the problem down), and a README file. Please view the full README for the exact steps to reproduce the error: https://github.com/langbein-daniel/pelias-docker-bug.
The README contains a bash script that imports the data (a GTFS feed into two custom layers, `stop` and `station`) and shows that by first starting Elasticsearch and then running `docker compose up`, everything works, while by starting all services at once, the custom layers are reported missing.
Expected behavior
If the problem is that the API server has started too early, it should wait until Elasticsearch is up and then properly detect all layers.
Environment (please complete the following information):
Linux yodaTux 6.2.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 26 Feb 2023 03:39:23 +0000 x86_64 GNU/Linux
Docker version 23.0.1, build a5ee5b1dfc
Docker Compose version 2.16.0
Workaround
As a workaround, one can add this to the Elasticsearch service:
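The snippet itself is not included above; a minimal sketch of such a healthcheck, assuming Elasticsearch answers on port 9200 inside the container and that `curl` is available in the image, could look like:

```yaml
elasticsearch:
  healthcheck:
    test: ["CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 30
```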
and this to the API service:
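Correspondingly, a sketch of the long-form `depends_on` on the API service (this `condition` form requires a Compose version that supports the long syntax):

```yaml
api:
  depends_on:
    elasticsearch:
      condition: service_healthy
```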
With these two changes, one can start the project with just `docker compose up`, and all data/layers are detected.
If you like the two changes, I'm happy to create a pull request and add this to the sample projects.
But maybe it would be better to find out whether it really is the API service that causes the problem, and to fix its startup procedure to wait for Elasticsearch.