neo4j / docker-neo4j

Docker Images for the Neo4j Graph Database
Apache License 2.0

Neo4j 4.4.38-community docker image contains breaking changes #518

Closed. Duppils closed this issue 1 month ago

Duppils commented 1 month ago

I have confirmed that locking our version to 4.4.37 resolved the issue.

(written in pseudo code, you may need to make minor adjustments)

  1. Set up docker-compose:

docker-compose.yml:

dc-neo4j:
    image: neo4j:4.4.38-community  # 4.4.37 does not show the problem
    container_name: dc-neo4j
    build:
      context: .
    # environment, volumes and ports belong to the service, not to build
    environment:
      - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
      - NEO4JLABS_PLUGINS=["apoc"]
    volumes:
      - ./neo4j/data:/data
    ports:
      - "7474:7474"
      - "7687:7687"

The variables used for neo4j:

- NEO4J_PASSWORD=xyz
- NEO4J_NAME_DB=neo4j

  2. Try to build and connect to the container

WARNING: Service runner-abcd-concurrent-0--gitlab.company.com__neo4j-production probably didn't start properly.

Service container logs:
stat: unrecognized option '-----BEGIN CERTIFICATE----- [Secrets] -----END CERTIFICATE-----'
does not exist or is not readable. Make sure you have correctly configured docker secrets.

Let me know if there is any more information I can provide to help find the issue. The code I'm working with isn't open source, so I wanted to avoid sharing too much.

Extra info: The problem occurred when running in GitLab CI; I haven't tested other setups. We are running gitlab-runner 17.1.0.

I assume it's related to secrets since it was mentioned in the logs and is also a part of the latest changelogs:

Docker
* Add support for docker secrets

https://github.com/neo4j/neo4j/wiki/Neo4j-4.4-changelog

stefgia commented 1 month ago

From the error message it looks like it somehow thinks that the certificate string is a path to a secret. Looking at your docker-compose.yaml, it seems you're building a modified image that also includes python:3.9-slim-bullseye?

Since I don't have access to your Dockerfile or the full yml file, my guess would be that you have an environment variable that has a _FILE suffix, which actually has the contents of a certificate file. If you have environment variables with the _FILE suffix, the entrypoint will assume that the value is the path to a secret file and will try to read its contents. If the value is not a file, you get the error you are seeing and the container exits. This follows the docker convention; I am just not sure how strict we are meant to be on this.

As a workaround to unblock you, I would suggest you suffix your variable with _CONTENTS instead. In the meantime, I will look into whether we should be more relaxed about this and either ignore such variables or not exit.
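The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the actual entrypoint code: the helper name `resolve_file_secrets` and the variable names `EXAMPLE_PASSWORD_FILE` and `EXAMPLE_CA_FILE` are made up for the example, and the real image may differ in detail.

```python
import os
import tempfile


def resolve_file_secrets(env):
    """Expand each NAME_FILE entry into NAME by reading the file it points to.

    Hypothetical helper for illustration; not the actual entrypoint code.
    """
    resolved = {}
    for name, value in env.items():
        if not name.endswith("_FILE"):
            resolved[name] = value
            continue
        # The value is expected to be a path to a readable secret file.
        if not (os.path.isfile(value) and os.access(value, os.R_OK)):
            raise FileNotFoundError(f"{name}: value is not a readable file")
        with open(value, encoding="utf-8") as handle:
            resolved[name[: -len("_FILE")]] = handle.read().strip()
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        secret_path = os.path.join(tmp, "password")
        with open(secret_path, "w", encoding="utf-8") as handle:
            handle.write("xyz\n")
        # A path is resolved: EXAMPLE_PASSWORD receives the file's contents.
        print(resolve_file_secrets({"EXAMPLE_PASSWORD_FILE": secret_path}))
    try:
        # Certificate text instead of a path reproduces the failure mode.
        resolve_file_secrets({"EXAMPLE_CA_FILE": "-----BEGIN CERTIFICATE-----"})
    except FileNotFoundError as error:
        print("startup would fail:", error)
```

A variable renamed to end in _CONTENTS would skip the `_FILE` branch entirely, which is why the rename works as a workaround.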

stefgia commented 1 month ago

Docker secret variable naming rules too strict

Duppils commented 1 month ago

it seems you're building a modified image that also includes python:3.9-slim-bullseye

I actually made a mistake: it uses maven:3-jdk-11-slim as the base image. I also failed to mention that we install APOC Core, neo4j-graph-data-science-2.0.3.jar, and custom procedures/search algorithms. None of these things have caused issues before, so I didn't think to mention them.

my guess would be that you have an environment variable that has a _FILE suffix, which actually has the contents of a certificate file.

I have checked the pipeline environment variables configured in Gitlab and in our .env files, but couldn't find any _FILE suffix. I don't have access to the machine running the pipeline, so it's still possible an ENV variable ending with _FILE is coming from somewhere else. I think it's a great guess. Is there any other prefix/suffix that will be interpreted as a docker secret variable by default, such as _KEY? If you have a link to a list of prefixes/suffixes, I'll be able to cross-reference it for any accidental matches.

Thank you for the quick reply.

stefgia commented 1 month ago

None of these things have caused issues before, so I didn't think to mention them.

Yes, I don't think this would make a difference here.

I have checked the pipeline environment variables configured in Gitlab and in our .env files, but couldn't find any _FILE suffix. I don't have access to the machine running the pipeline, so it's still possible an ENV variable ending with _FILE is coming from somewhere else.

I don't have experience with GitLab CI, but if you have access to edit the configuration, I imagine you should be able to run printenv and read the output, which could give you a hint about where and what to look for.
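If dumping the whole environment with printenv is too noisy, or risks putting secrets into the job log, a narrower check is possible. The sketch below is only an illustration: the helper name `suspicious_file_vars` is made up, and it assumes Python is available in the job.

```python
import os


def suspicious_file_vars(env):
    """Return names ending in _FILE whose value is not a path to an existing file."""
    return sorted(
        name
        for name, value in env.items()
        if name.endswith("_FILE") and not os.path.isfile(value)
    )


if __name__ == "__main__":
    for name in suspicious_file_vars(os.environ):
        # Print only the name, so certificate or key contents stay out of the log.
        print(name)
```

Any name it prints is a candidate for the variable that trips the secret handling.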

If you have a link to a list of prefixes/suffixes, I'll be able to cross-reference it for any accidental matches.

In our docker entrypoint we are only looking for _FILE and I think that's the only convention used in other images as well.

There's detailed information on how the docker secrets are configured in the Neo4j documentation, and you can find more information in the Docker documentation.

Duppils commented 1 month ago

GitLab adds some predefined variables to CI jobs, which seems to be why we unintentionally end up with variables being interpreted as secrets. Still not sure why it shows the file content instead of the file path in the context of neo4j's docker secret implementation. We may be able to fix the issue on our end, but matching on all _FILE suffixes will probably lead to issues for more people in the future, but that's best for you to decide of course. I'll be sure to update the issue if I figure anything more out.

stefgia commented 1 month ago

matching on all _FILE suffixes will probably lead to issues for more people in the future, but that's best for you to decide of course

Yes, I think limiting the feature to environment variables that start with NEO4J_ would make more sense, and it should fix your issue.
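As a rough illustration of that narrower rule (a sketch of the idea only, not the fix that was eventually merged; the helper name `secret_file_vars` and the example variable names are made up):

```python
def secret_file_vars(env, prefix="NEO4J_", suffix="_FILE"):
    """Return the names that would be treated as secret-file references."""
    return sorted(
        name for name in env if name.startswith(prefix) and name.endswith(suffix)
    )


if __name__ == "__main__":
    env = {
        "NEO4J_EXAMPLE_FILE": "/run/secrets/example",
        "EXAMPLE_CA_FILE": "-----BEGIN CERTIFICATE-----",
        "NEO4J_OTHER": "value",
    }
    # Only the NEO4J_-prefixed name is selected; the CI-provided one is ignored.
    print(secret_file_vars(env))
```

With this rule, a `_FILE` variable injected by the CI system never reaches the secret-reading code path.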

Still not sure why it shows the file content instead of the file path in the context of neo4j's docker secret implementation.

I think that although the variable is named "..._FILE", it actually holds the file contents, whereas docker secrets expect a file path.

We may be able to fix the issue on our end

The other thing to keep in mind is that for this variable to reach the docker entrypoint, it must somehow be added as an environment variable in the docker-compose.yml file or in the docker run command. Maybe the compose file is modified by GitLab CI, or the variable is added when it executes the docker run command?

john-dodson-h3 commented 1 month ago

This is an issue if you are using the Neo4j container in Argo Workflows as well. The container will immediately fail to run because it tries to load Docker secrets from any env var that ends in _FILE, regardless of whether it has anything to do with Neo4j.

stefgia commented 1 month ago

I have merged a fix for this, it should be available in an upcoming release :)