stain / jena-docker

Docker image for Apache Jena riot
Apache License 2.0

File locking error on externally bound volume #41

Open daveatcit opened 4 years ago

daveatcit commented 4 years ago

I am using a bound volume on the Docker host to externalize the state of the TDB2 database, so I can destroy the jena-fuseki container and recreate it, but still use the original datasets/models. On successive restarts I intermittently hit a TDBException. It said that the current server PID was not the same as the PID that locked the database, and so the server stopped. I think this is a safety feature built into TDB2 to stop multi-process updates.

Just above the fatal exception there was a warning that it could not execute the "ps" command, and this appears to be associated with the lock-checking process. I tried issuing the ps command from inside the container's bash shell, but it was not recognized.

I added a fix to the Dockerfile so that the "procps" package was installed:

change: bash curl ca-certificates findutils coreutils pwgen \

to: bash curl ca-certificates findutils coreutils pwgen procps \

and this appeared to fix the problem, but I need to do more testing.
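To check whether a given image is affected, it may help to test for `ps` directly. The following is only a minimal sketch: `has_command` and `report_ps` are hypothetical helper names, not part of the jena-docker image, and the assumption is that a missing `ps` is what triggers the warning described above.

```shell
#!/bin/sh
# has_command NAME: succeed (exit 0) if NAME resolves to an executable,
# builtin, or function in the current shell; fail otherwise.
# Hypothetical helper, not part of the jena-docker image.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

# report_ps: print whether `ps` is available in the current environment.
# Assumption: the TDB lock check needs `ps`, as the warning suggested.
report_ps() {
    if has_command ps; then
        echo "ps found"
    else
        echo "ps missing: install procps"
    fi
}
```

The same check can be run without the helper from a shell inside the container with `command -v ps`; empty output means the package is not installed.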

kvjrhall commented 3 years ago

I can confirm that this continues to happen on stain/jena-fuseki:latest when deployed in kubernetes. Presently, the hash for said image is 4d84eb09dc69603cab990a25a5ad683cec648159f39ccf5dbffba312e7d7666a and it has not been updated in approximately a year. I have not tested to see if this event happens in stain/jena-fuseki:4.0.0, as I've already implemented a workaround.

The correct fix (installing procps in the Dockerfile) appears to have already been merged, which I imagine should also have resulted in this issue being closed.

This issue will continue to catch people, however, until stain/jena-fuseki:latest is brought up to date with a more recent tag.

For others caught by this, try stain/jena-fuseki:4.0.0 to see if you get better results. If that doesn't work, I've found that the following initContainer in kubernetes is sufficient to avoid the issue:

      initContainers:
        - name: cleanup-tdb-locks
          image: stain/jena
          command:
            - /bin/bash
            - -c
            - rm -rf /fuseki/system/tdb.lock
          volumeMounts:
          - name: fuseki-data  # match your persistent volume name
            mountPath: /fuseki
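For a plain Docker setup with a host directory bound at /fuseki, as in the original report, the same cleanup could be sketched as a shell function run on the host before the container starts. This is an untested-in-production sketch: `remove_stale_lock` is a hypothetical name, and the directory argument is assumed to be whatever host path you mount at /fuseki. It should only be run while no Fuseki process is using that directory, since the lock exists to stop concurrent access.

```shell
#!/bin/sh
# remove_stale_lock DIR: delete DIR/system/tdb.lock if it exists.
# DIR is assumed to be the host directory bind-mounted at /fuseki.
# Only call this while no Fuseki container is using DIR.
remove_stale_lock() {
    lock="$1/system/tdb.lock"
    if [ -e "$lock" ]; then
        rm -f -- "$lock"
        echo "removed $lock"
    else
        echo "no lock at $lock"
    fi
}
```

Calling `remove_stale_lock` with the host path of the data directory, and then starting the container as usual, mirrors what the initContainer does on kubernetes.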

rmeissn commented 2 years ago

I'd vote for updating the :latest image to include the mentioned fix.

kinow commented 2 years ago

I can confirm that I hit the same issue with :latest. 4.0.0 worked fine, so it'd be just a matter of updating latest.