> but the final directory is always `file-cache-9cf306ad-0c1b-4d8f-8468-f6d148da5859`
Looking at the code in vert.x [1], it always uses a call to `UUID.randomUUID()` for the directory name. Do you mean the same UUID is being generated across multiple runs of the build process? What Windows OS is this?
I was wondering the same. While the base dir is fixed, the actual name is generated at runtime.
I'm talking about a native image on Windows 10. In JVM mode a different directory gets used on every run; with the native image binary the directory is always the same (for that one binary).
> While the base dir is fixed, the actual name is generated at runtime.

True for JVM mode, not in native mode.
Native mode will make our heads hairless :)
Just to be sure, does it only happen on Windows?
Maybe the boot failure happens only on Windows, but I would bet the hardcoded temp directory is a global issue if the directory name is generated at static init.
The directory name is computed at runtime and is not stored in a static field (the FileResolver is created in the VertxImpl constructor). Also, the directory should be cleared on shutdown (at least that's what the code is trying to do...).
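To illustrate the point, a minimal sketch of the pattern described above (class and field names are mine, not the actual Vert.x source): the cache directory name is derived from `UUID.randomUUID()` inside the constructor, so it is recomputed every time a resolver instance is created at runtime.

```java
import java.io.File;
import java.util.UUID;

// Minimal sketch, not the real Vert.x FileResolver: the random directory
// name is computed in the constructor (i.e. at runtime, per instance),
// not stored in a static field that could be frozen at class init.
class FileResolverSketch {
    private final File cacheDir;

    FileResolverSketch(File cacheDirBase) {
        this.cacheDir = new File(cacheDirBase, "file-cache-" + UUID.randomUUID());
    }

    File cacheDir() {
        return cacheDir;
    }
}
```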
I've seen it on Windows only; maybe it's because Windows is more strict when it comes to the filesystem and locks on files.
> the directory should be cleared on shutdown

Sure, but that didn't happen with `mvn clean install -Dnative`: the vertx-cache directory was left there, and `target\getting-started-1.0-SNAPSHOT-runner.exe` (run after the Maven command) failed because the directory existed.
So I tried on macOS, and the tmp folder name is unique on every run of the native app. On macOS I'm using 19.3.1; on the Windows machine I used GraalVM 20, as suggested in the PR for Windows native support by Tomaz.

Three executions of the native binary with `Installed features: [cdi, resteasy]`:

```
/var/folders/nn/w1bsdgrx69n9gch4w4vkrkv40000gn/T/vertx-cache/file-cache-2554416b-2b60-4a9e-bf62-4de1b02029e4
/var/folders/nn/w1bsdgrx69n9gch4w4vkrkv40000gn/T/vertx-cache/file-cache-def0845c-f09f-4888-8869-b7555b488519
/var/folders/nn/w1bsdgrx69n9gch4w4vkrkv40000gn/T/vertx-cache/file-cache-8eae8c90-220d-417b-81a3-b82c8f0de9b4
```
I will have a look tomorrow. I need to set up a Windows 10 VM. We need to test it with GraalVM 19.3 and 20.x.
Unfortunately, I'm not able to build a native image on Windows...

```
Error C2059: syntax error: '~'
```
@cescoffier do you have the right tools installed on your Windows box? See https://github.com/quarkusio/quarkus/pull/7663#issue-385067018
@rsvoboda I forgot to ask, does this only happen with 20.x of GraalVM, or with 19.3.x as well? Either way, I still can't see how this exception can happen unless either:

1. `UUID.randomUUID().toString()` in that code somehow gets inlined to a precomputed value, which seems impossible (even for native image semantics), or
2. `UUID.randomUUID()` on that setup is generating duplicates (which apparently is possible).

But for it to generate the duplicates this consistently seems really odd and points to a potentially bigger problem.
Is this some Windows VM, by the way, or is Windows really the host OS?

It's my home laptop; Windows is the host OS, no VM. Using GraalVM 20. I didn't try 19.3, as Windows support for native-image improved with GraalVM 20.
I tried to use 19.3.1 on the Windows machine, but it can't be used with Quarkus; I'm getting this error:

```
[INFO] [io.quarkus.deployment.pkg.steps.NativeImageBuildStep] C:\Program Files\GraalVM\graalvm-ce-java11-19.3.1\bin\native-image.cmd -J-Dsun.nio.ch.maxUpdateArraySize=100 -J-Djava.util.logging.manager=org.jboss.logmanager.LogManager -J-Dvertx.logger-delegate-factory-class-name=io.quarkus.vertx.core.runtime.VertxLogDelegateFactory -J-Dvertx.disableDnsResolver=true -J-Dio.netty.leakDetection.level=DISABLED -J-Dio.netty.allocator.maxOrder=1 -J-Duser.language=cs -J-Dfile.encoding=Cp1250 --initialize-at-build-time= -H:InitialCollectionPolicy=com.oracle.svm.core.genscavenge.CollectionPolicy$BySpaceAndTime -H:+JNI -jar getting-started-1.0-SNAPSHOT-runner.jar -H:FallbackThreshold=0 -H:+ReportExceptionStackTraces -H:-AddAllCharsets -H:-IncludeAllTimeZones -H:EnableURLProtocols=http -H:-UseServiceLoaderFeature -H:+StackTrace getting-started-1.0-SNAPSHOT-runner
Warning: the '=' character in program arguments is not fully supported.
Make sure that command line arguments using it are wrapped in double quotes.
Example:
    "--vm.Dfoo=bar"
Error: Unknown arguments: org.jboss.logmanager.LogManager, io.quarkus.vertx.core.runtime.VertxLogDelegateFactory, true, DISABLED, 1, cs, Cp1250, com.oracle.svm.core.genscavenge.CollectionPolicy$BySpaceAndTime, 0, http, getting-started-1.0-SNAPSHOT-runner
```
The Windows native-image-based binary always generates the same UUID. I filed an issue in the GraalVM tracker: https://github.com/oracle/graal/issues/2265
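For context, a minimal sketch (hypothetical class, not the actual Vert.x code) of one mechanism that can produce a fixed "random" value in a native image: if the class holding the value is initialized at image build time (the Quarkus build command shown earlier does pass `--initialize-at-build-time=`), the UUID is computed during the build and baked into the image heap. Whether that, or deterministic randomness at runtime, is the actual root cause here is what the GraalVM issue should clarify.

```java
import java.util.UUID;

// Hypothetical example, not Vert.x code: if this class is initialized at
// image build time, CACHE_DIR_NAME is evaluated once during the build and
// stored in the image heap, so every run of the binary prints the same UUID.
// In JVM mode (or with run-time initialization) each run prints a new one.
public class CacheDirHolder {
    static final String CACHE_DIR_NAME = "file-cache-" + UUID.randomUUID();

    public static void main(String[] args) {
        System.out.println(CACHE_DIR_NAME);
    }
}
```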
@rsvoboda @cescoffier I'm experiencing this issue in both JVM and native mode on OpenShift; not sure what I'm doing wrong. Not sure if this is exactly the same issue, but I get this:

```
2021-02-16 22:27:32,212 ERROR [io.qua.run.Application] (main) Failed to start application (with profile prod): java.lang.IllegalStateException: Failed to create cache dir: /tmp/vertx-cache/file-cache-840facac-d89f-4b7b-9980-0f48350ea364
```
@karesti in your case I believe it's because you don't have permission to write to `/tmp`; can you check that you have the permissions? Any way to get a reproducer if that's not the case?
@cescoffier oh OK, I will check this with the team. How can I grant those permissions in OpenShift?
@cescoffier I will check with the team so I can get it solved, if it's a matter of permissions.
Normally on OpenShift `/tmp` is writable. Did it change?
So, the problem is that the Dockerfile only does `RUN chown 1001 /work`, and in OpenShift the user changes, so that user has no rights on `/tmp`. I fixed the issue by using this Dockerfile:

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi-minimal:8.3
WORKDIR /work/
RUN chown 1001 /work \
    && chmod "g+rwX" /work \
    && mkdir /tmp/vertx-cache \
    && chown 1001:root /work \
    && chmod 775 /tmp/vertx-cache
COPY --chown=1001:root target/*-runner /work/application
EXPOSE 8080
USER 1001
CMD ["./application", "-Dquarkus.http.host=0.0.0.0"]
```
@cescoffier I forgot to ping you in my comment above
The default `Dockerfile.native` is:

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi-minimal:8.3
WORKDIR /work/
RUN chown 1001 /work \
    && chmod "g+rwX" /work \
    && chown 1001:root /work
COPY --chown=1001:root target/*-runner /work/application
EXPOSE 8080
USER 1001
CMD ["./application", "-Dquarkus.http.host=0.0.0.0"]
```
So it needs to be adjusted? In other words: is `/tmp/vertx-cache` something specific to your use-case, or is it common for users to define it on their own?
@rsvoboda I did not use anything specific or custom; the Quarkus app simply did not start. I used 4 extensions: websockets, infinispan, rest, and scheduler.
I just tried (with a different set of extensions, but it would still use vert.x) on an OpenShift on Azure, and no problem.

D'oh, I didn't see the ubi-minimal version; I may have an older version.
Which version of OpenShift was used? I'm especially interested in the version used by @karesti.

Just checked with the very latest ubi:8.3 - no problem. So it might be an OpenShift permission issue.
@rsvoboda I'm using OpenShift Dedicated, version 4.6.8
@rsvoboda is that a setup you can reproduce? I don't have "dedicated", I only have cloud instances.
We were not able to reproduce it. It's as if some specific permissions were set in the OpenShift cluster used. @pjgg was trying to reproduce this problem.
@karesti I am trying to reproduce your issue without success. Could you try to deploy the following image on your cluster: `quay.io/pjgg/webclient:0.0.1`? For example, a valid YAML could be something like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-demo
spec:
  volumes:
    - name: demo
      emptyDir: {}
  containers:
    - name: demo
      image: quay.io/pjgg/webclient:0.0.1
      livenessProbe:
        httpGet:
          path: /q/health/live
          port: 8080
        initialDelaySeconds: 59
        timeoutSeconds: 5
      readinessProbe:
        httpGet:
          path: /q/health/ready
          port: 8080
        initialDelaySeconds: 30
        timeoutSeconds: 5
```
@pjgg this pod deployment worked! I used the scheduler and websockets extensions (in case that might be an issue). I will test the deployment in another way.
@pjgg So I tested using a deployment config and 2 pods, and it works as well. Could the scheduler or websockets extensions be related?
I am not sure yet. Could you show me your dependencies (Maven/Gradle), just so I can set up a scenario with the same versions as you?
When you said:

> I tested using a deployment config and 2 pods, and it works as well.

do you mean that you used your application with the default Dockerfile, and it works?

I have tested with a scheduler (Quartz), and it looks OK! Could you test in your environment with this Docker image: `quay.io/pjgg/webclient:0.0.1-a`?
Another aspect that might be relevant: I'm running two instances of the same native runner on one machine under separate user accounts. The first one can be launched normally; the second one throws the same exception as above. The cause is that the first user has created, and thus owns, `/tmp/vertx-cache`. This makes the second process fail to write into the same path.

Both users have their temp dirs set to separate paths in the environment, e.g. for one of them:

```
# env | grep -i tmp
TEMPDIR=/tmp/user/107329
TMPDIR=/tmp/user/107329
TEMP=/tmp/user/107329
TMP=/tmp/user/107329
```
I guess that the `/tmp` base path is either hard-coded or taken from my development machine (or even from within the GraalVM container), so reading this value at runtime would be a benefit.

For me, the workaround of using `-Djava.io.tmpdir=$TMPDIR` has helped.
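As a sketch of why that workaround helps (assumed behavior, not the actual Vert.x code): if the cache base is derived from the `java.io.tmpdir` system property at runtime instead of a literal `/tmp`, each user's `-Djava.io.tmpdir=$TMPDIR` setting yields a separate, writable base directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: resolve the cache base from java.io.tmpdir at
// runtime, so two users with different TMPDIR values no longer collide
// on a shared /tmp/vertx-cache owned by whoever started first.
public class CacheDirResolver {
    static Path resolveCacheBase() throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "vertx-cache");
        return Files.createDirectories(base); // no-op if it already exists and is writable
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Cache base: " + resolveCacheBase());
    }
}
```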
The `vertx-cache` directory should be world-writable, and that might not be the case. I need to investigate this.
So, the `vertx-cache` location is configured by Quarkus but created by Vert.x. A workaround could be to generate a unique cache directory at each startup. @rsvoboda, any concern with such an approach?
I spoke with @pjgg about this. In general we are OK with such a change. The only concern we see is around @karesti's use-case, where Katia worked with the known path `/tmp/vertx-cache`. On the other hand, we were not able to reproduce her problem. If the unique cache is based on a timestamp, we could create a temporary folder `vertx-cache/4163465465465` instead of `vertx-cache-4163465465465`, so Katia's workaround could still work.

About the unique cache generated at each startup - does that also apply in native mode?
Yes, it's going to be generated on every application start, so even in native mode, multiple runs will generate multiple directories.
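A minimal sketch of that approach (names are hypothetical, not the actual Quarkus/Vert.x change): keep the fixed `vertx-cache` base, so permission fixes targeting it keep working, but create a unique subdirectory on every start.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: a fresh, uniquely named cache dir per application
// start, under the well-known vertx-cache base directory.
public class UniqueCacheDir {
    static Path createPerStartCacheDir() throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "vertx-cache");
        Files.createDirectories(base);
        // createTempDirectory generates the unique suffix at runtime, so
        // multiple runs (JVM or native) never collide on the same path.
        return Files.createTempDirectory(base, "file-cache-");
    }
}
```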
FYI @rsvoboda, I am able to reproduce this error on Quarkus 2.7.x in JVM mode (didn't try native); it's not reproducible in Quarkus 2.6.3.Final.

Oh, BTW, I'm not testing on Windows. It happens in a Kubernetes cluster, using a new PVC mounted on the /tmp directory. None of this fails in Quarkus 2.6.3.Final.
Please file a new issue with steps to reproduce; this sounds like a new variant of the problem.
OK, so it seems I had to create an init-container in order to set the permissions properly. Thanks very much!
@juangon Can you provide a snippet of the changes you made so that other people can learn from your fix? I assume you have a custom YAML descriptor and are not relying on the Kubernetes YAML generated by the Quarkus extension.
Yes, sure. I don't have a custom YAML descriptor, but I'll write it here anyway.

Here is the PVC for a microservice:

```properties
quarkus.kubernetes.pvc-volumes.backend-persistent-storage.claim-name=pv-claim
quarkus.kubernetes.mounts.backend-persistent-storage.path=/tmp
```

And here is the init-container:

```properties
quarkus.kubernetes.init-containers.set-tmp-permissions.image=registry.access.redhat.com/ubi8/ubi:latest
quarkus.kubernetes.init-containers.set-tmp-permissions.mounts.backend-persistent-storage.path=/tmp
quarkus.kubernetes.init-containers.set-tmp-permissions.command=chown,-R,185,/tmp
```

The user is 185, as used in the latest Dockerfile.
Thanks!
Quarkus boot fails when the vertx-cache temporary directory exists. Running on Windows 10, using a GraalVM 20 JDK 11-based native image.
First I thought the problem was in the file path, which mixes Windows and Linux path separators, but that is not the problem. The problem is that the `file-cache-9cf306ad-0c1b-4d8f-8468-f6d148da5859` directory exists and the Quarkus native binary boot fails due to that fact. This directory was generated during `mvn clean install -Dnative` execution but wasn't cleaned at the end.

The workaround is to remove that temporary directory; after this step the Quarkus .exe app boots without problem, and the temporary directory gets cleaned during shutdown, so subsequent executions run without problem. The same file-cache directory gets generated on each run.
A few points:

- the temporary directory is generated during `mvn clean install -Dnative` execution but is not cleaned up at the end
- the same file-cache directory is used on every run of the native binary
- the temp path is baked into the binary

The third point is the biggest issue from my perspective, as it blocks portability (would `C:\Users\ROSTIS~1\AppData\Local\Temp\` be expected on every system the app gets started on?). I tried on another Windows machine and the path is not fully hard-coded - the user's tmp directory is reflected - but the final directory is always `file-cache-9cf306ad-0c1b-4d8f-8468-f6d148da5859`. Always using the same path can also be risky from a security perspective: knowledgeable users know where to look beforehand.

Steps to reproduce on a sample bootstrap app: