rufreakde opened this issue 2 years ago
Same here.
It was not an issue in node v16.15.0 with npm 8.5.5. With v16.15.1 and npm 8.11.0, the deployment of Pods crashes with a CrashLoopBackOff error without any logs, only exit code 254.
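For anyone trying to confirm the same symptom, the exit code is visible in the pod status (a small sketch, assuming kubectl access to the cluster; <pod-name> is a placeholder):

```sh
# STATUS column shows CrashLoopBackOff for the failing pod
kubectl get pods

# The "Last State: Terminated" block shows the container's exit code (here 254)
kubectl describe pod <pod-name>
```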
It seems like an issue with running npm. Are you sure the command is running from the dir you expect it to? If you run npm start but there is no package lock in the folder, it could create such an error.
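A quick way to test that hypothesis (a hedged sketch; the directory path is arbitrary):

```sh
# Run npm start from an empty directory and inspect the exit code
mkdir -p /tmp/empty && cd /tmp/empty
npm start            # fails here, since no package.json is present
echo "exit code: $?"
```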
Yes, we are 100% sure the problem arose only because of the version change. Other than that, nothing changed.
We found a post on the AWS forums where it was suggested to use ENTRYPOINT instead of CMD in the Dockerfile. But neither option worked. It seems this issue has appeared several times in the past, and the solution by the AWS issue creator was to roll back image versions…
One example: https://repost.aws/questions/QUtlb2BYIEQjyirCUWspC-CQ/exit-254-lambda-error-with-no-extra-explanations
If you can play a bit with your k8s deployment, I would try overriding the container definition to use a custom command for the container and do a pwd or a few different commands to try and debug.
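For example, something like this (a sketch, not tested against this exact setup; the pod name and image tag are placeholders):

```sh
# Throwaway pod with the same image, entrypoint overridden to a debug command
kubectl run npm-debug --rm -it --restart=Never \
  --image=node:16.15.1-alpine --command -- pwd

# Or poke around inside a running pod, if it stays up long enough
kubectl exec -it <pod-name> -- sh -c 'pwd && ls -la && npm -v; echo $?'
```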
As mentioned before, we played around with a lot of things: CMD vs. ENTRYPOINT, really minimalistic hello-world scenarios, and so on. But it came down to this: the previous version worked, the one after did not. (And it seems we are not the only ones with this issue.) @jasonleakey seems to have the same issue.
@LaurentGoderre I have the same issue. npm is not functional at all in the latest version, and it is not related to the working directory, since even npm -v returns a newline and a non-zero return code (I believe it was 248 for me, though). I found this issue while searching for a solution, and downgrading helped. It is not related to the command or entrypoint either, since it is reproducible from the command line when you 'terminal into' the pod. Node works fine; everything else seems to be working fine. It is also not a permission issue, it seems, since I made sure the project directory and /tmp have the same owner as the user I was running commands from. The same exact image works just fine under the same non-root user on my local Docker (node with uid and gid changed to 999, instead of the default 1000 uid/gid).
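That observation should be easy to replay outside the cluster; a hedged sketch reusing the uid/gid from the comment above (assuming a local Docker daemon):

```sh
# npm -v under a non-default uid/gid; prints the version (or a blank line) plus the exit code
docker run --rm --user 999:999 node:16.15.1-alpine \
  sh -c 'npm -v; echo "exit code: $?"'
```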
I can confirm it's not a CMD issue either. I downloaded the image locally, and npm start runs normally.
I suspect npm v8.11.0 causes the issue. We noticed similar ERESOLVE issues as in the thread "Node v16.15.1 (npm v8.11.0) breaks some builds" and in npm/cli#4998. Although we solved the peerDependencies issues and compiled the image, the image still exits with the empty 254 error.
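(Aside: the ERESOLVE peer-dependency conflicts mentioned above can usually be worked around with npm's legacy resolver; this is a sketch of that step only and does not address the 254 exit code itself.)

```sh
# Fall back to npm 6-style peer dependency resolution for this install
npm install --legacy-peer-deps
```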
Just ran into this issue...
- failed to run 🐳 locally with error 243 with limited permissions
- then, with elevated permissions, failed on ☸️ AWS due to strict securityContext settings

Ended up implementing a work-around where I leveraged a multi-stage 🐳 build and ENTRYPOINT:
```dockerfile
################
# Build Stage
################
FROM node:16.15.1 AS build
WORKDIR /app
COPY . .
RUN npm install --production

################
# Final Stage
################
FROM node:16.15.1-alpine3.16 AS final
WORKDIR /app
COPY --chown=nobody --from=build /app /app

# 🍒 FIX CVE-2022-29244
RUN rm -rf /usr/local/bin/npm \
    && rm -rf /root/.npm

USER nobody:nobody
EXPOSE 8080
ENTRYPOINT ["./bin/start.js"]
```
Obviously this isn't one-size-fits-all due to different project structures and requirements, but hopefully it helps someone 🤞🏽
We already used a multi-stage build, and ENTRYPOINT did not help our situation :( But thanks for sharing!
Looks like this problem is still occurring on the latest 16.16.0-alpine tag.
Is there any update?
Any update, guys?
Please, are there any updates? It seems to be clearly related to an image change.
@PeterDaveHello @nschonni @chorrell @LaurentGoderre @SimenB
We have seen a similar issue as well, and ultimately tracked it down to a native dependency triggering the crash when we updated the build/runtime environment versions.
```dockerfile
FROM node:16.15.1 AS build
...
FROM node:16.15.1-alpine3.16 AS final
```
The underlying cause, however, was that we used to build on node (non-Alpine) and run on node:alpine, just like in this snippet from @derekahn. Alpine uses a different C library (musl instead of glibc) than the non-Alpine variant, and you cannot simply "switch them out" -- so if you do actually build a native dependency, you need to make sure to build it in the same environment you're ultimately running in.
This problem might stay hidden for a long time, as not all native dependencies get used all the time: in our case, for instance, it was a crypto library that was only exercised by a small part of the application's functionality.
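A minimal sketch of the pattern @ankon describes (tags and paths are illustrative; the key point is that both stages use the same Alpine/musl base, and that Alpine gets a toolchain for node-gyp):

```dockerfile
# Build native addons on the SAME libc (musl) as the runtime stage
FROM node:16.16.0-alpine3.16 AS build
WORKDIR /app
COPY . .
# node-gyp needs python3, make and a C/C++ compiler on Alpine
RUN apk add --no-cache python3 make g++ \
    && npm ci --omit=dev

# Runtime stage: identical base image, so compiled addons keep working
FROM node:16.16.0-alpine3.16 AS final
WORKDIR /app
COPY --from=build /app /app
USER node
CMD ["node", "src/start.js"]
```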
@ankon this seems not to be the case for us. We use the same base Alpine image for all of our multi-stage docker build steps.
```dockerfile
# ---- Base Image ----
FROM node:lts-alpine3.15 AS base
ENV DEBIAN_FRONTEND=noninteractive
ENV IMAGE_USER=defaultUser
ENV IMAGE_USER_GROUP=defaultGroup
ENV APP_DIR_IN_USER_DIR=App
RUN \
    set -eux \
    \
    ## Update Alpine base \
    && apk update \
    && apk upgrade \
        --no-cache \
        --progress \
        --force-refresh
# ... base preparation

# ---- NPM Dependencies multistage tests ----
FROM base AS build
LABEL env=build
COPY . .
# user
USER root
RUN apk add sqlite
RUN chown -R $IMAGE_USER:$IMAGE_USER_GROUP .
USER $IMAGE_USER
# create and copy production node_modules aside for last layer
RUN set -euxo pipefail \
    && npm audit fix --only=production || true \
    && cp -R node_modules prod_node_modules \
    && rm -rf node_modules/

# ---- Test ----
# no need for audit on dev dependencies since we remove them from final image
# ... run tests

# ---- Release ----
FROM base AS run
LABEL env=run
COPY . .
# copy production node_modules
COPY --from=build /home/$IMAGE_USER/$APP_DIR_IN_USER_DIR/prod_node_modules ./node_modules
# this will not work with headless images we plan to use in the future.
USER root
RUN chown -R $IMAGE_USER:$IMAGE_USER_GROUP .
USER $IMAGE_USER
EXPOSE 4004
CMD ["npm", "run", "start"]
```
Since it is the same base image, it should not have this issue, right?
At least not in the "trivial" way we could see it in hindsight; the setup in that regard looks sane to me.
Still, it might be good to check what exactly is crashing, and unfortunately it is quite likely that there are different underlying causes that manifest in a similar crash.
We seem to have run into a similar issue with the pod crashing: some time ago we tried to go to 16.15.1 and failed, and now the same thing happens with 16.16.0. We are stuck at FROM node:16.15.0-bullseye-slim.
Due to security vulnerabilities, we have to update our base image as well from 16.14-alpine3.15. Unfortunately, any version above 16.15.1 has this issue.
The Docker image works fine locally when we run it, and in any other state. But when rolled out using Kubernetes, the application state is CrashLoopBackOff.
Can this issue be prioritised, please?
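For what it's worth, even when the log looks empty, the previous container instance sometimes leaves something behind; a small sketch, assuming kubectl access (<pod-name> is a placeholder):

```sh
# Output of the last crashed container instance, if any was written
kubectl logs <pod-name> --previous

# Cluster events around the crash (pulls, probe failures, OOM kills)
kubectl get events --field-selector involvedObject.name=<pod-name>
```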
In our case, we managed to fix this issue. We were using a multi-stage docker build (install, builder, distribution) with node:16.14-alpine3.15. In order to address security vulnerabilities (CVE-2022-2097, CVE-2022-29458), we had to update to node:16.16-alpine3.15.
In our case the fix was to explicitly install and downgrade npm to 8.5.0 in the distribution image in our Dockerfile. We tried several versions of npm above 8.5.0, and it didn't work: either the issue was reproduced again or other issues surfaced. Therefore we had to install and pin npm to exactly version 8.5.0:

```dockerfile
RUN npm install -g npm@8.5.0 --save-exact
```
Our Dockerfile previously looked like:

```dockerfile
# INSTALL CONTAINER
FROM node:16.16-alpine3.15 as install
...

# BUILDER CONTAINER
FROM node:16.16-alpine3.15 as builder
....

# RUNTIME CONTAINER
FROM node:16.16-alpine3.15 AS distribution
....
```
We then changed it to explicitly set the npm version to 8.5.0 in each container. Our Dockerfile now looks like this, and the issue is fixed:
```dockerfile
# INSTALL CONTAINER
FROM node:16.16-alpine3.15 as install
RUN npm install -g npm@8.5.0 --save-exact
...

# BUILDER CONTAINER
FROM node:16.16-alpine3.15 as builder
RUN npm install -g npm@8.5.0 --save-exact
....

# RUNTIME CONTAINER
FROM node:16.16-alpine3.15 AS distribution
RUN npm install -g npm@8.5.0 --save-exact
....
```
For now, this approach resolved the problem. We hope it helps others fix theirs, but we understand that even our approach is a workaround, and we hope the node image ships with a properly working version of npm.
RUN npm install -g npm@8.5.0 --save-exact
This does the trick!
@propattern It seems the issue is related to the npm version that ships with the newer image versions. Thanks for sharing the workaround! 👍
Thanks to @propattern for the workaround; this has also worked for our environments.
However, upon further investigation we have managed to fix the problem without downgrading npm, by changing our Dockerfile to run as the node user provided by the Docker image. You may find more documentation around this for other use cases here: https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#non-root-user
TL;DR: use the provided node user:
```dockerfile
FROM node:16-alpine
# ...
# ...
# At the end, set the user to use when running this image
USER node
CMD ["node", "src/start.js"]
```
Environment
Expected Behavior
Successful startup of the pod using this image as the base image. (16.15.0 cached worked without problems.)
Current Behavior
Pod crashes with a CrashLoopBackOff and no messages! We only have the following:

Possible Solution

Rollback that change?
Steps to Reproduce
Additional Information
We build the Docker image on Mac, Linux, and Windows. Same result: the old version runs, the new version fails.
EDIT: Where it all began: https://github.com/nodejs/docker-node/commit/194a775693fd40598a1bafd4858e063c24efeb42