google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Catch log: "Container Sandbox: Unsupported syscall setsockopt" from Google Cloud Run #1739

Closed attakei closed 2 years ago

attakei commented 4 years ago

I don't know if this is the right place to report this kind of issue, since it concerns gVisor running on top of GKE.

Description

I am trying to use the nginx-unit image ( https://hub.docker.com/r/nginx/unit ) on Google Cloud Run. However, when the container runs, the kill command fails.

In container process

This image runs entrypoint.sh, which performs four steps in the shell:

  1. Run a background process.
  2. Inject the configuration into the process.
  3. Stop the background process with the kill command.
  4. Run the foreground process.

Currently, when running an application container based on the vendor's official image, the kill command is not accepted and the service is not available.

Cloud Run outputs this log while the container is running:

Container Sandbox: Unsupported syscall setsockopt(0xb,0x6,0x9,0x3ee1608589cc,0x4,0x29910fc86500). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
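The six hex values in messages like this one are raw syscall arguments. setsockopt(2) takes five (fd, level, optname, optval pointer, optlen), so the sixth value is presumably an unused register. A minimal Python sketch for decoding them, assuming Linux amd64 constant values and covering only the level/optname pairs that appear in this thread; the `decode_setsockopt` helper and its tables are hypothetical, not part of gVisor:

```python
import re

# Linux amd64 socket levels seen in this thread (illustrative, not exhaustive).
# Assumption: SOL_SOCKET differs on some other architectures.
LEVELS = {
    0x0: "SOL_IP",
    0x1: "SOL_SOCKET",
    0x6: "SOL_TCP",
    0x29: "SOL_IPV6",
}

# Option names are small integers that only have meaning within a level,
# so the lookup key is the (level, optname) pair.
OPTNAMES = {
    (0x0, 0xA): "IP_MTU_DISCOVER",
    (0x0, 0xB): "IP_RECVERR",
    (0x1, 0xD): "SO_LINGER",
    (0x6, 0x6): "TCP_KEEPCNT",
    (0x6, 0x9): "TCP_DEFER_ACCEPT",
    (0x29, 0x12): "IPV6_MULTICAST_HOPS",
    (0x29, 0x31): "IPV6_RECVPKTINFO",
}

LOG_RE = re.compile(r"Unsupported syscall setsockopt\(([^)]*)\)")


def decode_setsockopt(line):
    """Return (fd, level_name, opt_name, optlen) for a setsockopt message, else None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    args = [int(a, 16) for a in m.group(1).split(",")]
    if len(args) < 5:
        return None
    # The log shows six values; setsockopt(2) only uses the first five.
    fd, level, optname, _optval_ptr, optlen = args[:5]
    level_name = LEVELS.get(level, hex(level))
    # Unknown pairs are returned as hex instead of being guessed.
    opt_name = OPTNAMES.get((level, optname), hex(optname))
    return fd, level_name, opt_name, optlen
```

Under these tables, the message above decodes to fd 11 with SOL_TCP / TCP_DEFER_ACCEPT and a 4-byte value.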

Steps to reproduce

Build the image from the repository and run a service from it: https://gitlab.com/attakei-sandbox/gvisor-issue-setsockopt

I saw these logs from a service in the Iowa region (GCP). Please see the CSV log exported from GCP.

Information from other environments

Local docker engine

Runs normally.

$ docker version
Client:
 Version:           19.03.5-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        633a0ea838
 Built:             Fri Nov 15 03:19:09 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.5-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       633a0ea838
  Built:            Fri Nov 15 03:17:51 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.3.2.m
  GitCommit:        d50db0a42053864a270f648048f9a8b4f24eced3.m
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Local docker engine with runsc

Runs normally.

$ runsc --version
runsc version release-20200127.0-51-g02997af5abd6
spec: 1.0.1-dev

coronaction commented 4 years ago

I'm also getting similar messages on Google Cloud Run for a container running a Java program built with the Quarkus framework. I'm happy to provide additional info if you tell me what is of interest for this case; just let me know.

Some messages:

  1. Container Sandbox: Unsupported syscall setsockopt(0xae,0x0,0xb,0x3e6ff77fc1d4,0x4,0x0)
  2. Container Sandbox: Unsupported syscall setsockopt(0xae,0x29,0x31,0x3e6ff77fd7b4,0x4,0x4)
  3. Container Sandbox: Unsupported syscall setsockopt(0xae,0x29,0x12,0x3e6ff77fd7bc,0x4,0x4)

pebo commented 4 years ago

The latest official Node version, v12.17.0, triggers setsockopt warnings on Cloud Run, e.g.:

Container Sandbox: Unsupported syscall setsockopt(0x13,0x6,0x6,0x3ea340cbc70c,0x4,0x1cc3929404b1).
Container Sandbox: Unsupported syscall setsockopt(0x1b,0x6,0x6,0x3ea340cbc70c,0x4,0x1cc3929404b1)

Would it be possible to suppress these warnings? Cloud Logging gets spammed.

didier-durand commented 4 years ago

To whom are you asking this question: to gVisor team or to me?

pebo commented 4 years ago

@didier-durand You probably got notified because you've subscribed to this issue.

I guess it's a feature request to the gVisor team: it would be nice to be able to suppress warnings (e.g. for socket options that are not or cannot be implemented in gVisor).

didier-durand commented 4 years ago

I agree with you, but then we would need to be able to select this option from the Google Cloud UI to remove this logging.

fevernova90 commented 4 years ago

Same for me: this log is spamming my whole logging stack. I'm running on Cloud Run.

Sytten commented 4 years ago

I'm also having this issue, but with a different code:

Container Sandbox: Unsupported syscall setsockopt(0x13,0x6,0x6,0x3ef9e6878e2c,0x4,0xab9dd404b1)

I am using Node.js 12.18.

shvgn commented 4 years ago

I use k6.io to run load tests against my service on Cloud Run. Roughly 2% of TCP connections fail. In the Cloud Run logs I see lots of these (along with membarrier):

Container Sandbox: Unsupported syscall setsockopt(0x17,0x6,0x6,0x3e6f301f9734,0x4,0x0). It is very likely that you can safely ignore this ...

k6 warnings during the test run:

...
WARN[0629] Request Failed      error="Get \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0629] Request Failed      error="Get \"https://<...>.run.app/<...>\": unexpected EOF"
WARN[0629] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0630] Request Failed      error="Get \"https://<...>.run.app/<...>\": unexpected EOF"
WARN[0630] Request Failed      error="Post \"https://<...>.run.app/<...>\": write tcp <...>:62719-><...>:443: write: broken pipe"
WARN[0631] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0631] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0632] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0632] Request Failed      error="Post \"https://<...>.run.app/<...>\": unexpected EOF"
WARN[0632] Request Failed      error="Post \"https://<...>.run.app/<...>\": unexpected EOF"
WARN[0633] Request Failed      error="Post \"https://<...>.run.app/<...>\": unexpected EOF"
WARN[0633] Request Failed      error="Get \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0633] Request Failed      error="Get \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0633] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0634] Request Failed      error="Post \"https://<...>.run.app/<...>\": dial tcp <...>:443: i/o timeout"
WARN[0634] Request Failed      error="Post \"https://<...>.run.app/<...>\": EOF"

The service base image is node:14-alpine.

iangudger commented 4 years ago

This seems to be SOL_TCP, TCP_KEEPCNT, which was fixed in 4b9652d.
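To check whether a particular option is accepted by a given runtime, one approach is to set it on a fresh socket and read it back. A minimal Python sketch using SOL_TCP / TCP_KEEPCNT, the pair behind the Node.js messages (level 0x6, optname 0x6); `check_sockopt` is a hypothetical helper, not a gVisor tool:

```python
import errno
import socket


def check_sockopt(level, optname, value):
    """Set an integer socket option on a fresh TCP socket and read it back.

    Returns (True, value_read_back) on success, or (False, errno_name) if the
    kernel or sandbox rejects the option.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.setsockopt(level, optname, value)
            return True, s.getsockopt(level, optname)
        except OSError as e:
            # e.g. ENOPROTOOPT when the option is not implemented
            return False, errno.errorcode.get(e.errno, str(e.errno))


if __name__ == "__main__":
    # Level 0x6 (SOL_TCP / IPPROTO_TCP), optname 0x6 (TCP_KEEPCNT on Linux).
    print(check_sockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5))
```

Running the script once with `docker run --runtime=runsc` and once with the default runc runtime is one way to compare the two, assuming runsc is registered with Docker as described in the gVisor installation guide. A sandbox that lacks the option would be expected to return an error name rather than the value.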

RtypeStudios commented 4 years ago

I have thousands of these appearing. I'm using .NET Core on this image:

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1.2-alpine3.11

It would be great to filter these out, as they make reading the logs difficult.


AndreiIgna commented 4 years ago

@iangudger is there something we can do after that fix?

I'm having the same problem; it's quite hard to see anything useful in the logs when this line is duplicated so many times.

[Screenshot 2020-08-11 at 20:13:59]

The log references gVisor: https://gvisor.dev/c/linux/amd64/setsockopt

ytnobody commented 4 years ago

I saw this too. Today I got a "Container Sandbox: Unsupported syscall membarrier" log on Google Cloud Run.

[Screenshot 2020-08-14 12:08:20]

This happens about 1 to 5 times a day.

I deployed a container image based on golang:1.12-alpine.

iangudger commented 4 years ago

@RtypeStudios Can you post the full log line? You cut off the important part.

@AndreiIgna Your logs are about a different socket option (SOL_IP, IP_MTU_DISCOVER). That is tracked in #1643.

@ytnobody Your logs are about a different syscall entirely (membarrier). Please see the compatibility note in the log line that you posted. membarrier is being tracked in #267.

@nlacasse Has 4b9652d rolled out to Cloud Run yet?

pebo commented 4 years ago

We get "warnings" logged for Cloud Run containers running a JVM app with ktor / netty and google libraries for accessing BQ and GCS.

Is there an issue tracking: Container Sandbox: Unsupported syscall setsockopt(0x13,0x0,0xb,0x3ed13c7f9974,0x4,0x2c1) ?

vojkny commented 3 years ago

Similar thing here: it's spamming my logs, and it's hard to see what is relevant.

marcelsauer4711 commented 3 years ago

Getting the same message in the logs with a Java Spring application:

{
  "textPayload": "Container Sandbox: Unsupported syscall setsockopt(0xc9,0x29,0x12,0x3dfefc9fd864,0x4,0x3). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.",
  "insertId": "5fbbb75400091587f1e993e7",
  "resource": {
    "type": "cloud_run_revision",
    "labels": {
      "revision_name": "helloworld-24fjz",
      "project_id": "xxx",
      "configuration_name": "helloworld",
      "location": "europe-west1",
      "service_name": "helloworld"
    }
  },
  "timestamp": "2020-11-23T13:21:24.595316477Z",
  "severity": "DEBUG",
  "labels": {
    "instanceId": "xxx"
  },
  "logName": "xxx",
  "receiveTimestamp": "2020-11-23T13:21:24.783347593Z"
}

sshcherbakov commented 3 years ago

Here's an error from my Java Spring Boot based gRPC server on Cloud Run:

Container Sandbox: Unsupported syscall setsockopt(0x6,0x29,0x31,0x3efbd9dfc3a4,0x4,0x0). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information

Not sure, but that sounds like the SO_DEBUG, SO_DONTROUTE, and SO_BROADCAST socket options, right? Which ones in particular are not supported by gVisor? SO_DEBUG?

RtypeStudios commented 3 years ago

@iangudger So sorry, just saw your request for more information. I'm still getting thousands of them.

2021-02-03 18:09:04.374 AWST Container Sandbox: Unsupported syscall setsockopt(0x12e,0x1,0xd,0x3e23363fefb8,0x8,0x3e234cef5490). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.

2021-02-03 18:09:04.375 AWST Container Sandbox: Unsupported syscall setsockopt(0x12e,0x1,0xd,0x3e23363fefb8,0x8,0x3e234cf1a688). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.

2021-02-03 18:09:05.489 AWST Container Sandbox: Unsupported syscall setsockopt(0x12f,0x1,0xd,0x3e2345fffa38,0x8,0x0). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.

sshcherbakov commented 3 years ago

0xd and 0x31 have only the first bit (SO_DEBUG) in common. Am I completely off here?
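On Linux, option names are enumerated integers within each level rather than bit flags, so they are not combined or compared bitwise. In the messages above, 0xd is at level 0x1 (SOL_SOCKET), where 13 is SO_LINGER, while 0x31 is at level 0x29 (SOL_IPV6), where 49 is IPV6_RECVPKTINFO. SO_LINGER also takes a `struct linger` rather than an int, which fits the optlen of 0x8 in the .NET lines versus 0x4 elsewhere. A minimal Python sketch of building and applying that value; the helper names are hypothetical:

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; }: two C ints, 8 bytes on amd64.
LINGER = struct.Struct("ii")


def linger_optval(enabled, seconds):
    """Pack the optval for SOL_SOCKET / SO_LINGER."""
    return LINGER.pack(1 if enabled else 0, seconds)


def set_linger(sock, enabled, seconds):
    """Apply SO_LINGER; the optlen passed to the kernel is len(optval)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_optval(enabled, seconds))


if __name__ == "__main__":
    print(LINGER.size)  # the same 8 bytes as the 0x8 optlen in the log lines
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_linger(s, True, 0)
        raw = s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, LINGER.size)
        print(LINGER.unpack(raw))
```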

RtypeStudios commented 3 years ago

It seems to be the same in the errors I have posted, but different from the errors others have posted. I have no idea what they mean :)

sshcherbakov commented 3 years ago

Sorry, the error I listed didn't affect the functionality of my gRPC server (the cause of the malfunction was something else); please ignore it.

louis030195 commented 2 years ago

I don't know if I should create a new issue:

Container Sandbox: Unsupported syscall sched_getattr(0x37d,0x3e045912c300,0x38,0x0,0x1,0x3e045912c300). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/docs/user_guide/compatibility/linux/amd64/sched_getattr for more information.

The documentation page does not exist. I suspect it is caused by the use of playwright.dev (Python API) or maybe beautifulsoup.

FROM mcr.microsoft.com/playwright:focal

Dependencies:

google-cloud
google-cloud-firestore
google-cloud-storage
Flask[async]==2.0.2
gunicorn==20.1.0
beautifulsoup4
playwright
requests
fire
tqdm
pandas
openai
scraperapi-sdk
parsel
aiologger

johnf1004 commented 2 years ago

Did anyone ever figure out how to suppress these warning messages?

petehannam commented 2 years ago

@johnf1004 use the gen2 execution environment: https://cloud.google.com/run/docs/about-execution-environments

johnf1004 commented 2 years ago

Nice, thank you @petehannam !

It works nicely when deploying through the console. Any idea if it's possible to specify the gen2 environment with gcloud? I looked through the flags for gcloud run deploy and can't see anything.

petehannam commented 2 years ago

@johnf1004 The documentation has details on how to run it via the command line:

gcloud beta run deploy --image IMAGE_URL --execution-environment gen2

vojkny commented 2 years ago

Note that I am avoiding gen2 because of slower cold starts.

kevinGC commented 2 years ago

This issue sort of straddles the line between gVisor and its downstream consumers. I believe that Cloud Run and others allow for logs to be filtered via terms such as NOT "Unsupported syscall setsockopt".
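In the Cloud Logging query box this is just a negated term, as suggested above. For logs that have already been exported, a similar exclusion can be applied locally. A minimal Python sketch, assuming an export with one JSON entry per line shaped like the entry posted earlier in this thread (a `textPayload` string field); the function names are hypothetical:

```python
import json

# Prefix of gVisor's unsupported-syscall messages in Cloud Run's textPayload.
SANDBOX_PREFIX = "Container Sandbox: Unsupported syscall"


def drop_sandbox_noise(entries, syscall=None):
    """Yield log entries whose textPayload is not a gVisor unsupported-syscall
    message. If syscall is given (e.g. "setsockopt"), only that syscall's
    messages are dropped. Entries without a textPayload are always kept."""
    prefix = SANDBOX_PREFIX if syscall is None else f"{SANDBOX_PREFIX} {syscall}("
    for entry in entries:
        if not entry.get("textPayload", "").startswith(prefix):
            yield entry


def filter_jsonl(lines, syscall=None):
    """Parse newline-delimited JSON log entries and drop sandbox noise."""
    entries = (json.loads(line) for line in lines if line.strip())
    return list(drop_sandbox_noise(entries, syscall))
```

Matching on the message prefix rather than any substring avoids dropping application lines that merely quote the message. Passing `syscall="setsockopt"` narrows the exclusion to the same scope as the query above, so other unsupported-syscall messages such as membarrier stay visible.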

From gVisor's perspective, the unsupported syscall logs are important. In the rare cases where unsupported syscalls do affect program behavior, the logs are an important debugging tool. We don't want to remove them, as when things do break they will be extra difficult to debug both for users and for us.

Please do file specific issues if you're getting major logspam or application behavior is affected. For now, this issue seems to have become a catchall and I think we should have users file specific bugs for specific messages.
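Since fd numbers and pointer values change on every call, a rough way to see which specific messages a log contains (and therefore which separate issues to file) is to group them by syscall and, for socket options, by level and optname. A minimal Python sketch; `distinct_messages` is a hypothetical helper:

```python
import re
from collections import Counter

# Captures the syscall name and its comma-separated hex arguments.
MSG_RE = re.compile(r"Unsupported syscall (\w+)\(([^)]*)\)")


def distinct_messages(lines):
    """Count unsupported-syscall messages by a coarse signature.

    For setsockopt/getsockopt the signature is (name, level, optname); for
    other syscalls it is just the name, since their arguments (pointers,
    pids) vary from call to call.
    """
    counts = Counter()
    for line in lines:
        m = MSG_RE.search(line)
        if m is None:
            continue  # not a sandbox message
        name, args = m.group(1), m.group(2).split(",")
        if name in ("setsockopt", "getsockopt") and len(args) >= 3:
            key = (name, args[1], args[2])
        else:
            key = (name,)
        counts[key] += 1
    return counts
```

Each resulting key, such as `("setsockopt", "0x6", "0x6")`, corresponds to one specific message.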

kevinGC commented 2 years ago

For anyone coming across this in the future: if you're seeing the Unsupported syscall message and it either is (1) affecting application behavior or (2) logspamming like crazy, please open an issue for your particular message. I'm closing this one because it's too many issues clumped together and it's not clear which need addressing.