moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0
68.72k stars 18.67k forks source link

awslogs: dockerd keeps trying to create already created log group for containerlogs causing the throttling exception for CreateLogStream API call #48522

Closed vbhatt91 closed 1 month ago

vbhatt91 commented 1 month ago

Description

After the docker version upgrade to 25.0.6 noticing the 'Log stream already exists' info multiple times. It is causing the throttling exception for CreateLogStream API call.

Reproduce

Enable the awslogs agent on ECS optimized image and enable the awslogs log driver in ECS task definition with docker version 25.0.6.

Expected behavior

Log stream already created should not trigger a new log stream creation API by the dockerd.

docker version

Client:
 Version:           25.0.5
 API version:       1.44
 Go version:        go1.22.5
 Git commit:        5dc9bcc
 Built:             Mon Jul 29 17:21:34 2024
 OS/Arch:           linux/amd64
 Context:           default

docker info

Client:
 Version:           25.0.5
 API version:       1.44
 Go version:        go1.22.5
 Git commit:        5dc9bcc
 Built:             Mon Jul 29 17:21:34 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          25.0.6
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.22.5
  Git commit:       b08a51f
  Built:            Mon Jul 29 17:22:09 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.20
  GitCommit:        8fc6bcff51318944179630522a095cc9dbf9f353
 runc:
  Version:          1.1.11
  GitCommit:        4bccb38cc9cf198d52bebf2b3a90cd14e7af8c06
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Additional Info

No response

thaJeztah commented 1 month ago

cc @austinvazquez - looks like probably one of your builds, and related to awslogs

vbhatt91 commented 1 month ago

Adding more context that with previous docker version 20.X , there is no such issues reported.

thaJeztah commented 1 month ago

Possible candidates;

My prime suspect would be the update to AWS SDK v2;

vbhatt91 commented 1 month ago

It looks like error is coming as per the code mentioned here.

But I do not see any logic to skip if the log group OR log stream already created.

thaJeztah commented 1 month ago

Good one; it looks like an option was added in https://github.com/moby/moby/commit/d10046f228e4152b12bb6ce7db0008f227b95790;

Not sure if that's exactly what's needed for your case (as in; I think that option disables creating altogether, and not "create if not exist, otherwise skip").

I also noticed that I left a comment that it was possibly never added to the documentation (but haven't checked if it was added in the meantime); https://github.com/moby/moby/pull/42132#issuecomment-882606980

thaJeztah commented 1 month ago

I think that option disables creating altogether, and not "create if not exist, otherwise skip"

Ugh; ignore me; looks like that's already the case, so it doesn't fail (ignores the "already exists"), for the stream but because of that may run into the throttling issue https://github.com/moby/moby/blob/627bbd3fa48b186170931404c9637ade9b854774/daemon/logger/awslogs/cloudwatchlogs.go#L521-L525

Same for group; https://github.com/moby/moby/blob/627bbd3fa48b186170931404c9637ade9b854774/daemon/logger/awslogs/cloudwatchlogs.go#L481-L485

Right, so basically the request in this ticket would be to, instead of "opportunistically" create the stream and group (then gracefully handle the error), to instead check if the stream exists before trying to create it.

I'm not familiar enough with the API to know if that would still result in throttling (querying if the stream exists); also not sure if doing so could result in possible race conditions (TOCTOU); i.e. is it possible for the stream or group it to exist, but removed before the container tries to use it? (or is there some "lease" mechanism to get a claim on it?)

austinvazquez commented 1 month ago

Hi @vbhatt91, thanks for the report. The challenge with checking at runtime for stream already exists is the default TPS for read is less than the create API. (ref: create API doc, read API doc)

So adding logic to check for stream already exists would likely cause more issues than it would resolve. Let me take the action to open a PR for adding the awslogs-create-stream to documentation. This may be a workaround for your use case; the result is the docker daemon will skip stream creation entirely and will expect the log stream to already exist.

austinvazquez commented 1 month ago

@vbhatt91, I opened https://github.com/docker/docs/pull/20928 and cc: you there. PTAL.

thaJeztah commented 1 month ago

closing, because https://github.com/docker/docs/pull/20928 is merged, and should be published on docs.docker.com soon