gravitational / teleport

Teleport Web UI Error When Accessing Users and User Roles: "grpc: received message larger than max" #36523

Open programmerq opened 8 months ago

programmerq commented 8 months ago

Expected behavior:

When clicking on Users and User Roles in the Teleport Web UI, the system should retrieve and display the information without any errors, even with a large number of roles in the backend.

Current behavior:

Clicking on Users and User Roles in the Teleport Web UI triggers an error: a red banner shows the message "grpc: received message larger than max (13589253 vs. 4194304)".

The following appear in the proxy logs at the same time:

Jan 10 14:36:02 example.com teleport[2202215]: 2024-01-10T14:36:02Z DEBU [PGBK]      Fetched change feed events. elapsed:1.983769ms messages:1 pgbk/background.go:262
Jan 10 14:36:03 example.com teleport[2202215]: 2024-01-10T14:36:03Z WARN [NODE:1:CA] Re-init the cache on error error:[
Jan 10 14:36:03 example.com teleport[2202215]: ERROR REPORT:
Jan 10 14:36:03 example.com teleport[2202215]: Original Error: *trace.LimitExceededError grpc: received message larger than max (13589253 vs. 4194304)
Jan 10 14:36:03 example.com teleport[2202215]: Stack Trace:
Jan 10 14:36:03 example.com teleport[2202215]:         github.com/gravitational/teleport/api@v0.0.0/client/client.go:1585 github.com/gravitational/teleport/api/client.(*Client).GetRoles
Jan 10 14:36:03 example.com teleport[2202215]:         github.com/gravitational/teleport/lib/cache/collections.go:1173 github.com/gravitational/teleport/lib/cache.roleExecutor.getAll
Jan 10 14:36:03 example.com teleport[2202215]:         github.com/gravitational/teleport/lib/cache/collections.go:97 github.com/gravitational/teleport/lib/cache.(*genericCollection[...]).fetch
Jan 10 14:36:03 example.com teleport[2202215]:         github.com/gravitational/teleport/lib/cache/cache.go:1551 github.com/gravitational/teleport/lib/cache.(*Cache).fetch.func2
Jan 10 14:36:03 example.com teleport[2202215]:         golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup.(*Group).Go.func1
Jan 10 14:36:03 example.com teleport[2202215]:         runtime/asm_amd64.s:1650 runtime.goexit
Jan 10 14:36:03 example.com teleport[2202215]: User Message: failed to fetch resource: "role"
Jan 10 14:36:03 example.com teleport[2202215]:         grpc: received message larger than max (13589253 vs. 4194304)] cache/cache.go:1071
Jan 10 14:36:03 example.com teleport[2202215]: 2024-01-10T14:36:03Z DEBU [NODE:1:CA] Reloading cache. cache/cache.go:1075
Jan 10 14:36:04 example.com teleport[2202215]: 2024-01-10T14:36:04Z DEBU [PGBK]      Fetched change feed events. elapsed:2.12619ms messages:1 pgbk/background.go:262

Bug details:

The log entries above suggest that the gRPC response received while fetching roles (13589253 bytes) exceeded the maximum allowed message size (4194304 bytes, i.e. 4 MiB), causing the error.
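To put the two numbers from the error in perspective, here is a minimal Go sketch of the arithmetic. The names `defaultMaxRecvBytes` and `minChunks` are made up for illustration and are not Teleport or gRPC identifiers; the only inputs are the two values printed in the log above.

```go
package main

import "fmt"

// defaultMaxRecvBytes is the limit printed in the error message (4 MiB).
// Hypothetical name, used only for this illustration.
const defaultMaxRecvBytes = 4 * 1024 * 1024

// minChunks returns how many messages of at most limit bytes are needed
// to carry total bytes (ceiling division).
func minChunks(total, limit int) int {
	return (total + limit - 1) / limit
}

func main() {
	const received = 13589253 // size reported in the proxy log

	fmt.Printf("limit:      %d bytes\n", defaultMaxRecvBytes)
	fmt.Printf("over by:    %d bytes\n", received-defaultMaxRecvBytes)
	fmt.Printf("min chunks: %d\n", minChunks(received, defaultMaxRecvBytes))
}
```

In other words, the single response would have to be split into at least four messages to fit under the limit shown in the error.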

programmerq commented 8 months ago

I did find https://github.com/gravitational/teleport/issues/20420 which has a similar grpc: received message larger than max error, but for application session objects. The reporter on that issue pointed out that disabling the cache made the error go away. That makes me wonder if this is just a cache bug as opposed to a bug with the web UI?

webvictim commented 8 months ago

A Community user also reported this same issue a while back. They told me they had ~13,000 roles, lending weight to the theory that this is related to the byte count of all roles combined.

programmerq commented 8 months ago

On an affected cluster, this same error can happen when running tctl get roles directly on the auth server, so it is not only on the Web UI.

# tctl get roles
2024-01-10T17:56:30Z DEBU [SQLITE] Connected to: file:/var/lib/teleport/proc/sqlite.db?_busy_timeout=10000&_sync=FULL&_txlock=immediate, poll stream period: 1s lite/lite.go:254
2024-01-10T17:56:30Z DEBU [SQLITE] journal_mode=delete, synchronous=2, busy_timeout=10000 lite/lite.go:305
2024-01-10T17:56:30Z DEBU           Connecting to: [{0.0.0.0:3025 tcp }]. authclient/authclient.go:63

ERROR REPORT:
Original Error: *trace.LimitExceededError grpc: received message larger than max (13666812 vs. 4194304)
Stack Trace:
    github.com/gravitational/teleport/api@v0.0.0/client/client.go:1585 github.com/gravitational/teleport/api/client.(*Client).GetRoles
    github.com/gravitational/teleport/tool/tctl/common/resource_command.go:1676 github.com/gravitational/teleport/tool/tctl/common.(*ResourceCommand).getCollection
    github.com/gravitational/teleport/tool/tctl/common/resource_command.go:225 github.com/gravitational/teleport/tool/tctl/common.(*ResourceCommand).Get
    github.com/gravitational/teleport/tool/tctl/common/resource_command.go:189 github.com/gravitational/teleport/tool/tctl/common.(*ResourceCommand).TryRun
    github.com/gravitational/teleport/tool/tctl/common/tctl.go:224 github.com/gravitational/teleport/tool/tctl/common.TryRun
    github.com/gravitational/teleport/tool/tctl/common/tctl.go:98 github.com/gravitational/teleport/tool/tctl/common.Run
    github.com/gravitational/teleport/e/tool/tctl/main.go:20 main.main
    runtime/proc.go:267 runtime.main
    runtime/asm_amd64.s:1650 runtime.goexit
User Message: grpc: received message larger than max (13666812 vs. 4194304)

TeleLos commented 8 months ago

I was able to reproduce the issue in my environment by attempting to create 10000 roles with the help of the Teleport Terraform provider.

Attempting to view my roles in the Teleport UI shows the error "grpc: received message larger than max (4280084 vs. 4194304)".

% tctl get roles
ERROR: grpc: received message larger than max (4280084 vs. 4194304)

zmb3 commented 5 months ago

#40165 will resolve this for roles. Users are still outstanding.