vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

vtadmin: Workflows page doesn't list sharded workflows #14150

Closed by derekperkins 12 months ago

derekperkins commented 1 year ago

Overview of the Issue

Update: I noticed that all of the missing workflows are the sharded ones.

vtadmin is returning an incomplete list of workflows. We have workflows running in 4 different keyspaces, all of which are returned correctly via vtctlclient (formatting changed for readability):

# NOT included in vtadmin output
$ vtctlclient --server localhost:15999 Workflow domains listall
Following workflow(s) found in keyspace domains:
- domains__domains__copy

# included in vtadmin output
$ vtctlclient --server localhost:15999 Workflow iam listall
Following workflow(s) found in keyspace iam:
- iam__workspaces__copy

# NOT included in vtadmin output
$ vtctlclient --server localhost:15999 Workflow keywords listall
Following workflow(s) found in keyspace keywords:
- keywords__keywords__copy
- keywords__workspaces__copy
- workspaces_rankings__pulls_by_team__copy

# included in vtadmin output
$ vtctlclient --server localhost:15999 Workflow workspaces listall
Following workflow(s) found in keyspace workspaces:
- billing__chargebee_coupon_syncer__msgs
- billing__chargebee_invoice_processor__msgs
- billing__hubspot_company_id_syncer__msgs
- billing__usage
- billing__usage_alerter__msgs
- billing__usage_by_workspace
- iam__users__copy
- workspaces__keywords__copy

vtadmin, however, only shows workflows from 2 of the 4 keyspaces. It isn't missing individual workflows within a keyspace; it's missing entire keyspaces.
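For anyone who wants to run the same comparison, here is a minimal Python sketch. It assumes the `listall` output has been captured as text in the format shown above. The function name `parse_listall` and the `shown_in_vtadmin` set are hypothetical, and the sample is trimmed to two keyspaces from this report. It is not part of vtadmin or vtctlclient.

```python
import re
from typing import Dict, List

# Header line printed by `Workflow <keyspace> listall`, as seen in the output above.
HEADER = re.compile(r"Following workflow\(s\) found in keyspace (\S+):$")

def parse_listall(output: str) -> Dict[str, List[str]]:
    """Group workflow names by keyspace from concatenated listall output."""
    workflows: Dict[str, List[str]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            workflows[current] = []
        elif line.startswith("- ") and current is not None:
            workflows[current].append(line[2:])
    return workflows

# Sample copied from the report (two of the four keyspaces).
sample = """\
Following workflow(s) found in keyspace domains:
- domains__domains__copy
Following workflow(s) found in keyspace iam:
- iam__workspaces__copy
"""

parsed = parse_listall(sample)
# Hypothetical: keyspaces that actually appear on the vtadmin Workflows page.
shown_in_vtadmin = {"iam"}
missing = sorted(set(parsed) - shown_in_vtadmin)
print(missing)
```

With the sample above, `missing` contains only `domains`, which matches what the report describes for that keyspace.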


There are no server logs, console logs, or failed network calls when visiting the workflows page. On all other pages, the affected keyspaces don't appear to be missing any other data.

Reproduction Steps

Have several workflows running

Binary Version

vtadmin v17.0.2
vtctld v15.0.2
vtgate v15.0.2
vttablet v15.0.2

Operating System and Environment details

GKE v1.27

Log Fragments

No response

ajm188 commented 1 year ago

> There are no server logs, console logs, or failed network calls when visiting the workflows page.

if there's anything going wrong here, it's very likely from vtctld=>vtadmin; i'd check the vtctld logs, not vtadmin

ajm188 commented 1 year ago

i'll also point out that vtadmin generally only works with a vtctld within +/-1 major version, not 2 apart as you have
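The +/-1 rule of thumb can be checked against the versions in the report with a small Python sketch. The helper names `major` and `within_one_major` are hypothetical and only illustrate the comparison. This is not an official Vitess compatibility check.

```python
def major(version: str) -> int:
    """Return the major component of a version string such as 'v17.0.2'."""
    return int(version.lstrip("v").split(".")[0])

def within_one_major(vtadmin: str, vtctld: str) -> bool:
    """True when the two components are at most one major version apart."""
    return abs(major(vtadmin) - major(vtctld)) <= 1

# Versions from the "Binary Version" section of the report.
print(within_one_major("v17.0.2", "v15.0.2"))
```

For the reported pair (vtadmin v17.0.2, vtctld v15.0.2) the check fails, since the major versions differ by 2.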

derekperkins commented 1 year ago

Sorry, I forgot to update the issue. Everything is on v17 now and the behavior is the same.

Good call on the vtctld logs. vtctld is returning an error, but I'm not sure what it's trying to query that's over 10k rows:

W1002 21:37:43.909828       1 query_plan.go:150] Result on uscentral1-1040968600: rpc error: code = Unknown desc = TabletManager.VReplicationExec on uscentral1-1040968600 error: Row count exceeded 10000: Row count exceeded 10000

ajm188 commented 1 year ago

possibly there are a lot of logs for that workflow