vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Exposing rows accessed / rows written as metrics in VTGate #7372

Open jordw opened 3 years ago

jordw commented 3 years ago

Feature Description

We would like VTGate to expose metrics that are in-memory counters of the aggregate rows accessed and rows written across all queries for the process lifetime. It is OK for these values to reset to 0 when the process restarts; no persistent state is required.

Use Case(s)

The primary use case for this feature is to be able to periodically gather metrics on rows written and accessed from a system outside of VTGate for monitoring purposes.

deepthi commented 3 years ago

We have an existing metric for rows ~accessed~ returned (from vtgate's /metrics endpoint)

# HELP vtgate_api_rows_returned Rows returned through the VTgate API
# TYPE vtgate_api_rows_returned counter
vtgate_api_rows_returned{db_type="master",keyspace="commerce",operation="Execute"} 17
vtgate_api_rows_returned{db_type="replica",keyspace="commerce",operation="Execute"} 20

Does this look like the right granularity? It can then be aggregated per-keyspace or over a known set of keyspaces. We can add a similar one for writes. Something like vtgate_api_rows_updated.
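
For the external-monitoring use case, a scraper can fetch vtgate's /metrics endpoint and do the per-keyspace aggregation itself. Here is a minimal sketch in Go using the Prometheus text-format parser; the vtgate address/port are placeholders and error handling is kept to a minimum for illustration:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// Scrape vtgate's /metrics once and sum vtgate_api_rows_returned per keyspace.
// The address and port below are placeholders for a real vtgate.
func main() {
	resp, err := http.Get("http://vtgate:15001/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	perKeyspace := map[string]float64{}
	if mf, ok := families["vtgate_api_rows_returned"]; ok {
		for _, m := range mf.GetMetric() {
			var keyspace string
			for _, lp := range m.GetLabel() {
				if lp.GetName() == "keyspace" {
					keyspace = lp.GetValue()
				}
			}
			perKeyspace[keyspace] += m.GetCounter().GetValue()
		}
	}
	for ks, rows := range perKeyspace {
		fmt.Printf("keyspace=%s rows_returned=%.0f\n", ks, rows)
	}
}
```

The same loop would work unchanged for a write-side counter once one exists.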

deepthi commented 3 years ago

With joins or aggregate queries, rows_returned from vtgate won't match the total rows_accessed on vttablets. I'll look into adding rows_accessed in addition to rows_updated.

dweitzman commented 3 years ago

Minor nit: "tablet rows returned" feels like a more accurate name than "rows accessed". "Rows accessed" could be misinterpreted as InnoDB rows accessed, which at least Percona's MySQL distribution has a stat for (that one can be helpful for tracking down high-frequency queries with bad indexes or expensive COUNTs).

jordw commented 3 years ago

I am actually looking for rows accessed instead of rows returned. Rows returned doesn't map to how much work was done by the system to get the result.

jordw commented 3 years ago

I talked with @deepthi offline, so my last comment can be disregarded and we are aligned.

I have another ask, which is to also be able to collect the storage used by the system as well in a similar manner. Is this possible?

deepthi commented 3 years ago

Both vtgate and vttablet use very little storage (just log files and such). Are you interested in stats for that, or is it the mysql instance size on disk that is of interest?

jordw commented 3 years ago

Sorry, I should have been more clear. mysql data size is what I am after.

dweitzman commented 3 years ago

vttablet used to read table statistics and upload them as metrics. Sugu refactored schema tracking at some point to make it faster and lighter weight, and one side effect was that it no longer reads that information.

Seems like there's an interesting question of philosophy about metrics here: should vttablet poll and upload mysql's statistics, or should vttablet trust that you can or do separately track those direct mysql stats?

I can imagine a world where vttablet lets you opt-in to a process that runs every minute and uploads mysql stats, for people who aren't running some separate process to monitor mysql (like Percona Monitoring and Management or whatever)
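A rough sketch of what such an opt-in poller could look like (this is not existing Vitess behavior; the DSN, interval, and metric name are placeholders): read per-schema data and index sizes from information_schema once a minute and publish them as a gauge.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical opt-in poller: every minute, read per-schema data + index size
// from information_schema and publish it as a gauge. The DSN and metric name
// are placeholders for illustration only.
var schemaBytes = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "mysql_schema_data_bytes",
		Help: "Data + index size per schema, polled from information_schema",
	},
	[]string{"table_schema"},
)

func pollSizes(db *sql.DB) {
	for range time.Tick(time.Minute) {
		rows, err := db.Query(
			`SELECT table_schema, SUM(data_length + index_length)
			   FROM information_schema.tables GROUP BY table_schema`)
		if err != nil {
			log.Printf("poll failed: %v", err)
			continue
		}
		for rows.Next() {
			var schema string
			var bytes float64
			if err := rows.Scan(&schema, &bytes); err == nil {
				schemaBytes.WithLabelValues(schema).Set(bytes)
			}
		}
		rows.Close()
	}
}

func main() {
	prometheus.MustRegister(schemaBytes)
	db, err := sql.Open("mysql", "vt_dba@tcp(127.0.0.1:3306)/") // assumed DSN
	if err != nil {
		log.Fatal(err)
	}
	go pollSizes(db)
	select {} // a real process would also serve an HTTP /metrics endpoint here
}
```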

deepthi commented 3 years ago

for reference, the changes @dweitzman is referring to were made in #5951

jordw commented 3 years ago

> Seems like there's an interesting question of philosophy about metrics here: should vttablet poll and upload mysql's statistics, or should vttablet trust that you can or do separately track those direct mysql stats?

It seems like such a feature could make Vitess easier to operate.

notfelineit commented 3 years ago

Is it possible to associate a query with # of rows accessed?

dweitzman commented 3 years ago

> Is it possible to associate a query with # of rows accessed?

I'm not sure, but one place you can get queries and rows read from disk logged together is the slow query log. It's only helpful for slow queries and it's a log (not dynamic information available to a client), but it can be useful for identifying queries that scan many rows and return few rows.
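
For completeness, enabling the slow query log only takes a couple of server variables. A small Go sketch (the DSN and the 1-second threshold are arbitrary examples; the same SET GLOBAL statements can be run from any mysql client):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Assumed DSN; the user needs privileges to change global variables.
	db, err := sql.Open("mysql", "vt_dba@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	// Slow log entries record Rows_sent and Rows_examined next to the SQL text.
	if _, err := db.Exec("SET GLOBAL slow_query_log = 'ON'"); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec("SET GLOBAL long_query_time = 1"); err != nil {
		log.Fatal(err)
	}
}
```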

notfelineit commented 3 years ago

Slow queries are a good start for our use case; that would work.

deepthi commented 3 years ago

Another way to get this information is from the performance_schema. events_statements_history contains N recent completed events (aka queries I presume) per-thread, and the columns include sql_text and rows_examined.
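
Assuming performance_schema and the statements-history consumer are enabled, a small Go sketch of pulling that out (the DSN and LIMIT are illustrative):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// List recently completed statements with the rows they examined,
// from performance_schema.events_statements_history.
func main() {
	db, err := sql.Open("mysql", "vt_dba@tcp(127.0.0.1:3306)/") // assumed DSN
	if err != nil {
		log.Fatal(err)
	}
	rows, err := db.Query(`
		SELECT sql_text, rows_examined, rows_sent
		  FROM performance_schema.events_statements_history
		 WHERE sql_text IS NOT NULL
		 ORDER BY rows_examined DESC
		 LIMIT 20`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var sqlText string
		var examined, sent uint64
		if err := rows.Scan(&sqlText, &examined, &sent); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("examined=%d sent=%d  %s\n", examined, sent, sqlText)
	}
}
```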

rbranson commented 3 years ago

Should this actually be bytes, since a row can vary in size pretty dramatically? i.e. pulling 1,000 rows from a table of {integer, integer, integer, integer} isn't anything like pulling 1,000 rows that average out to multiple kilobytes.

deepthi commented 3 years ago

Depends on #7415

deepthi commented 3 years ago

To start with, let us limit this to RowsAffected so that we have a counter that works for DMLs similar to RowsReturned for non-DMLs.

deepthi commented 3 years ago

In #7380, we added a counter for RowsAffected. So we now have the following metrics:

# HELP vtgate_api_rows_affected Rows affected by a write (DML) operation through the VTgate API
# TYPE vtgate_api_rows_affected counter
vtgate_api_rows_affected{db_type="master",keyspace="commerce",operation="Execute"} 14
# HELP vtgate_api_rows_returned Rows returned through the VTgate API
# TYPE vtgate_api_rows_returned counter
vtgate_api_rows_returned{db_type="master",keyspace="commerce",operation="Execute"} 17

#7444 is in progress; that will provide storage stats.