influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.63k stars 5.58k forks source link

MySQL plugin: received error partial write #5055

Closed bolek2000 closed 1 year ago

bolek2000 commented 5 years ago

Relevant telegraf.conf:

# # Read metrics from one or many mysql servers
[[inputs.mysql]]
#   ## specify servers via a url matching:
#   ##  [username[:password]@][protocol[(address)]]/[?tls=[true|false|skip-verify|custom]]
#   ##  see https://github.com/go-sql-driver/mysql#dsn-data-source-name
#   ##  e.g.
#   ##    servers = ["user:passwd@tcp(127.0.0.1:3306)/?tls=false"]
#   ##    servers = ["user@tcp(127.0.0.1:3306)/?tls=false"]
#   #
#   ## If no servers are specified, then localhost is used as the host.
#   servers = ["tcp(127.0.0.1:3306)/"]
#    servers = telegraf@unix
servers = ["telegraf@unix(/var/run/mysqld/mysqld.sock)/information_schema"]
#   
#   ## Selects the metric output format.
#   ##
#   ## This option exists to maintain backwards compatibility, if you have
#   ## existing metrics do not set or change this value until you are ready to
#   ## migrate to the new format.
#   ##
#   ## If you do not have existing metrics from this plugin set to the latest
#   ## version.
#   ##
#   ## Telegraf >=1.6: metric_version = 2
#   ##           <1.6: metric_version = 1 (or unset)
   metric_version = 2
#
#   ## the limits for metrics form perf_events_statements
   perf_events_statements_digest_text_limit  = 120
   perf_events_statements_limit              = 250
   perf_events_statements_time_limit         = 86400
#   #
#   ## if the list is empty, then metrics are gathered from all databasee tables
#   table_schema_databases                    = []
#   #
#   ## gather metrics from INFORMATION_SCHEMA.TABLES for databases provided above list
  gather_table_schema                       = false
#   #
#   ## gather thread state counts from INFORMATION_SCHEMA.PROCESSLIST
  gather_process_list                       = true
#   #
#   ## gather user statistics from INFORMATION_SCHEMA.USER_STATISTICS
  gather_user_statistics                    = true
#   #
#   ## gather auto_increment columns and max values from information schema
  gather_info_schema_auto_inc               = true
#   #
#   ## gather metrics from INFORMATION_SCHEMA.INNODB_METRICS
  gather_innodb_metrics                     = true
#   #
#   ## gather metrics from SHOW SLAVE STATUS command output
  gather_slave_status                       = false
#   #
#   ## gather metrics from SHOW BINARY LOGS command output
  gather_binary_logs                        = false
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.TABLE_IO_WAITS_SUMMARY_BY_TABLE
  gather_table_io_waits                     = true
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.TABLE_LOCK_WAITS
   gather_table_lock_waits                   = false
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.TABLE_IO_WAITS_SUMMARY_BY_INDEX_USAGE
   gather_index_io_waits                     = false
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.EVENT_WAITS
   gather_event_waits                        = false
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.FILE_SUMMARY_BY_EVENT_NAME
   gather_file_events_stats                  = false
#   #
#   ## gather metrics from PERFORMANCE_SCHEMA.EVENTS_STATEMENTS_SUMMARY_BY_DIGEST
   gather_perf_events_statements             = false
#   #
#   ## Some queries we may want to run less often (such as SHOW GLOBAL VARIABLES)
   interval_slow                   = "1h"
#
#   ## Optional TLS Config (will be used if tls=custom parameter specified in server uri)
#   # tls_ca = "/etc/telegraf/ca.pem"
#   # tls_cert = "/etc/telegraf/cert.pem"
#   # tls_key = "/etc/telegraf/key.pem"
#   ## Use TLS but skip chain & host verification
#   # insecure_skip_verify = false

System info:

Telegraf 1.9, Ubuntu 18.04, MariaDB 10.3

Steps to reproduce:

A couple of weeks ago I activated the mysql plugin the first time on an Ubuntu 14.04 host with MySQL 5.6.33 and telegraf 1.8 with the config above. Then I eneabled the same config on the new host with newer versions.

Expected behavior:

no error messages

Actual behavior:

2018-11-28T20:51:40Z E! [outputs.influxdb]: when writing to [https://idb.example.com:8082]: received error partial write: field type conflict: input field "ssl_ctx_verify_depth" on measurement "mysql" is type float, already exists as type integer dropped=1; discarding points

glinton commented 5 years ago

Although mariadb and mysql are similar, they are not identical. I would not expect flawless upgrades with such large version (and server) changes. You can configure your influx output to write to a different database/retention policy or drop the old data so the new fields will be correct

bolek2000 commented 5 years ago

Thanks for the recommendation, I was not 100% sure what would be a good workaround. I just wanted to let you know about this in case you plan compatibility with MariaDB as well. I will update the ticket with information regarding the newest MariaDB version, maybe its useful for others as well.

danielnelson commented 5 years ago

I think since it is the same plugin it should send data with the same type. Let's research why this is happening and we can make a better determination if a fix is appropriate.

bolek2000 commented 5 years ago

I executed drop measurement mysql on telegraf.autogen and disabled the other mysql plugin instance, so I only get data from the MariaDB 10.3 host. This didn't help here, same error message as in description. Do I need to issue more commands to "fully" get rid of the old measurement ? The outputted metrics look like this:

mysql,dc=eu-dc,host=eu-db-master1,server=/var/run/mysqld/mysqld.sock ssl_ctx_verify_mode=5i,slaves_connected=1i,sort_merge_passes=0i,sort_rows=30957i,ssl_accept_renegotiates=0i,ssl_finished_connects=0i,slaves_running=0i,slow_queries=0i,ssl_callback_cache_hits=0i,ssl_connect_renegotiates=0i,sort_scan=9411i,ssl_accepts=1i,ssl_ctx_verify_depth=18446744073709552000,ssl_default_timeout=0i,slave_skipped_errors=0i,slow_launch_threads=0i,sort_priority_queue_sorts=0i,sort_range=0i,ssl_finished_accepts=1i,ssl_client_connects=0i 1543528501000000000

mysql,dc=eu-dc,host=eu-db-master1,server=/var/run/mysqld/mysqld.sock tc_log_page_size=0i,threadpool_idle_threads=0i,threads_cached=0i,wsrep_connected=0i,wsrep_local_index=18446744073709552000,tc_log_page_waits=0i,threads_created=5i,threads_running=7i,transactions_multi_engine=0i,uptime=263995i,wsrep_local_bf_aborts=0i,threadpool_threads=0i,threads_connected=3i,uptime_since_flush_status=263995i,wsrep_cluster_size=0i,wsrep_cluster_status="Disconnected",wsrep_ready=0i,transactions_gtid_foreign_engine=0i,update_scan=0i,wsrep_cluster_conf_id=18446744073709552000 1543528501000000000
danielnelson commented 5 years ago

I know this plugin makes multiple measurements, is it still complaining on measurement "mysql"?

bolek2000 commented 5 years ago

Yes, it complains only on mysql and field ssl_ctx_verify_depth, which has a very high (unusual/not integer ?) value, as you can see in the file output above. SHOW VARIABLES; doesn't have Ssl_ctx_verify_depth in MariaDB 10.3, maybe thats the problem ? In the docs it says: For compatibility reasons, the TLS status variables in MariaDB still use the Ssl_ prefix, but MariaDB only supports its more secure successors. For more information on SSL/TLS in MariaDB....

https://mariadb.com/kb/en/library/ssltls-status-variables/#ssl_ctx_verify_depth

danielnelson commented 5 years ago

Might be worth turning off Telegraf for a minute and making sure it is deleted on the InfluxDB side by trying to add it back in manually:

$ influx
> use telegraf
> insert mysql,dc=eu-dc,host=eu-db-master1,server=/var/run/mysqld/mysqld.sock ssl_ctx_verify_depth=42

Unless Telegraf is emitting this field multiple times with differing types then it should have conflicts anymore, unless it has not been removed from InfluxDB completely.

BTW the value is the maximum unsigned 64-bit integer.

bolek2000 commented 5 years ago

Maybe I don't understand that completely, but didn't you mention above, that the plugin outputs the field always with the same type (integer). If so, then there sould be an 'i' in the end of the value, isn't it ? Or does the plugin figure the data types dynamically (depending on DB variant or version) ? As I understood InfluxDB cannot write to fields with differing datatypes in the same shard. I found the variable in the DB...SHOW GLOBAL STATUS; Ssl_ctx_verify_depth | 18446744073709551615 I also setup another instance with Ubuntu18.04, MariaDB 10.3 (DB slave, with SSL): Ssl_ctx_verify_depth | 0

I don't understand the differing values in the DB.

Also I found this error in the logs (MariaDB 10.3 slave): received error partial write: field type conflict: input field "have_openssl" on measurement "mysql_variables" is type integer, already exists as type string dropped=5; discarding points

MariaDB [(none)]> show variables like 'have_openssl';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_openssl  | YES   |

mysql> show variables like 'have_openssl';
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_openssl  | DISABLED |

Here on MySQL and MariaDB the variable in the DB is a string. Why does the plugin try to output an integer ?

bolek2000 commented 5 years ago

Also I stopped all telegraf mysql inputs and dropped the measurement again and waited a minute. Then I started telegraf on the MariaDB-Master and get the same error. So I guess there must be something setting up the field as integer and later the plugin tries to send float. Maybe something like this interfering here ? I have IDB 1.6.4 https://github.com/influxdata/influxdb/commit/c14b0e81b7d719acc117c0a83c2ee60a5e2c1641

Starting with not existing measurement mysql:

insert mysql,dc=eu-dc,host=eu-db-master1,server=/var/run/mysqld/mysqld.sock ssl_ctx_verify_depth=42
ERR: {"error":"partial write: field type conflict: input field \"ssl_ctx_verify_depth\" on measurement \"mysql\" is type float, already exists as type integer dropped=1"}

After that the measurement mysql exists but select * from mysql returns nothing.

select "ssl_ctx_verify_depth" from mysql returns nothing, even though ssl_ctx_verify_depth is in the DB as a field. SHOW FIELD KEYS FROM mysqlshows: ssl_ctx_verify_depth integer

SHOW FIELD KEYS FROM mysql_variablesshows: have_ssl string

danielnelson commented 5 years ago

I have experienced this issue where measurements are not dropped immediately myself as well, but I'm not sure exactly how to convince it to drop the measurement. It usually clears up for me after a few minutes, or perhaps after restarting InfluxDB, but here is the related issue: https://github.com/influxdata/influxdb/issues/9694. It sounds like perhaps it is fixed in 1.7.

Until that insert statement works the measurement is still not removed, so focus on that first. You will have to run drop measurement mysql after any attempt to run it. I'm confident that once the old data is gone the plugin will run without error.

bolek2000 commented 5 years ago

Hi Daniel, I upgraded to IDB 1.7.1, stopped all mysql inputs, dropped the mysql measurement, restarted InfluxDB, started the MariaDB master telegraf and still get the described error message. So the data types are still saved somewhere and SHOW FIELD KEYS FROM mysql still shows integer. I will also report to the bug you mentioned.

glinton commented 5 years ago

Perhaps the plugin isn't strict enough and guesses the type. If a non-decimal number came through first, it would set the type as an integer in influx, but if a float came through it would guess that type and fail to insert. :thinking:

bolek2000 commented 5 years ago

Thats what it seems to me as well. I mean, dropping the measurement only helps, when the plugin/telegraf will send always the same field types in the future. But I use the plugin only for a couple of weeks on 2 different DB versions and what I see is that it sends different field types in at least 2 different cases. I don't really understand the code entirely, so how is it designed to be ? Must the plugin/telegraf always output the same field types or does the plugin "detect/discover" the field types and send dynamically depending on detection? Thats what seems to be the main question here. I guess its hard to support all these MySQL/Maria/Percona...DBs and versions, but then it would be helpful to have a little more documentation on these kinds of possible errors, when using different DB versions.

bolek2000 commented 5 years ago

Also I described above that I have the case where the field type differs on 2 differnt hosts with the same DB version (MariaDB 10.3, have_openssl), what should not happen ? So maybe we can track this down, why the plugin thinks this field is integer on one host and string on the other ?

danielnelson commented 5 years ago

In the case where you have 2 hosts with the same DB version but the field type is different what are the versions of Telegraf involved?

bolek2000 commented 5 years ago

1.10.0~a50fb5a2-0 (I tested because of memory leak) and 1.9.0-1 (latest) The error shows up on 1.9.0 host directly after starting telegraf: received error partial write: field type conflict: input field "have_openssl" on measurement "mysql_variables" is type integer, already exists as type string dropped=5; discarding points

My third host (Ubuntu 14.04, MySQL 5.6, telegraf 1.9+1.10, now downgraded to telegraf 1.8.3) showed this before I downgraded: received error partial write: field type conflict: input field "have_openssl" on measurement "mysql_variables" is type string, already exists as type integer dropped=6; discarding points

pprkut commented 5 years ago

I ran into this issue on two of my servers with tokudb_flusher_cleaner_min_buffer_size and tokudb_flusher_cleaner_min_buffer_workdone. On one server this is 94 and 92 respectively, which gets stored as integer. On another server this is set to 18446744073709551615 for both, which somehow wants to get stored as float and thus causes a field type conflict.

Running telegraf 1.8.3

danielnelson commented 5 years ago

This sounds similar to https://github.com/influxdata/telegraf/pull/4430 then, thanks for the info @pprkut.

debu99 commented 4 years ago

received error partial write: field type conflict: input field "wsrep_local_index" on measurement "mysql" is type integer, already exists as type float dropped=2; discarding points

Telegraf 1.12.6

nbari commented 4 years ago

received error partial write: field type conflict: input field "wsrep_local_send_queue_avg" on measurement "mysql" is type integer, already exists as type float dropped=1; discarding points

Telegraf 1.12.5

denisvmedia commented 4 years ago

Wsrep is not fully supported by the Telegraf's MySQL plugin. I created an extended version of it here: https://github.com/go-extras/telegraf-mysql-wsrep

sjwang90 commented 4 years ago

Awesome @denisvmedia! I opened a PR #8219 to add it this plugin to our external plugins list. Let me know if you don't want us to list it.

Wsrep is not fully supported by the Telegraf's MySQL plugin. I created an extended version of it here: https://github.com/go-extras/telegraf-mysql-wsrep

denisvmedia commented 4 years ago

@sjwang90 sure, you can list it. I was just thinking if I could provide a PR for the original plugin. What do you think, is it worth of doing or we keep it as an external plugin?

sjwang90 commented 4 years ago

Okay yeah it may be better to open a PR with the wsrep changes to the original plugin. I think a lot of people could use this feature.

fxedel commented 3 years ago

Hey @bolek2000, has your issue been resolved with #9403?

And @debu99 @nbari @pprkut: Since #9401, numbers that are too big for int are parsed automatically as uint now. This probably applies to these variables:

There could technically be field type conflicts between int and uint now. However, if uint support isn't explicitly enabled, uints are capped to max int in the influx serializer.

Another case is wsrep_local_send_queue_avg, which should be parsed as float according to https://galeracluster.com/library/documentation/galera-status-variables.html#wsrep-local-send-queue-avg

The best solution is to add variables to this mapping, so they always have the same time.

Edit: After reading the TokuDB Docs, I don't think these variable values are correct. This would be a buffer size in the exabytes. This is probably an integer underflow, e.g. uint = 0, uint - 1 = 18446744073709551615, so an error in TokuDB. You could ignore the field values as a quick fix.

nsteinmetz commented 2 years ago

Just got the issue when upgrading from telegraf 1.20.2 to 1.20.4

Nov 23 08:23:53 cod10r1 telegraf[29102]: 2021-11-23T07:23:53Z E! [outputs.influxdb_v2] Failed to write metric (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: field type conflict: input field "wsrep_cluster_conf_id" on measurement "mysql" is type integer, already exists as type float dropped=1

And I use metrics v2 for a while now.

  [[inputs.mysql]]
    servers = ["telegraf:REDACTED@unix(/var/run/mysqld/mysqld.sock)/"]
    metric_version = 2

    table_schema_databases = []
    gather_table_schema = true
    gather_user_statistics = true
    gather_process_list = true
    gather_slave_status = true
    gather_innodb_metrics = true
rdxmb commented 2 years ago

I've just been running into this issue. I have two docker containers running telegraf:latest . Each of it reads the metrics via input.mysql from "its own" mariadb-database.

When restarting and therefore upgrading one of the containers to

# telegraf --version
Telegraf 1.23.3 (git: HEAD 73121bd1)

I get this error:

2022-08-23T14:44:06Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: field type conflict: input field "wsrep_local_index" on measurement "mysql" is type integer, already exists as type float dropped=2

Here is the mysql-part of my telegraf.conf

[[inputs.mysql]]
  servers = ["root:$MYSQL_ROOT_PASSWORD@tcp(127.0.0.1:3306)/?tls=false"]
  gather_table_schema = false
  gather_process_list = false
  gather_user_statistics = false
  gather_info_schema_auto_inc = false
  gather_innodb_metrics = false
  gather_table_io_waits = false
  gather_table_lock_waits = false
  gather_index_io_waits = false
  gather_event_waits = false
  gather_file_events_stats = false
  gather_perf_events_statements = false
  interval_slow = "30m"
  metric_version = 2
  gather_slave_status = true
  gather_binary_logs = true 

The other container is still running with

# telegraf --version
Telegraf 1.19.3 (git: HEAD a799489f)

and "only" logs

2022-08-23T15:10:08Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: field type conflict: input field "delay_key_write" on measurement "mysql_variables" is type integer, already exists as type string dropped=11

I have downgraded the "damaged" container to

image: 1.19.3

so both containers have input field "delay_key_write" on measurement "mysql_variables" is type integer, already exists as type string

What is the way to fix this both? I am ready to upgrade the telegraf images to the newest (and fixed) tag. But I think I have to delete some stuff in the influxdb first.

(Actually, this is influxdb:1.8.3 - maybe there was also a minor upgrade on the influxdb, this cannot be evaluated here anymore)

srebhan commented 1 year ago

I think the issue is the automatic type guessing of the mysql plugin. Please use the converter processor to enforce the datatypes you expect.

telegraf-tiger[bot] commented 1 year ago

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Forums or provide additional details in this issue and reqeust that it be re-opened. Thank you!

alexfouche commented 9 months ago

Having Telegraf on both MariaDB and Mysql into the same InfluxDB measurement, i resolved it using the converter processor like that:

# cat /etc/telegraf/telegraf.d/converter.conf 

[[processors.converter]]
[processors.converter.fields]
measurement = ["mysql_variables"]
float = ["innodb_adaptive_flushing_lwm"]
string = ["myisam_recover_options", "old_alter_table"]