influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Avro schema registry concurrency issue with many topics #15920

Closed: athornton closed this issue 1 month ago

athornton commented 1 month ago

Relevant telegraf.conf

Using the avro parser with a schema registry and a large number of topics. In our case the topics are regex-matched, so the trigger isn't really the config itself; it's that the regex matches a large number of topics.

Logs from Telegraf

... "lsst.sal.MTOODS.command_exitControl" "lsst.sal.MTOODS.command_setAuthList" "lsst.sal.MTOODS.command_setLogLevel" "lsst.sal.MTOODS.command_standby" "lsst.sal.MTOODS.command_start" "lsst.sal.MTOODS.logevent_authList" "lsst.sal.MTOODS.logevent_errorCode" "lsst.sal.MTOODS.logevent_heartbeat" "lsst.sal.MTOODS.logevent_imageInOODS" "lsst.sal.MTOODS.logevent_logLevel" "lsst.sal.MTOODS.logevent_logMessage" "lsst.sal.MTOODS.logevent_simulationMode" "lsst.sal.MTOODS.logevent_softwareVersions" "lsst.sal.MTOODS.logevent_summaryState"]
fatal error: concurrent map read and map write

System info

Telegraf 1.32ish, but the bug has been there ever since the schema registry was introduced.

Docker

No response

Steps to reproduce

Configure a large number (hundreds? thousands?) of different topics to be matched, then start Telegraf.

Expected behavior

Telegraf should not crash on concurrent map access.

Actual behavior

Crashes because the same map is being read and written at the same time.
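
For readers unfamiliar with this failure mode: Go's runtime aborts the whole process with `fatal error: concurrent map read and map write` when it detects one goroutine reading a plain `map` while another writes to it. The sketch below shows one common way to guard a lookup cache with `sync.RWMutex`. It is only an illustration, not the fix from the PR mentioned in this issue and not Telegraf's actual code; `schemaCache`, `get`, `size`, and the `fetch` callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCache is a hypothetical stand-in for a schema registry's
// in-memory cache, keyed by schema ID. It is not Telegraf's type.
type schemaCache struct {
	mu      sync.RWMutex
	schemas map[int]string
}

func newSchemaCache() *schemaCache {
	return &schemaCache{schemas: make(map[int]string)}
}

// get returns the cached schema for id, calling fetch on a cache miss.
// All map access happens while holding the mutex.
func (c *schemaCache) get(id int, fetch func(int) (string, error)) (string, error) {
	// Fast path: many readers may hold the read lock at once.
	c.mu.RLock()
	s, ok := c.schemas[id]
	c.mu.RUnlock()
	if ok {
		return s, nil
	}

	// Fetch outside the lock so a slow lookup does not block readers.
	s, err := fetch(id)
	if err != nil {
		return "", err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Another goroutine may have stored the entry while we were fetching.
	if existing, ok := c.schemas[id]; ok {
		return existing, nil
	}
	c.schemas[id] = s
	return s, nil
}

// size reports the number of cached entries.
func (c *schemaCache) size() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.schemas)
}

func main() {
	cache := newSchemaCache()
	// Hypothetical fetch function standing in for a registry request.
	fetch := func(id int) (string, error) {
		return fmt.Sprintf("schema-%d", id), nil
	}

	// Many goroutines (think: one per message or topic) hit the cache at once.
	var wg sync.WaitGroup
	for i := 0; i < 200; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if _, err := cache.get(n%10, fetch); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("cached schemas:", cache.size())
}
```

Removing the lock calls from `get` and running the program with `go run -race` is a quick way to see the data race reported. A `sync.Map` would be another option for a read-mostly cache like this one.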

Additional info

I've got a fix for it and will come back here and update once I've got the PR in.