influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.45k stars 5.55k forks source link

inputs.sql - allow list of connection strings #12882

Open Trovalo opened 1 year ago

Trovalo commented 1 year ago

Use Case

The input.sql dns parameter accepts only one value, therefore to run queries on multiple servers you need to duplicate the whole input.sql "instance".

I'd like to see it turned into a list (like inputs.sqlserver), of course it will work only as long as the driver is the same (and possibly even authentication... but that depends on the used method, for SQL/basic auth it can be set in each string so I don't really see an issue)

The aim is to have a more compact and manageable config file

Expected behavior

Having a server list

[[inputs.sql]]
  driver = "mssql"
  dsn = [
    "sqlserver://SQLCSRV04/SQL2017" 
    ,"sqlserver://SQLCSRV04/ITEC"
  ]
  [[inputs.sql.query]]
    query="SELECT @@SERVERNAME as srv, 1 as value"

Actual behavior

only one server is allowed per plugin instance

[[inputs.sql]]
  driver = "mssql"
  dsn = "sqlserver://SQLCSRV04/SQL2017" 
  [[inputs.sql.query]]
    query="SELECT @@SERVERNAME as srv, 1 as value"

[[inputs.sql]]
  driver = "mssql"
  dsn = "sqlserver://SQLCSRV04/ITEC"
  [[inputs.sql.query]]
    query="SELECT @@SERVERNAME as srv, 1 as value"

Additional info

No response

Hipska commented 1 year ago

As like with the snmp input plugin, be aware that when having multiple servers into 1 plugin can lead to unexpected results when one of them is down. If one of the servers is taking longer than the interval to return results, it will halt the collection of the other servers as the plugin will get skipped at next interval.

A good alternative to have a better managed config file is to let it compiled by your configuration management tool of preference..

Trovalo commented 1 year ago

I'm aware of that, the same is true for the sqlserver one, but it can be managed by setting the timeout on each connection.

by setting the connection\execution timeout (which is a common parameter for connection strings) that problem is avoided... one connection will fail but the others will fetch any send the data correctly

powersj commented 1 year ago

@Trovalo I was looking into this briefly and came across two questions:

1) In order to differentiate between multiple DSNs we would need to add a source tag which includes both a host and database. We could not use the raw DSN given it can contain secrets and would instead need to parse the DSN and pull out the hostname and database. Looking around I do not see something that you can throw a DSN at it and get this info back easily. Thoughts?

2) For the prepared statements, we currently store these as part of a slice of queries. Since each prepared statement is specific to a certain connection, we would instead need to start storing these as a slice of statements each database, correct?

Trovalo commented 1 year ago
  1. In order to differentiate between multiple DSNs we would need to add a source tag which includes both a host and database.

Building a proper "key" for the data is 100% on the person writing the query, and there is no sure way to get that from the connection string, the parameter keys might differ between RDBMS or be absent (ie: the db name is optional for MSSQL)

  1. For the prepared statements, we currently store these as part of a slice of queries. Since each prepared statement is specific to a certain connection, we would instead need to start storing these as a slice of statements each database, correct?

I honestly don't get the question...

powersj commented 1 year ago

Building a proper "key" for the data is 100% on the person writing the query, and there is no sure way to get that from the connection string, the parameter keys might differ between RDBMS or be absent (ie: the db name is optional for MSSQL)

Your solution then is to document that because you are querying multiple serves you need to include a column/result that includes something to differentiate between servers? I'm worried this would bite a lot of users.

I honestly don't get the question...

Recall that we do not always know as much about databases as you ;) I was trying to ask how we should handle prepared statements with multiple servers and queries. Sounds like yes I do in fact need to store a statement per connection?

Trovalo commented 1 year ago

We can explicitly point out that it is up to the user to make a proper key (with some examples). but it holds true even for the current situation in which you need to create multiple "sessions" of the plugin (one for each dsn)... to me, that's just common sense and it should be pretty straightforward for anyone querying a list of servers. I'll think of some straightforward and nice way of telling people to think about their data globally and mark them accordingly

About how to handle connection, I'd expect to open a connection for each query (as they will run in parallel)