influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

sqlserver input: change statement_text from a tag to a field #6976

Open sawo1337 opened 4 years ago

sawo1337 commented 4 years ago

Relevant telegraf.conf:

default sqlserver setup, query_version=2
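
For reference, a minimal sketch of what that configuration looks like (the connection string values here are placeholders, not the ones from my environment):

    [[inputs.sqlserver]]
      ## Placeholder connection string; substitute your own host, user, and password.
      servers = ["Server=mssqlhost;Port=1433;User Id=telegraf;Password=secret;app name=telegraf;log=1;"]
      query_version = 2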

System info:

Telegraf 1.13.2 on Ubuntu; SQL Server 2017

Steps to reproduce:

  1. Configure Telegraf to connect to mssql server with default settings
  2. Examine the data stored in measurement sqlserver_requests, tag statement_text

Expected behavior:

Data to be stored without escape characters, for example: select @par = par from [server].[db].dbo.table where id = @p_par_id and ([dbid] = @p_db_id or [db_id] = 0

Actual behavior:

Data is stored with \n, \t, \r characters: select @par = par\r\n\t\tfrom [server].[db].dbo.table\r\n\t\twhere id = @p_par_id and \r\n\t\t\t([dbid] = @p_db_id or [db_id] = 0

danielnelson commented 4 years ago

Can you attach the output of telegraf --input-filter sqlserver --test? Just one line from the output that shows how we are writing the metrics?

sawo1337 commented 4 years ago

Sure:

sqlserver_requests,command=AWAITING\ COMMAND,database_name=tempdb,host=HOST,host_name=HOST,program_name=telegraf,session_db_name=tempdb,sql_instance=SERVER,statement_text=SET\ DEADLOCK_PRIORITY\ -10;\nDECLARE\ @sys_info\ TABLE\ (\n\tcpu_count\ INT\,\n\tserver_memory\ BIGINT\,\n\tsku\ NVARCHAR(64)\,\n\tengine_edition\ SMALLINT\,\n\thardware_type\ VARCHAR(16)\,\n\ttotal_storage_mb\ BIGINT\,\n\tavailable_storage_mb\ BIGINT\,\n\tuptime\ INT\n)\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 8\ \ --\ Managed\ Instance\n\ \tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\tvirtual_core_count\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tsku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ AS\ engine_edition\,\n\t\t\thardware_generation\ AS\ hardware_type\,\n\t\t\treserved_storage_mb\ AS\ total_storage_mb\,\n\t\t\t(reserved_storage_mb\ -\ storage_space_used_mb)\ AS\ available_storage_mb\,\n\t\t\t(select\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\ from\ sys.dm_os_sys_info)\ as\ uptime\n\tFROM\tsys.server_resource_stats\n\tORDER\ BY\ start_time\ DESC\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 5\ \ --\ Azure\ SQL\ DB\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\t(SELECT\ count(*)\ FROM\ sys.dm_os_schedulers\ WHERE\ status\ \=\ 'VISIBLE\ ONLINE')\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tslo.edition\ as\ sku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ \ AS\ engine_edition\,\n\t\t\tslo.service_objective\ AS\ hardware_type\,\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ cast(DATABASEPROPERTYEX(DB_NAME()\,'MaxSizeInBytes')\ as\ bigint)/(1024*1024)\ \ AS\ total_storage_mb\,\n\t\t\tNULL\ AS\ available_storage_mb\,\ \ --\ Can\ we\ find\ out\ storage?\n\t\t\tNULL\ as\ uptime\n\tFROM\t\ sys.databases\ d\ \ \ \n\t\t--\ sys.databases.database_id\ may\ not\ match\ current\ DB_ID\ on\ Azure\ SQL\ DB\n\t\tCROSS\ JOIN\ sys.database_service_objectives\ slo\n\t\tWHERE\ d.name\ \=\ DB_NAME()\ AND\ slo.database_id\ \=\ DB_ID()\n\nELSE\nBEGIN\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\tcpu_count\,\n\t\t\t(SELECT\ total_physical_memory_kb\ FROM\ sys.dm_os_sys_memory)\ AS\ server_memory\,\n\t\t\tCAST(SERVERPROPERTY('Edition')\ AS\ NVARCHAR(64))\ as\ sku\,\n\t\t\tCAST(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ as\ engine_edition\,\n\t\t\tCASE\ virtual_machine_type_desc\n\t\t\t\tWHEN\ 'NONE'\ THEN\ 'PHYSICAL\ Machine'\n\t\t\t\tELSE\ virtual_machine_type_desc\n\t\t\tEND\ AS\ hardware_type\,\n\t\t\tNULL\,\n\t\t\tNULL\,\n\t\t\t\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\n\tFROM\tsys.dm_os_sys_info\nEND\nSELECT\t'sqlserver_server_properties'\ AS\ [measurement]\,\n\t\tREPLACE(@@SERVERNAME\,'\'\,':')\ AS\ [sql_instance]\,\n\t\tDB_NAME()\ as\ [database_name]\,\n\t\ts.cpu_count\,\n\t\ts.server_memory\,\n\t\ts.sku\,\n\t\ts.engine_edition\,\n\t\ts.hardware_type\,\n\t\ts.total_storage_mb\,\n\t\ts.available_storage_mb\,\n\t\ts.uptime\,\n\t\tSERVERPROPERTY('ProductVersion')\ AS\ 
sql_version\,\n\t\tdb_online\,\n\t\tdb_restoring\,\n\t\tdb_recovering\,\n\t\tdb_recoveryPending\,\n\t\tdb_suspect\,\n\t\tdb_offline\nFROM\t(\n\t\t\tSELECT\tSUM(\ CASE\ WHEN\ state\ \=\ 0\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_online\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 1\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_restoring\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 2\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recovering\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 3\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recoveryPending\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 4\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_suspect\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 6\ or\ state\ \=\ 10\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_offline\n\t\t\tFROM\tsys.databases\n\t\t)\ AS\ dbs\n\t\tCROSS\ APPLY\ (\n\t\t\tSELECT\tcpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\n\t\t\tFROM\t@sys_info\n\t\t)\ AS\ s\n,status=runnable,transaction_isolation_level=2-Read\ Committed,wait_resource=2:1:16 blocking_session_id=0i,cpu_time_ms=4i,granted_query_memory_pages=0i,logical_reads=187i,open_transaction=0i,percent_complete=0,request_id=0i,session_id=64i,total_elasped_time_ms=15i,wait_time_ms=0i,writes=1i 1580849430000000000

sawo1337 commented 4 years ago

Apparently, the statement is being captured as formatted SQL, with newlines, tabs, etc., which is why all of that is being converted to \n, \t, and so on. While in theory you can convert the special characters back into structured SQL, I would strongly suggest against storing all this data in the database. At the very least it makes things unreadable, and it also takes much more space than it normally would. If one wants formatted SQL, it makes much more sense to reformat the SQL after examining the query. I've tested this and found a workaround to remove the extra characters from the statement text. In sqlserver.go, lines 1431-1436, I've changed this:

    , (SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
                                            (CASE WHEN r.statement_end_offset = -1
                                                    THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
                                                    ELSE r.statement_end_offset
                                            END - r.statement_start_offset) / 2)
        ) AS statement_text

To this:

    , replace(replace(replace(replace(replace(replace((SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
                                            (CASE WHEN r.statement_end_offset = -1
                                                    THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
                                                    ELSE r.statement_end_offset
                                            END - r.statement_start_offset) / 2)
        ),Char(13),''),Char(10),''),Char(9),''),' ','<>'),'><',''),'<>',' ') AS statement_text

The above removes all extra whitespace: carriage returns, line feeds, tabs, and repeated spaces.
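
To illustrate how the last three replaces collapse runs of spaces, here is a small standalone T-SQL example with a made-up input string:

    -- Input: 'a' + CR + LF + TAB + two spaces + 'b'
    -- Replaces 1-3 strip CR, LF, and TAB, leaving 'a  b' (two spaces).
    -- Replace 4 turns each space into '<>':            'a<><>b'
    -- Replace 5 deletes every '><' pair:               'a<>b'
    -- Replace 6 turns the surviving '<>' into a space: 'a b'
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
               'a' + CHAR(13) + CHAR(10) + CHAR(9) + '  b',
               CHAR(13), ''), CHAR(10), ''), CHAR(9), ''),
               ' ', '<>'), '><', ''), '<>', ' ') AS statement_text  -- returns 'a b'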

danielnelson commented 4 years ago

Just linking to some related discussion: https://github.com/influxdata/telegraf/issues/6678#issuecomment-581878831.

danielnelson commented 4 years ago

This tag value definitely sticks out; we don't have any other tag values that are this large or that contain whitespace like this.

Additionally, as pointed out by @Trovalo on the community site, tag values are limited to 64 KiB in length. This seems like a strong argument in favor of switching this to a string field.

Trovalo commented 4 years ago

To me this is not a bug; Telegraf is just saving the string as-is, encoding the chars that would otherwise get lost.

Yes, the raw output is not nice to look at, but even after removing the special chars you may not get a readable result.

IMO this kind of logic should be applied when reading the data, not while writing it to the database; my reasons are below.

@danielnelson I'm curious to know whether Flux, which has a replace function (not present in InfluxQL), can actually return multi-line strings.

Summary: IMO, this kind of logic belongs to the tool that reads the data and should not be applied while writing it.

Removing the encoded chars will save a bit of space, but it will also limit your options for managing the data later: that huge one-line string will have to be formatted somehow, and without the encoded chars a simple replace function is no longer enough. The function may not be included in the query language itself (InfluxQL/Flux), but it might be present in other reporting/dashboarding/analysis tools.

My case: I use Grafana as a dashboarding tool, and I can't format that string using InfluxQL; the only option left for me is to find or develop a plugin that can handle encoded chars or replace strings. That said, tools like Power BI, Qlik, or any other BI/reporting tool with a "data modeling" layer (not so popular for this kind of data, but still valid) will be able to handle this string nicely by simply using replace.

Unrelated note: the last char of the string is cut off for some reason.

sawo1337 commented 4 years ago

\n and \t are going to be rendered as plain text by pretty much any tool that reads data from InfluxDB, which is incorrect in my opinion. Also, if someone really wants pretty output, they can use Poor Man's T-SQL Formatter, for example, which is available as a plugin for SSMS, the tool most SQL developers use anyway. I don't see any tool in the InfluxData stack that would automatically parse this kind of data into formatted SQL either. The other issue mentioned by @danielnelson is also valid: errors about the tag length limit show up pretty often, which leads to some of the data being lost. Personally, I've converted the tag to a field; I believe most people would rather have it that way than not.

danielnelson commented 4 years ago

I'm curious to know whether Flux, which has a replace function (not present in InfluxQL), can actually return multi-line strings.

Both Flux and InfluxQL should be able to return multi-line strings without a problem; however, I think Grafana and Chronograf would both have a hard time displaying this string, since it is so far from what a tag is expected to be in terms of size and format.

danielnelson commented 4 years ago

It seems that there are two decisions to make: whether to move statement_text from a tag to a field, and whether to strip the whitespace formatting.

I'm leaning towards yes on moving the tag to a field, and no (at least for now) on removing the whitespace formatting.

The downside of moving statement_text to a field is that you will no longer be able to group by statement_text; instead you will need to use the query_hash. To determine which query a query_hash corresponds to, you will probably need to look it up first in a separate query.

I can't remember if we made query_hash a tag already; if it is still a field, then we will need to move it over to a tag in order to keep series identity.

We should probably rename the tags/fields if we swap them; otherwise it can be somewhat confusing to query when you have a tag and a field with the same name.

Trovalo commented 4 years ago

Completely agree on moving "statement_text" from tag to field.

The column "query_hash" is a tag since issue #6679, which forced its value from binary to string, therefore making it a tag automatically (for this plugin all string values ar tags).

About the "statement_text", @sawo1337 opinion is not wrong, if you want to format that text "manually" using copy-paste having a clean text is handy. Not sure if it's worth to make a parameter in the config for this, something like "clean_sql" = true | false. How you can manage this is really up to the tools you have as frontend.

Lastly, not being able to group by "statement_text" is not an issue for me anyway: it is unreadable in charts, and in tables with more than 3-4 columns. I personally group or filter by "query_hash" in any visual that uses that measurement; if you want to see the query itself, you need a dedicated visual. Personally, I have a table with the "query_hash" and its statistics to identify expensive queries, and from there a link takes me to a "query detail" dashboard, which shows the data for a single "query_hash" with some metrics and the "statement_text" alone in its own table.

spaghettidba commented 4 years ago

Throwing in my 2 cents:

I would investigate the following:

  1. Is this a limitation in the database storage (can InfluxDB store newlines and other illegal characters)?
  2. Is this a limitation in the write protocol (can Telegraf send escaped strings to InfluxDB and have the escaped characters saved correctly?)
  3. Is this a limitation in the read protocol (can InfluxDB read data from a point that contains illegal characters and send this data to applications like Grafana?)
  4. Is this a limitation in the applications (does Grafana handle illegal characters correctly?)

All of these have different solutions, but messing with the string data with replace and the like is not one of them.

sawo1337 commented 4 years ago

Unescaped special characters are not allowed in the InfluxDB line protocol. You can get the plan handle with the statement (there is a field for it), which would be the most definitive way of getting the exact plan you want to see.

spaghettidba commented 4 years ago

I see what you mean, but you're assuming that

  1. I can access the database (fair assumption, if I have to troubleshoot it)
  2. The plan is still in the cache (not guaranteed at all)
  3. The plan has ever been cached (again, not guaranteed; trivial plans, for instance, are not cached)

I understand that the line protocol disallows some characters, hence they're escaped. What I cannot understand is why those characters are not saved in the database correctly. The central question is whether the characters are escaped or replaced with something that looks like an escape sequence but isn't. If it's really an escape sequence, the database should contain the original character, which is not the case.

sawo1337 commented 4 years ago

In our environment, for example, we log the session data, including the plan, for anything that takes over 500 ms, for later investigation. I wouldn't use this plugin for anything but a pointer to the actual problem: without the actual execution plan, most of the information is not that useful for finding the root cause. You know that you have a long-running query that uses CPU/IO/memory, but then what? You still need to investigate the execution plan, IMO. In this case, the special characters are stored as clear text and escaped as well; all that accomplishes is to make the query unreadable. For me it is not that big of a deal, since I can always recompile the plugin with the special characters removed; I just don't see how having them stored as plain text can benefit anyone.

danielnelson commented 4 years ago

@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.
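
To make the asymmetry concrete, here is a small line protocol sketch (measurement and values are made up): a space in a tag value can be backslash-escaped, but there is no escape sequence at all for a line feed, while a quoted string field can carry a literal newline, at least when written the way @Trovalo's test further down demonstrates.

    weather,station=Main\ St temp=21.5 1580849430000000000
    weather,station=Main temp_note="first line
    second line" 1580849430000000000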

spaghettidba commented 4 years ago

@danielnelson Thanks for the info. I was under the impression that this should not be a tag, but a field. Did I misunderstand?

danielnelson commented 4 years ago

Currently Telegraf writes it as a tag and one of the points of discussion is to change it to be a field.

Trovalo commented 4 years ago

@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular, I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.

That answer intrigued me, so I ran some tests by manually entering some values using Chronograf. Here is the line protocol string I passed in Chronograf (use it as-is):

Test,QueryId=1 QueryText="SELECT 
    ColumnA,
    ColumnB,
    ColumnC
FROM MyTable"
Test,QueryId=2 QueryText="SELECT 
    a.ColumnA,
    a.ColumnB,
    a.ColumnC,
    b.SomeThing
FROM MyTable AS a
INNER JOIN OtherTable AS b
ON a.Id = b.Id"

The result is then displayed correctly in Grafana by setting:

[screenshot of the Grafana panel settings]

In the end, it is possible to output those strings correctly if they are stored as a field and not encoded.

As a note, I've found the statement below confusing in the InfluxDB line protocol docs (https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/):

Line protocol accepts the newline character \n and is whitespace-sensitive.

and immediately after

Note Line protocol does not support the newline character \n in tag values or field values.

In fact, as @danielnelson stated before, it is possible to use the newline char inside a string field.

Trovalo commented 4 years ago

I've run some tests today: once the "statement_text" column is converted from a tag to a field, the text in it is interpreted correctly.

I've used the "converter" processor to achieve it.
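
For reference, a minimal sketch of such a converter configuration (the namepass filter is only an example; adjust it to the measurements you collect):

    # Convert the statement_text tag into a string field.
    [[processors.converter]]
      namepass = ["sqlserver_requests"]
      [processors.converter.tags]
        string = ["statement_text"]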

sjwang90 commented 3 years ago

Closing this as @Trovalo has provided a workaround.

I've run some tests today: once the "statement_text" column is converted from a tag to a field, the text in it is interpreted correctly. I've used the "converter" processor to achieve it.

spaghettidba commented 3 years ago

I'm not sure that closing bugs because there's a workaround is a good idea. The SQL Server plugin by default stores all text data as tags, which is what leads to this issue. The mere existence of a workaround does not solve the bug, IMHO.

sawo1337 commented 3 years ago

Agreed. Why not fix this instead of leaving it closed with a workaround? There are no real benefits to keeping "statement_text" as a tag, per the discussion above.

sjwang90 commented 3 years ago

This is not a bug, since it is performing as expected. I am going to label this as a feature request instead and rename it to "change statement_text from a tag to a field".

Anyone please feel free to open a PR for this change.