sawo1337 opened 4 years ago
Can you attach the output of `telegraf --input-filter sqlserver --test`? Just one line from the output that shows how we are writing the metrics.
Sure:
sqlserver_requests,command=AWAITING\ COMMAND,database_name=tempdb,host=HOST,host_name=HOST,program_name=telegraf,session_db_name=tempdb,sql_instance=SERVER,statement_text=SET\ DEADLOCK_PRIORITY\ -10;\nDECLARE\ @sys_info\ TABLE\ (\n\tcpu_count\ INT\,\n\tserver_memory\ BIGINT\,\n\tsku\ NVARCHAR(64)\,\n\tengine_edition\ SMALLINT\,\n\thardware_type\ VARCHAR(16)\,\n\ttotal_storage_mb\ BIGINT\,\n\tavailable_storage_mb\ BIGINT\,\n\tuptime\ INT\n)\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 8\ \ --\ Managed\ Instance\n\ \tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\tvirtual_core_count\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tsku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ AS\ engine_edition\,\n\t\t\thardware_generation\ AS\ hardware_type\,\n\t\t\treserved_storage_mb\ AS\ total_storage_mb\,\n\t\t\t(reserved_storage_mb\ -\ storage_space_used_mb)\ AS\ available_storage_mb\,\n\t\t\t(select\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\ from\ sys.dm_os_sys_info)\ as\ uptime\n\tFROM\tsys.server_resource_stats\n\tORDER\ BY\ start_time\ DESC\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 5\ \ --\ Azure\ SQL\ DB\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\t(SELECT\ count(*)\ FROM\ sys.dm_os_schedulers\ WHERE\ status\ \=\ 'VISIBLE\ ONLINE')\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tslo.edition\ as\ sku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ \ AS\ engine_edition\,\n\t\t\tslo.service_objective\ AS\ hardware_type\,\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ cast(DATABASEPROPERTYEX(DB_NAME()\,'MaxSizeInBytes')\ as\ 
bigint)/(1024*1024)\ \ AS\ total_storage_mb\,\n\t\t\tNULL\ AS\ available_storage_mb\,\ \ --\ Can\ we\ find\ out\ storage?\n\t\t\tNULL\ as\ uptime\n\tFROM\t\ sys.databases\ d\ \ \ \n\t\t--\ sys.databases.database_id\ may\ not\ match\ current\ DB_ID\ on\ Azure\ SQL\ DB\n\t\tCROSS\ JOIN\ sys.database_service_objectives\ slo\n\t\tWHERE\ d.name\ \=\ DB_NAME()\ AND\ slo.database_id\ \=\ DB_ID()\n\nELSE\nBEGIN\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\tcpu_count\,\n\t\t\t(SELECT\ total_physical_memory_kb\ FROM\ sys.dm_os_sys_memory)\ AS\ server_memory\,\n\t\t\tCAST(SERVERPROPERTY('Edition')\ AS\ NVARCHAR(64))\ as\ sku\,\n\t\t\tCAST(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ as\ engine_edition\,\n\t\t\tCASE\ virtual_machine_type_desc\n\t\t\t\tWHEN\ 'NONE'\ THEN\ 'PHYSICAL\ Machine'\n\t\t\t\tELSE\ virtual_machine_type_desc\n\t\t\tEND\ AS\ hardware_type\,\n\t\t\tNULL\,\n\t\t\tNULL\,\n\t\t\t\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\n\tFROM\tsys.dm_os_sys_info\nEND\nSELECT\t'sqlserver_server_properties'\ AS\ [measurement]\,\n\t\tREPLACE(@@SERVERNAME\,'\'\,':')\ AS\ [sql_instance]\,\n\t\tDB_NAME()\ as\ [database_name]\,\n\t\ts.cpu_count\,\n\t\ts.server_memory\,\n\t\ts.sku\,\n\t\ts.engine_edition\,\n\t\ts.hardware_type\,\n\t\ts.total_storage_mb\,\n\t\ts.available_storage_mb\,\n\t\ts.uptime\,\n\t\tSERVERPROPERTY('ProductVersion')\ AS\ sql_version\,\n\t\tdb_online\,\n\t\tdb_restoring\,\n\t\tdb_recovering\,\n\t\tdb_recoveryPending\,\n\t\tdb_suspect\,\n\t\tdb_offline\nFROM\t(\n\t\t\tSELECT\tSUM(\ CASE\ WHEN\ state\ \=\ 0\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_online\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 1\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_restoring\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 2\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recovering\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 3\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recoveryPending\,\n\t\t\t\t\tSUM(\ 
CASE\ WHEN\ state\ \=\ 4\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_suspect\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 6\ or\ state\ \=\ 10\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_offline\n\t\t\tFROM\tsys.databases\n\t\t)\ AS\ dbs\n\t\tCROSS\ APPLY\ (\n\t\t\tSELECT\tcpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\n\t\t\tFROM\t@sys_info\n\t\t)\ AS\ s\n,status=runnable,transaction_isolation_level=2-Read\ Committed,wait_resource=2:1:16 blocking_session_id=0i,cpu_time_ms=4i,granted_query_memory_pages=0i,logical_reads=187i,open_transaction=0i,percent_complete=0,request_id=0i,session_id=64i,total_elasped_time_ms=15i,wait_time_ms=0i,writes=1i 1580849430000000000
Apparently, the statement is being captured as formatted SQL, with newlines, tabs, etc., which is why it all gets converted to \n, \t, and so on. While in theory you can convert the special characters back into structured SQL, I would strongly advise against storing all this data in the database. At the very least it makes things unreadable, and it also takes far more space than it normally would. If one wants formatted SQL, it makes much more sense to just reformat the SQL after examining the query. I've tested this and found a workaround to remove the extra characters from the statement text. In sqlserver.go, lines 1431-1436, I've changed this:
, (SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
(CASE WHEN r.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE r.statement_end_offset
END - r.statement_start_offset) / 2)
) AS statement_text
To this:
, replace(replace(replace(replace(replace(replace((SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
(CASE WHEN r.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE r.statement_end_offset
END - r.statement_start_offset) / 2)
),Char(13),''),Char(10),''),Char(9),''),' ','<>'),'><',''),'<>',' ') AS statement_text
The above removes all extra whitespace: tabs, newlines, and runs of spaces.
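For readers unfamiliar with the REPLACE chain above, the space-collapsing trick (mark each space as `<>`, cancel adjacent `><` pairs, then turn surviving `<>` markers back into single spaces) can be sketched in Python. This is an illustrative equivalent, not code from the plugin:

```python
def collapse_whitespace(text: str) -> str:
    """Mimic the T-SQL REPLACE chain above (illustration only)."""
    # 1. Strip carriage returns, line feeds and tabs outright,
    #    exactly as the Char(13)/Char(10)/Char(9) replaces do.
    for ch in ("\r", "\n", "\t"):
        text = text.replace(ch, "")
    # 2. Collapse runs of spaces: mark every space, cancel adjacent
    #    marker pairs, then map the surviving markers back to spaces.
    text = text.replace(" ", "<>")
    text = text.replace("><", "")
    return text.replace("<>", " ")

print(collapse_whitespace("select  @par = par \n from   dbo.t"))
```

Note the same caveat applies to the SQL version: a newline or tab that is not surrounded by spaces glues the adjacent tokens together when removed.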
Just linking into some related discussion: https://github.com/influxdata/telegraf/issues/6678#issuecomment-581878831.
This tag value definitely sticks out; we don't have any other tag values that are so large or that contain whitespace like this.
Additionally, as pointed out by @Trovalo on the community site, tag key values are limited to 64KiB in length. This seems like a strong argument in favor of switching this to a string field.
To me this is not a bug; Telegraf is just saving the string as-is, encoding the chars that would otherwise get lost.
Yes, the raw output is not nice to look at, but even by removing the special chars you may not get a readable result.
IMO this kind of logic should be applied when reading the data, not while writing it to the database; my reasons are below:
@danielnelson I'm curious to know if Flux, which has the replace function (not present in InfluxQL), can actually return multi-line strings
Summary: IMO, this kind of logic belongs in the tool that reads the data and should not be applied while writing it.
Removing the encoded chars will save a bit of space, but it will also limit your options for managing the data later: the only way to restore structure to that huge one-line string will be to reformat it from scratch, without the possibility of using a simple replace function. Of course, the function may not be included in the query language itself (InfluxQL/Flux), but it might be present in other reporting/dashboarding/analysis tools.
My case: I use Grafana as a dashboarding tool, and I can't format that string using InfluxQL; the only option left for me is to find or develop a plugin that can handle encoded chars or replace strings. That said, other tools like Power BI, Qlik, or any other BI/reporting tool (not so popular for this kind of data, but still valid) that has a "data modeling" layer will be able to handle this string nicely by simply using replace.
Unrelated note: the last char of the string is cut off for some reason.
\n, \t are going to be parsed as plain text by pretty much any tool that reads data from InfluxDB, which is incorrect in my opinion. Also, if someone really wants to see pretty output, they can use a poor man's SQL formatter, for example; one is available as a plugin for SSMS, which is used by most SQL developers anyway. I don't see any tool in the InfluxData stack that would automatically parse this kind of data into formatted SQL either. The other issue mentioned by @danielnelson is also valid: errors about the tag key size limit are shown pretty often, which leads to some of the data being missed. Personally, I've converted the tag to a field; I believe most people would rather have it this way.
I'm curious to know if Flux, which has the replace function (not present in InfluxQL), can actually return multi-line strings
Both Flux and InfluxQL should be able to return multi-line strings no problem; however, I think Grafana and Chronograf would both have a hard time displaying this string since it is so far from the size and format expected of a tag.
It seems that there are two decisions to make: whether to move statement_text from a tag to a field, and whether to strip the whitespace formatting.
I'm leaning towards yes on moving the tag to a field and no (at least for now) on removing whitespace formatting.
The downside of moving statement_text to a field is that you will no longer be able to group by statement_text; instead you will need to use the query_hash. In order to determine what the query_hash corresponds to, you will probably need to look it up first in a separate query.
I can't remember if we made query_hash a tag already; if it is still a field then we will need to move it over to a tag in order to keep series identity.
We probably should rename the tags/fields if we swap them, otherwise it can be somewhat confusing to query when you have a tag and field with the same name.
Completely agree on moving "statement text" from tag to field.
The column "query_hash" has been a tag since issue #6679, which forced its value from binary to string, thereby making it a tag automatically (for this plugin, all string values are tags).
About "statement_text", @sawo1337's opinion is not wrong: if you want to format that text "manually" using copy-paste, having clean text is handy. I'm not sure if it's worth adding a parameter in the config for this, something like "clean_sql" = true | false. How you manage this is really up to the tools you have as a frontend.
Lastly, not being able to group by "statement_text" is not an issue for me anyway; it is unreadable in charts and in tables with more than 3-4 columns. I personally group by or filter on the "query hash" in any visual that uses that measurement; if you want to see the query itself, you need a dedicated visual. Personally, I have a table with the "query hash" and its statistics to identify expensive queries, then use a link to navigate to a "query detail" dashboard, which shows the data of a single "query hash" with some metrics and the "statement text" alone in its own table.
Throwing in my 2 cents:
I would investigate the following:
All of these have different solutions, but messing with the string data with replace and the like is not one of them.
Unescaped special characters are not allowed in the InfluxDB line protocol. You can get the plan handle with the statement (there is a field for it), which would be the most definitive way of getting the exact plan you want to see.
I see what you mean, but you're assuming that
I understand that the line protocol disallows some characters, hence they're escaped. What I cannot understand is why those characters are not saved in the database correctly. The central question is whether the characters are escaped, or replaced with something that looks like an escape sequence but is not. If it's really an escape sequence, the database should contain the original character, which is not the case.
In our environment, for example, we log the session data including the plan for anything that takes over 500ms, for later investigation. I wouldn't use this plugin for anything but a pointer to the actual problem. Without the actual execution plan, most of the information is not that useful if you want to find the real cause: you know that you have a long-running query that uses CPU/IO/memory, but then what? You still need to examine the execution plan, IMO. In this case, the special characters are stored as clear text and then escaped as well. All that accomplishes is making the query unreadable. For me, it is not that big of a deal; I can always recompile the plugin with the special characters removed. I just don't see how storing them as escaped plain text can benefit anyone.
@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.
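The asymmetry between the two value types can be sketched with a hand-rolled illustration of the escaping rules (written here for demonstration, not taken from any client library): tag values define escapes only for comma, equals, and space, so there is simply no sequence that encodes a LF, while quoted string field values only need backslash and double quote escaped, so a newline passes through as a real character.

```python
def escape_tag_value(value: str) -> str:
    # Line protocol defines tag-value escapes only for these three
    # characters; a newline has no representation at all.
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

def escape_string_field(value: str) -> str:
    # String field values are double-quoted; only backslash and the
    # quote itself need escaping, so LF/TAB survive as literal chars.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

print(escape_tag_value("AWAITING COMMAND"))        # spaces escaped
print(escape_string_field('SELECT\n\ta,\n\tb'))    # newlines intact
```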
@danielnelson Thanks for the info. I was under the impression that this should not be a tag, but a field. Did I misunderstand?
Currently Telegraf writes it as a tag and one of the points of discussion is to change it to be a field.
@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular, I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.
That answer intrigued me, so I ran some tests by manually entering some values using Chronograf. Here is the line protocol string I passed to Chronograf (use it as-is):
Test,QueryId=1 QueryText="SELECT
ColumnA,
ColumnB,
ColumnC
FROM MyTable"
Test,QueryId=2 QueryText="SELECT
a.ColumnA,
a.ColumnB,
a.ColumnC,
b.SomeThing
FROM MyTable AS a
INNER JOIN OtherTable AS b
ON a.Id = b.Id"
The result is then displayed correctly in Grafana by setting:
In the end, it is possible to output those strings correctly if they are set as a field and not encoded.
As a note, I've found the below statement confusing in the Influx Line Protocol Docs https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/
Line protocol accepts the newline character \n and is whitespace-sensitive.
and immediately after
Note Line protocol does not support the newline character \n in tag values or field values.
In fact, as @danielnelson stated before, it is possible to use the newline char inside a string field
I've run some tests today: once the "statement_text" column is converted from a tag to a field, the text in it is interpreted correctly.
I've used the "converter" processor to achieve it.
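For anyone wanting the same workaround, a minimal converter snippet looks something like this (the measurement name in `namepass` is an assumption based on the sample output above; adjust it to your setup):

```toml
# Convert the oversized statement_text tag into a string field
# before the metric is written to the output.
[[processors.converter]]
  namepass = ["sqlserver_requests"]
  [processors.converter.tags]
    string = ["statement_text"]
```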
Closing this as @Trovalo has provided a workaround.
I've run some tests today: once the "statement_text" column is converted from a tag to a field, the text in it is interpreted correctly. I've used the "converter" processor to achieve it.
I'm not sure that closing bugs because there's a workaround is a good idea. The SQL Server plugin by default stores all text data as tags, which leads to this issue. The mere existence of a workaround does not solve the bug, IMHO.
Agreed. I don't see why not fix this instead of leaving it closed with a workaround. There are no real benefits to keeping "statement_text" as a tag, per the discussion above.
This is not a bug, since it is performing as expected. I am going to label this as a feature request instead and rename it to "change statement_text from a tag to a field".
Anyone please feel free to open a PR for this change.
Relevant telegraf.conf:
default sqlserver setup, query_version=2
System info:
Telegraf 1.13.2 on Ubuntu; SQL Server 2017
Steps to reproduce:
Expected behavior:
data to be stored without escape characters, for example:
select @par = parfrom [server].[db].dbo.table where id = @p_par_id and ([dbid] = @p_db_id or [db_id] = 0
Actual behavior:
data is stored with \n, \t, \r characters:
select @par = par\r\n\t\tfrom [server].[db].dbo.table\r\n\t\twhere id = @p_par_id and \r\n\t\t\t([dbid] = @p_db_id or [db_id] = 0
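Following the thread's conclusion that this cleanup belongs on the read side, a hypothetical post-query step (the function name is mine, not from any tool) could map the stored literal \r\n / \t sequences back to real whitespace:

```python
def unescape_statement(text: str) -> str:
    # The stored value contains literal two-character sequences such
    # as \n rather than real control characters; map them back.
    # Treating a lone \r as a newline is a choice made here for
    # illustration, not a rule from the plugin.
    return (text.replace("\\r\\n", "\n")
                .replace("\\n", "\n")
                .replace("\\r", "\n")
                .replace("\\t", "\t"))

raw = "select @par = par\\r\\n\\t\\tfrom [server].[db].dbo.table"
print(unescape_statement(raw))
```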