marcingminski / sqlwatch

SQL Server Performance Monitor
https://docs.sqlwatch.io

There are too many _DUMP_ tables. #394

Closed sozezzo closed 2 years ago

sozezzo commented 3 years ago

Did you check DOCS to make sure there is no workaround? https://sqlwatch.io/docs/

Describe the bug There are too many dump tables.


To Reproduce Currently, this happens in the Central Repository.

Expected behavior The DUMP tables should be dropped.

SQLWATCH version (from DACPAC or from sysinstances)

Question: Can we just delete them?

marcingminski commented 3 years ago

When the import goes wrong, the SqlWatchImport.exe application dumps the data into a "_DUMP" table so that the data that failed to import can be inspected for debugging. You can disable this behaviour in app.config by changing <add key="DumpDataOnError" value="true"/> to <add key="DumpDataOnError" value="false"/>.

Yes you can drop them.

I would however investigate why the import goes wrong so often.

sozezzo commented 2 years ago

After a deep analysis, a few months, and many coffees, I found a workaround:

I changed the MaxThreads configuration in the "app.config" file.

<add key="MaxThreads" value="5" />

Finally, no more "DUMP" tables.

marcingminski commented 2 years ago

What were the errors in the log?

sozezzo commented 2 years ago

There was not much information in the log file:

2021-06-04 08:50:29.094 ERROR Failed to merge table "[Labo-SQL03].dbo.sqlwatch_logger_snapshot_header"
                        at Transaction (Process ID 60) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
                        ;merge dbo.sqlwatch_logger_snapshot_header as target using [#dbo.sqlwatch_logger_snapshot_header] as source
                            on (source.[snapshot_time] = target.[snapshot_time] and source.[sql_instance] = target.[sql_instance] and source.[snapshot_type_id] = target.[snapshot_type_id])
                            when not matched
                            then insert ([snapshot_time],[snapshot_type_id],[sql_instance],[report_time],[snapshot_time_utc_offset])
                            values (source.[snapshot_time],source.[snapshot_type_id],source.[sql_instance],source.[report_time],source.[snapshot_time_utc_offset]); (Thread: 223)
2021-06-04 08:50:29.137  Finished: "Labo-SQL03". Time taken: 4830.7958ms 

I was forced to create a job to delete the DUMP tables with this script.

-- drops up to 50 _DUMP_ tables per run
declare @sql varchar(max) = '';
select top 50 @sql = @sql + 'drop table if exists ['+[name]+'];
' from sys.tables where [name] like '_DUMP_%';
print @sql;
exec(@sql);

There were more than 500 DUMP tables per week. Now, after the config change, there are no more DUMP tables.
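For reference, a loop variant that keeps dropping batches of 50 until no _DUMP_ tables remain; this is only a sketch, not the actual job:

declare @sql varchar(max);
while exists (select 1 from sys.tables where [name] like '_DUMP_%')
begin
    set @sql = '';
    select top 50 @sql = @sql + 'drop table if exists [' + [name] + '];'
    from sys.tables where [name] like '_DUMP_%';
    exec(@sql);
end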

Now, I am modifying SqlWatchImport:

    • It will run in a loop, with a short delay between each loop. (Okay)
    • Use change tracking. (testing; see the sketch below)
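For reference, a minimal sketch of what the change-tracking idea could look like. The database and table names come from this thread, and the key columns are assumed from the merge statement in the log above, so this is an illustration rather than SQLWATCH's actual implementation:

-- enable change tracking on the monitored SQLWATCH database and on one logger table
alter database [SQLWATCH]
    set change_tracking = on (change_retention = 2 days, auto_cleanup = on);

-- requires a primary key; assumed here to be (snapshot_time, sql_instance, snapshot_type_id)
alter table [dbo].[sqlwatch_logger_snapshot_header]
    enable change_tracking;

-- the importer would persist the last synced version and only pull rows changed since then
declare @last_sync_version bigint = 0; -- stored by the application between runs
select h.*
from changetable(changes [dbo].[sqlwatch_logger_snapshot_header], @last_sync_version) as ct
join [dbo].[sqlwatch_logger_snapshot_header] as h
    on  h.[snapshot_time]    = ct.[snapshot_time]
    and h.[sql_instance]     = ct.[sql_instance]
    and h.[snapshot_type_id] = ct.[snapshot_type_id];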

By the way, thanks, it's a nice job you have done.

marcingminski commented 2 years ago

Thank you for the kind words. The deadlock on the snapshot header was fixed recently. Which version are you using?

marcingminski commented 2 years ago

You may want to look at version 5.0; there is a data collector now that does not require remote deployment at all. Although the release is not quite finished yet, the collector did OK in my testing. Remember though, I am learning C#, so I am sure there is big room for improvement.

sozezzo commented 2 years ago

I tried to use version 5.0, but I had some problems:

(screenshot of the errors attached)

Broker service is OK, queues are OK.

SqlWatchCollect.zip

marcingminski commented 2 years ago

The repository database name is hardcoded as "SQLWatch" in the Grafana dashboard (JSON).

yep, that’s a bug. Apologies

The repository stops collecting data from itself when I run "SqlWatchCollect.exe".

That’s by design as SqlWatchCollect.exe should collect data from the central repo too. But I understand it does not? I’ll look into it. Thanks

thanks for the log, I’ll take a look

sozezzo commented 2 years ago

I took my time to debug the SqlWatchCollect.

You said it "does not require remote deployment at all", but what is this code?

(screenshot of the deployed code attached)

You make a "soft" deployment.


The bug that appears in the log is because some of the "functions" are missing, e.g. ufn_sqlwatch_create_hash.

(Screenshots attached: the deployed functions, and the functions used in PROCEDURE [dbo].[usp_sqlwatch_internal_get_data_collection_snapshot_xml].)

You copy the code from the repository database (tables, procedures, functions) and you deploy it in the TempDB database.

But your code becomes complicated for nothing, with too much code manipulation and dynamic code. The names of the tables, functions, and stored procedures are hardcoded into the code; the content of the code is dynamic, but the names are not.

I am thinking of a much simpler solution (the points are quoted in the replies below):

marcingminski commented 2 years ago

Whilst it's not a final version, some objects will still be deployed into the tempdb for performance reasons. Every monitoring solution I know does this, even those that cost $3000 per instance. Some data is still being saved locally for local delta calculations and to avoid pulling it over the network when not required.

You copy the code from the repository database (tables, procedures, functions) and you deploy it in the TempDB database.

Yep. It's simple and relatively easy to do as I already have all the code in the SQLWATCH database.

All code to be executed remotely is saved in a table.

Nope, you lose version control. And what would execute this code? This would be much more complicated and less performant. The application would have to query the table every 5 seconds to get the code to execute, then execute it. With my approach, the application runs a stored procedure. Storing code in a table is a really bad approach.

The SQLWatchCollect only selects the scripts to be executed, no manipulation of code.

You lose performance and version control.

You can add, remove, or change code without changing SQLWatchCollect; you only change the scripts in the table.

Again, you lose version control. Poor approach

Use a global temporary table, and do not create tables in the TempDB database.

Nope, that would wipe out any data upon restart or when the connection is dropped. Deploying these objects could take a few seconds so we don't want to do that too often on unstable networks.

Do not create Stored Procedures, or Functions.

Why?

Using a table with the code to be executed, we can have SQL scripts for each SQL Server version.

Again, no version control. Not a good approach.

Do not delete any tables; we can check whether the table structure is correct.

Yes, that would be good although more difficult to do.

sozezzo commented 2 years ago

Nope, you lose version control. And what would execute this code? This would be much more complicated and less performant. The application would have to query the table every 5 seconds to get the code to execute, then execute it. With my approach, the application runs a stored procedure. Storing code in a table is a really bad approach.

Yes, that's true, you lose version control with the code in a table. The problem is that we already do that, with these tables: [sqlwatch_config_action], [sqlwatch_config_check_action_template], [sqlwatch_config_check_template], [sqlwatch_meta_check].

We delete all tables when we restart the SQLWatchCollect application. (screenshot attached)

We can keep the script to run in the application's memory. The stored procedures use dynamic code and build the same code to run each time. We could send the script to run directly, instead of calling a stored procedure to build and run it.

Use a global temporary table, and do not create tables in the TempDB database. When we create tables in TempDB, we leave garbage in TempDB. If you really don't want to deploy any code, create temp tables and just run scripts.

Use case when we cannot add tables or scripts:

Do not delete any tables; we can check whether the table structure is correct. I will send you a function to do that (a rough sketch of the idea is below).
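A rough sketch of what such a check could look like; the function name and the exact approach are illustrative, not the actual function mentioned above:

-- returns a checksum of a table's column definitions so the collector can compare
-- the deployed structure with the expected one instead of dropping and recreating it
create or alter function dbo.ufn_table_structure_checksum (@table_name sysname)
returns int
as
begin
    declare @checksum int;

    select @checksum = checksum_agg(checksum(c.[name], t.[name], c.max_length, c.[precision], c.scale, c.is_nullable))
    from sys.columns as c
    inner join sys.types as t
        on t.user_type_id = c.user_type_id
    where c.[object_id] = object_id(@table_name);

    return @checksum;
end

The collector would only redeploy a table when the checksum differs from the expected value.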

I still love the SQLWatch solution; I can see what's running and understand it. Until today, I still don't monitor clusters, dedicated servers, or critical servers. I am writing to you because SQLWatch is the best solution.

marcingminski commented 2 years ago

Hi,

The problem is that we already do that, with these tables: [sqlwatch_config_action], [sqlwatch_config_check_action_template], [sqlwatch_config_check_template], [sqlwatch_meta_check].

These tables do not hold any logic so there is no need to version control them. Since SQLWATCH is a monitoring solution, the majority of the "business" logic is the performance queries. I want them in the code, not in a table. Moreover, the "meta" tables are populated by the application so we shouldn't even care about them.

We delete all tables when we restart the SQLWatchCollect application:

True, but that's because I did not know how to recover from a failure. When testing with lots of small instances as described here: https://sqlwatch.io/blog/how-i-test-sqlwatch-on-100-sql-server-instances-without-breaking-the-bank/ they often timed out, breaking the connection. This was the quickest way for me to make it run. Ideally, I want to be able to recover from a broken connection but I am not quite there yet. You are free to help if you like.

We can keep the script to run in the application's memory.

Yes, true. I tried that and I liked it. It was done as part of the initialisation process, where it would load all the T-SQL into a hash table as far as I remember. There is one important constraint: the application is optional. SQLWATCH must still work and collect data as a standalone database without the application, in which case it would have to query the table every time it wants to collect data, adding additional performance overhead to the monitored instance. I tried it and did not like the overhead or the fact that T-SQL was saved in a table.

Use case when we cannot add tables or scripts:

That's why they go into tempdb, a database designed for temporary object creation; a lot of applications do that. Any other commercial, paid monitoring solution does exactly the same.

Without tables on the remote instance, how would you run a query that excludes or includes some performance counters? Would you dynamically build a T-SQL query in C# with a long WHERE clause that would potentially bloat the cache on the production instance? A simple small table with a simple query and a small join is a much cleaner and faster approach. How would you then run the same code without the C# application in the SQLWATCH database? You would end up with two different versions of the same code.
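For illustration, a minimal sketch of the small-table-plus-join approach described here; the config table and its columns are assumptions for the example, not SQLWATCH's actual objects:

-- collect only the counters flagged in a small config table on the monitored instance
select pc.[object_name], pc.[counter_name], pc.[instance_name], pc.cntr_value, pc.cntr_type
from sys.dm_os_performance_counters as pc
inner join dbo.sqlwatch_config_performance_counters as cfg -- hypothetical config table
    on  cfg.[object_name]  = rtrim(pc.[object_name])
    and cfg.[counter_name] = rtrim(pc.[counter_name])
where cfg.collect = 1; -- including or excluding a counter is a one-row flag update, no new SQL text to compile

The same query runs unchanged whether it is executed by the C# collector or by the standalone SQLWATCH database without the application.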

I am writing to you because SQLWatch is the best solution.

Thank you, appreciate the kind words! :-)

Appreciate your feedback; I do not dismiss any of it, and some of what you suggest may happen in future versions once I have worked out all the dependencies. I am by no means saying that the application is perfect - far from it - but given the constraints, I think it's the best option. And remember, this is a very first release, not a final version.