microsoft / bobsql

demos, scripts, samples, and code from the two bobs who work at Microsoft on SQL Server
MIT License
250 stars 145 forks source link

Performance optimization of populatedata.sql and new step in readme.md #34

Closed SeanCowburnMSFT closed 2 years ago

SeanCowburnMSFT commented 2 years ago

Added -S to osstress call. Included extra step in readme.md to make a change to server name in workload_index_scan_users.cmd Changed populatedata.sql to improve performance by using smaller transactions (less logfile growth) and showcase the new GENERATE_SERIES relational operator.

SeanCowburnMSFT commented 2 years ago

Not sure what you mean by the "first set of inserts" - the code change still inserts 20 million records, but in batches of 50000. The original transaction took ~19 minutes on my laptop and the changed code runs in ~10 minutes, so I thought that would warrant a change. Happy to give it another go, if you see room for improvement.

rgward commented 2 years ago

Sean,

The original script adds in about 4m rows for two different supplier ID values (4 and 5). Looks like your script adds in 20M rows for supplier ID 4. I wanted to know skew of data across two supplier ID values. Plus I don't want to add in 20M rows because it will take the scans too long for the demo. Interesting that it took 19 mins on your laptop as it has consistently run for me in about 10 mins.

SeanCowburnMSFT commented 2 years ago

Hi Bob,

I think we may be looking at two different pieces of code. This is the code that I pulled from master in your repo:

USE WideWorldImporters; GO

DECLARE @StockItemID int DECLARE @StockItemName varchar(100) DECLARE @SupplierID int SELECT @StockItemID = 228 SET @StockItemName = 'Dallas Cowboys Shirt'+convert(varchar(10), @StockItemID) SET @SupplierID = 4 DELETE FROM Warehouse.StockItems WHERE StockItemID >= @StockItemID SET NOCOUNT ON BEGIN TRANSACTION WHILE @StockItemID <= 20000000 BEGIN INSERT INTO Warehouse.StockItems (StockItemID, StockItemName, SupplierID, UnitPackageID, OuterPackageID, LeadTimeDays, QuantityPerOuter, IsChillerStock, TaxRate, UnitPrice, TypicalWeightPerUnit, LastEditedBy ) VALUES (@StockItemID, @StockItemName, @SupplierID, 10, 9, 12, 100, 0, 15.00, 100.00, 0.300, 1) SET @StockItemID = @StockItemID + 1 SET @StockItemName = 'Dallas Cowboys Shirt'+convert(varchar(10), @StockItemID) END COMMIT TRANSACTION SET NOCOUNT OFF GO

I don't see where the variable @SupplierID is changed anywhere in the code. From what I can read here, there are 20M rows being inserted for SupplierID = 4. Is it possible that you are working with a branch that hasn't been pushed yet?

SeanCowburnMSFT commented 2 years ago

As a sanity check, I just reran the code above on a freshly restored WWI database. There were 19999847 rows inserted for SupplierID 4.

Runtime was 16:32.

rgward commented 2 years ago

I think you may have caught the master branch for I changed it. That scrip in master looks like this: https://github.com/microsoft/bobsql/blob/master/demos/sqlserver2022/IQP/psp/populatedata.sql

USE WideWorldImporters; GO -- Add StockItems to cause a data skew in Suppliers

DECLARE @StockItemID int DECLARE @StockItemName varchar(100) DECLARE @SupplierID int SELECT @StockItemID = 228 SET @StockItemName = 'Dallas Cowboys Shirt'+convert(varchar(10), @StockItemID) SET @SupplierID = 4 DELETE FROM Warehouse.StockItems WHERE StockItemID >= @StockItemID SET NOCOUNT ON BEGIN TRANSACTION WHILE @StockItemID <= 4000000 BEGIN INSERT INTO Warehouse.StockItems (StockItemID, StockItemName, SupplierID, UnitPackageID, OuterPackageID, LeadTimeDays, QuantityPerOuter, IsChillerStock, TaxRate, UnitPrice, TypicalWeightPerUnit, LastEditedBy ) VALUES (@StockItemID, @StockItemName, @SupplierID, 10, 9, 12, 100, 0, 15.00, 100.00, 0.300, 1) SET @StockItemID = @StockItemID + 1 SET @StockItemName = 'Dallas Cowboys Shirt'+convert(varchar(10), @StockItemID) END COMMIT TRANSACTION SET NOCOUNT OFF GO DECLARE @StockItemID int DECLARE @StockItemName varchar(100) DECLARE @SupplierID int SELECT @StockItemID = 4000001 SET @StockItemName = 'Dallas Cowboys Mug'+convert(varchar(10), @StockItemID) SET @SupplierID = 5 DELETE FROM Warehouse.StockItems WHERE StockItemID >= @StockItemID SET NOCOUNT ON BEGIN TRANSACTION WHILE @StockItemID <= 8000000 BEGIN INSERT INTO Warehouse.StockItems (StockItemID, StockItemName, SupplierID, UnitPackageID, OuterPackageID, LeadTimeDays, QuantityPerOuter, IsChillerStock, TaxRate, UnitPrice, TypicalWeightPerUnit, LastEditedBy ) VALUES (@StockItemID, @StockItemName, @SupplierID, 10, 9, 12, 100, 0, 15.00, 100.00, 0.300, 1) SET @StockItemID = @StockItemID + 1 SET @StockItemName = 'Dallas Cowboys Mug'+convert(varchar(10), @StockItemID) END COMMIT TRANSACTION SET NOCOUNT OFF GO FooterMicrosoft Open Source Docs Portal GitHub @ MSFT Support Options Release Guide Microsoft Open Source Microsoft Open Source Docs Portal GitHub @ MSFT Support Options Release Guide © 2022 GitHub, Inc. Footer navigation Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About

rgward commented 2 years ago

Sorry Sean for the confusion. I'm not sure what you mean by just ran the code. Which code? You can see the current master branch is designed to add in ~4M rows for supplier ID 4 and 5. As I said before, I don't want the demo to use 20M. You are free to publish your own version of these demos. You can email me at bobward@microsoft.com if that is easier.

SeanCowburnMSFT commented 2 years ago

I'll ping you tomorrow (it's 11PM here ;-) ), Bob. Perhaps we can have a quick call to clarify. When I look at the code in master for dopfeedback, I see the code I posted above.

rgward commented 2 years ago

Sean, I see the problem and its all on me. I use some of these script names for the different demos. I was not playing close attention to your PR so my apologies. I kept thinking you were working on the PSP demo. Doh! This is for the dopfeedbackdemo. Let me test out your script for populatedata.sql for that demo and get back to you

SeanCowburnMSFT commented 2 years ago

No need to apologize - we're on the same team! Thought I'd lost it for a short while. Let me know if you'd like me to have a go at the code for populating in the psp demo. Happy to help where I can.

Have a good one!

rgward commented 2 years ago

Sean, because GENERATE_SERIES now requires dbcompat 160 I won't accept this PR. I'll look into batching the inserts to see if it speeds up the populate script.