microsoft / azuredatastudio

Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases. Azure Data Studio supports Windows, macOS, and Linux, with immediate capability to connect to Azure SQL and SQL Server. Browse the extension library for more database support options including MySQL, PostgreSQL, and MongoDB.
https://learn.microsoft.com/sql/azure-data-studio
MIT License
7.58k stars 903 forks source link

Opening of SQL Database Project is slow when it has 5000 sql objects or more #24480

Open SrivenNookala opened 1 year ago

SrivenNookala commented 1 year ago

Type: Performance Issue

Adding 5000 script objects to the SQL DB project (SDK-style) slows down the project heavily. Opening the project takes upwards of 30 secs or more and the ADS UI is unresponsive whether opening the project from UI or through API provided. Problem discovery is also slow and takes another 30 secs.

Attaching screenshots of DSCT conversion converting the same number of objects and adding them to a project that contains 5000+ objects and a project that contains a few hundred objects.

EDIT: Adding more context here

Problem discovery in DSCT works something like this: After adding the converted SQL scripts to sqlproj, we go through all the files of the sqlproj and parse all the errors generated by DSCT and use the vscode API to show these errors in the Problems tab next to Output.

Naturally, as the number of files increases, parsing the files and scraping the errors from them and displaying it on the UI through the API will take more time

The reason I highlighted it in the screenshots is because it provided a good estimate of the time it takes to loop through all the files and parse them as well.

Adding files to the sqlproj takes about 3-6x time it takes to discover problems which is unusual.

DSCT conversion of a schema with 15 objects being added to the SQL DB project that has 5000+ objects.

image

DSCT conversion of the same schema with 15 objects being added to the SQL DB project that has a few hundred objects.

image

Extension version: 1.2.0 Azure Data Studio version: azuredatastudio 1.45.1 (88c21b1725a3e79440027bdb7b5a55fb036be0e2, 2023-08-03T00:42:37.945Z) OS version: Windows_NT x64 10.0.22621 Restricted Mode: No Preview Features: Disabled Modes:

System Info |Item|Value| |---|---| |CPUs|11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz (8 x 1805)| |GPU Status|2d_canvas: enabled
canvas_oop_rasterization: disabled_off
direct_rendering_display_compositor: disabled_off_ok
gpu_compositing: enabled
multiple_raster_threads: enabled_on
opengl: enabled_on
rasterization: enabled
raw_draw: disabled_off_ok
video_decode: enabled
video_encode: enabled
vulkan: disabled_off
webgl: enabled
webgl2: enabled
webgpu: enabled| |Load (avg)|undefined| |Memory (System)|31.73GB (13.26GB free)| |Process Argv|--ms-enable-electron-run-as-node --debugBrkPluginHost=26794 --debugId=3dea85e5-3f8b-4882-8555-7afff3aa8ad0 --extensionDevelopmentPath=D:\\Projects\\dsct-oracle-to-ms-sql\\Product\\DsctOracleToMsSql| |Screen Reader|no| |VM|0%|
Benjin commented 1 year ago

I'm investigating what can be done at each layer to improve performance... All testing was done with the same project you provided (sql3.sqlproj, with ~7k scripts)

Microsoft.SqlServer.DacFx.Projects (debug build)

The performance seems to be okay for each scenario here:

Scenario Run 1 Run 2 Run 3 Run 4 Run 5
Open (props only; SDK-style) 186 ms 188 ms 187 ms 189 ms 187 ms
Open (full; SDK-style) 468 ms 406 ms 507 ms 391 ms 455 ms
Open (props only; Legacy-style) 243 ms 217 ms 239 ms 226 ms 215 ms
Open (full; Legacy-style) 372 ms 357 ms 396 ms 345 ms 360 ms

SQL Tools Service

Apparently, the parallel processing in SQL Tools Service is enabled for dev builds by default, but disabled for prod builds by default. This code can be updated to handle its own asynchrony regardless of the STS-wide setting instead of relying on the broader setting that may or may not be enabled.

Azure Data Studio (dev build, but retail builds of STS and SqlProjects underneath)

Each from a fresh launch of ADS, with at least 1 min between starts

Function Run 1 Run 2 Run 3 Run 4 Run 5
Load scripts from STS 7 ms 17 ms 12 ms 8 ms 8 ms
Construct file entry 66 ms 61 ms 51 ms 52 ms 48 ms
Detect CREATE TABLE 76,657 ms 7,676 ms 7,640 ms 76,807 ms 8,021 ms

So it seems we have an obvious culprit in the call to detect whether there's a CREATE TABLE statement in a given script. We should be able to farm that off to an asynchronous call, with the only impact being the "Open in Table Designer" context menu being missing until that's finished running in the background.

Looks like there's a warm path beyond what I controlled here (FS caching for the files via FILE_FLAG_NO_BUFFERING, speculative pathing by .NET?), but even the warm path (being 10x faster) is not amazing and probably warrants a fix.

SrivenNookala commented 1 year ago

Thanks for the impressive debugging @Benjin, It does seem like the create table detection is an issue.

Could you give me an insight into what exactly happens when we call the openProject method for adding the script files to the project in case of an SDK style project? Does the detect create table step happen each time the openProject method is called?

I haven't really tried non-SDK style projects but it could be interesting to see what happens there as well. I can try providing an equally large (but unlikely the exact same) non-SDK style project if you wish to investigate.

Note: I added all the relevant context from our conversation in the issue description.

Benjin commented 1 year ago

What exactly happens when we call openProject() for adding the script files to the project?

When openProject() gets called in ADS, it performs a series of calls to SQL Tools Service to load all the project contents, including project properties, SQL CMD variables, references to other databases, .sql scripts included, and folders.

Within readSqlObjectScripts(), it grabs the list of scripts from STS (~8 ms in my tests above), constructs the FileEntry object that the ADS extension uses to represent the file (~52 ms), and then shoots off another request per-script to STS to detect whether it has a CREATE TABLE statement using ScriptDom (~1 min, 16 sec cold).

Each time an operation occurs that might affect the list of scripts (e.g. excluding a folder, adding a new script) or the project tree is reloaded, the list of SQL object scripts is refreshed via this same call to readSqlObjectScripts() to keep the Tools Service model of the project as the sole source of truth. So to answer your question directly: yes, this occurs each time openProject() is called.

Legacy (non-SDK)-style projects

I converted sql3.sqlproj to a Legacy-style project for benchmarking the DacFx.Projects library, but didn't run the ADS tests against it because the difference in project type is abstracted away from ADS (via that library) and the project's contents should all be loaded in memory (in Tools Service, not ADS yet) at that point. There shouldn't be any difference in the investigation as a result.


I think the solution here is to spin up a task that does the CREATE TABLE calls asynchronously, and emit an event once once that's done loading in the background. On your end, it should finish loading the large project quickly (less than half a second) and this event gets fired off into the void. For when the .sqlproj is loaded in ADS by the projects system, it'll subscribe to this event and trigger the tree to refresh, now with the new property set.

Alternatively, if we find that crossing between ADS and STS 7k times is what's slowing down everything (rather than the script parsing), we have this CREATE TABLE check happen in STS and send that data back with the original payload of sql scripts. Based on my time in DacFx, I suspect it's the script parsing that's particularly slow, though.

SrivenNookala commented 1 year ago

Great! Thanks for the explanation. Your solution sounds good if it would speed the loading up to half a sec. Based on what I understood, this will also speed up the UI of the SQL projects extension and show the project files almost instantaneously correct?

That sounds like a win-win!

SrivenNookala commented 1 year ago

Hey @Benjin When would this performance fix be pushed to ADS stable? Is there any milestone or release assigned?