martinjw / dbschemareader

Read database metadata (from SqlServer/Oracle/MySql/SQLite/PostgreSql/DB2 etc) into one simple model
Microsoft Public License
293 stars 128 forks source link

Reading the same schema from PostgreSQL takes a lot longer than from MSSQL #146

Open Arnagos opened 2 years ago

Arnagos commented 2 years ago

I was running my test suite and noticed that the PostgreSQL tests take A LOT longer than the MSSQL tests. To verify this I wrote a short benchmark with BenchmarkDotNet and got these results:

MSSQL

Method Mean Error StdDev Median Gen0 Gen1 Allocated
ReadAll 765.5 ms 18.91 ms 54.85 ms 750.3 ms 2000.0000 1000.0000 14.47 MB
AllTables 456.9 ms 12.77 ms 37.45 ms 457.6 ms 2000.0000 1000.0000 13.98 MB

PostgreSQL

Method Mean Error StdDev Gen0 Gen1 Allocated
ReadAll 2.835 s 0.0536 s 0.0638 s 1000.0000 - 7.03 MB
AllTables 2.653 s 0.0485 s 0.0454 s - - 4.48 MB

As you can see both ReadAll and AllTables take a lot longer for PostgreSQL (3.7x and 5.8x respectively). Knowing the performance of PostgreSQL this result is really surprising to me and seems rather strange.

Based on my profiling the culprits seem to be the PrimaryKeys (1.2s) and ForeignKeys (1s) methods of the PostgreSQLAdapter.

Arnagos commented 2 years ago

I took a quick look at the SQL used to load the constraints and think a faster solution to the problem would be to load all the constraints simultaneously instead of executing the same (very expensive) SQL multiple times. Every run of the constraints SQL takes 70-100ms, which results in the above numbers if applied to every single loaded table and constraint type. Loading all constraints at once on the other hand only takes ~200ms and results in a few hundred results that can then easily be filtered in C#.

martinjw commented 2 years ago

ReadAll internally calls AllTables, which internally calls TableBuilder.

TableBuilder does seeral calls on constraints- for the different types (primary key, foreign key, unique, check, null). For all, it does all tables - so it shouldn't be doing anything table by table.

For PostgreSql, 3 are the constraints use the same query (primary, foreign, unique), so yes, that is repeated, but maximum 3 times irregardless of number of tables. It may be possible to optimize this in the postgreSqlAdaptor class down to one, which is cached and reused between the (consecutive) calls- although you wouldn't want that cache to remain in memory.

Arnagos commented 2 years ago

Do you have an idea, why PostgreSQL takes more than thrice the time of MSSQL? Ordinarily, one'd expect the opposite ...

martinjw commented 2 years ago

I guess the metadata tables are not optimized - for example, no indexes on schema or table name, so every query is a full table scan. Also sometimes the metadata tables contain vast amounts of system data- Oracle is a bad offender here, it is by far the slowest to read.