Yes, performance will be bad on large databases. We're loading a lot of different things under the covers - primary keys, foreign keys, indexes - and that is a lot of work.
There's an incremental technique mentioned here: https://github.com/martinjw/dbschemareader/wiki/More-schema-reading#reading-incrementally DatabaseReader.TableList() just reads the table names, so that's really quick. In the document, I'm loading the columns, constraints and indexes table by table, which will be slow for you (it works really well in a UI where the user manually drills down). If you just use the internal ReaderAdapter to get columns, you won't have constraints - so no primary keys etc. So yes, table names and columns are pretty quick, but when you want more information, the time starts to add up. I think I'd want at least PKs and foreign keys for a minimal but quick read. What do you think?
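For reference, a minimal sketch of that incremental approach (assuming a PostgreSQL connection string; the exact constructor varies by provider and target framework):

```csharp
using DatabaseSchemaReader;
using DatabaseSchemaReader.DataSchema;

var dbReader = new DatabaseReader(connectionString, SqlType.PostgreSql);

// Quick: just the table names - no columns, keys or indexes.
var tableList = dbReader.TableList();

// Slower, on demand: fully load a single table
// (columns, constraints and indexes for that table only).
var table = dbReader.Table("MyTable");
```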
Sorry, the next post will be a little bit long, but it contains a lot of info ;-)
I've done a deeper investigation on my PostgreSQL test database. The following results give a hint about where the time goes, but they are not representative of all circumstances. My database consists of:
I created 4 profiling runs: ReadAll, ReadAll with owner, AllTables, and AllTables with owner. Here are the results for the two ReadAll runs:
| Step | ReadAll | ReadAll with owner |
| --- | --- | --- |
| All Users | 0s | 0s |
| All Tables | 150s | 53s |
| All Views | 0s | 0s |
| All SPs | 0s | 0s |
| All Seqs | 0s | 0s |
| Update Refs | 1s | 0s |
We can see that almost all the time is spent in All Tables, so everything else can be ignored. The difference when restricting to an owner is also big, which is to be expected.
The next step is to analyze AllTables, which leads to the following results:
| Step | AllTables | AllTables with owner |
| --- | --- | --- |
| Get Tables | 0s | 0s |
| Get Columns | 3s | 2.5s |
| Get IdentityColumns | 0s | 0s |
| Get CheckConstraints | 1s | 1s |
| Get PrimaryKeys | 28s | 2s |
| Get UniqueKeys | 0s | 0s |
| Get ForeignKeys | 60s | 10s |
| Get DefaultConstraints | 0s | 0s |
| Get Triggers | 0s | 0s |
| Get TableDescriptions | 0s | 0s |
| Get ColumnDescriptions | 1s | 1s |
| Get ComputedColumns | 0s | 0s |
| MergeIndexColumns | 0s | 0s |
| Update DataTable objects | 45s | 40s |
In my case there are two parts that consume nearly all the time: determining the PKs+FKs, and updating all DataTable objects in the foreach loop (in TableBuilder.Execute()).
Within "Update DataTable objects", the merge-index processing, which reads the indexes per table with ReaderAdapter.Indexes(...), takes a long time even though no indexes exist.
In the end, I need to eliminate the PKs+FKs part and the index computation, because I don't need that fully detailed information.
My suggestion consists of two steps:
First, a new method ReadSchema() (or an overload of ReadAll) with a ReadSchemaOptions parameter. In ReadSchemaOptions, every processing step could be turned on and off, something like the sketch below.
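A minimal sketch of what I mean (ReadSchemaOptions and the method signature are hypothetical; they do not exist in the lib today):

```csharp
// Hypothetical options class: each top-level reading step from the
// ReadAll profiling above could be switched off individually.
public class ReadSchemaOptions
{
    public bool ReadUsers { get; set; } = true;
    public bool ReadTables { get; set; } = true;
    public bool ReadViews { get; set; } = true;
    public bool ReadStoredProcedures { get; set; } = true;
    public bool ReadSequences { get; set; } = true;
}

// Hypothetical new entry point:
// public DatabaseSchema ReadSchema(ReadSchemaOptions options) { ... }
```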
Second, a new overload of TableBuilder.Execute with a ReadTableOptions parameter. In ReadTableOptions, each per-table part could likewise be turned on and off, something like the sketch below.
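Again a hypothetical sketch (ReadTableOptions and the overload do not exist in the lib today):

```csharp
// Hypothetical options class: each per-table part from the
// AllTables profiling above could be switched off individually.
public class ReadTableOptions
{
    public bool ReadPrimaryKeys { get; set; } = true;
    public bool ReadForeignKeys { get; set; } = true;
    public bool ReadCheckConstraints { get; set; } = true;
    public bool ReadTriggers { get; set; } = true;
    public bool ReadIndexes { get; set; } = true;
    public bool ReadDescriptions { get; set; } = true;
}

// Hypothetical overload:
// internal void Execute(ReadTableOptions options) { ... }
```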
An alternative to my suggestion would be to determine all of this outside the lib. But that is problematic, because it requires deep knowledge of the internal processing, and the necessary information must be accessible through public methods. My main problems were:
So for the moment I've extended the lib to address my needs with a custom method:
```csharp
public IList<DatabaseTable> TablesAndColumnsOnly()
{
    // Read just the bare table list (names and schema owners).
    RaiseReadingProgress(SchemaObjectType.Tables);
    IList<DatabaseTable> tables;
    using (_readerAdapter.CreateConnection())
    {
        tables = _readerAdapter.Tables(null);
    }
    DatabaseSchema.Tables.Clear();
    DatabaseSchema.Tables.AddRange(tables);

    // Read all columns in one query instead of table by table.
    RaiseReadingProgress(SchemaObjectType.Columns);
    IList<DatabaseColumn> columns;
    using (_readerAdapter.CreateConnection())
    {
        columns = _readerAdapter.Columns(null);
    }

    // Assign the columns to their tables by table name and schema owner.
    foreach (var table in DatabaseSchema.Tables)
    {
        table.Columns.Clear();
        table.Columns.AddRange(
            columns.Where(x => string.Equals(x.TableName, table.Name, StringComparison.OrdinalIgnoreCase)
                && string.Equals(x.SchemaOwner, table.SchemaOwner, StringComparison.OrdinalIgnoreCase)));
    }

    // Resolve data types so each column's DataType is populated.
    DataTypes();
    return DatabaseSchema.Tables;
}
```
This method takes only about 4 seconds (without an owner) in the scenario described in my previous comment.
Another solution could be to make the ReaderAdapter of DatabaseReader public. In that case I could write an extension method that works like the one described in Solution 2. In theory this would be the simplest modification of the lib. BUT: the problem is that the ReaderAdapter class (and other parts) are internal and would have to be made accessible as well. Therefore this solution is not an option.
Solution 1, with loads of options, seems like overkill. Solution 2, TablesAndColumnsOnly, is fine for your scenario. Solution 3, exposing internals, would make a larger, more complicated API surface.
If you want to put solution 2 into a PR, I'll take it.
OK, I've created a PR for this. The new method is named "TablesQuickView".
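For anyone finding this later, a usage sketch (the connection string and SqlType are assumptions for my scenario):

```csharp
// Reads only table names, columns and data types - no primary keys,
// foreign keys or indexes - so it stays fast on large databases.
var dbReader = new DatabaseReader(connectionString, SqlType.PostgreSql);
var tables = dbReader.TablesQuickView();
foreach (var table in tables)
{
    Console.WriteLine($"{table.SchemaOwner}.{table.Name}: {table.Columns.Count} columns");
}
```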
PS: Currently my Visual Studio 2017 Enterprise (version 15.3.4) crashes when opening the project DatabaseSchemaReader.csproj. It seems there is a problem with multi-framework targeting, and I'm not alone with this problem: see here. Changing the TargetFrameworks property to only one framework (no matter which one) solves the problem temporarily.
Hey guys, I have a simple use case: loading table information, including table names, column names and data types, for one database. One of my target databases is a PostgreSQL database with 900+ tables and 10k+ columns. I've tried:
Is there a general performance problem with large databases? Do you have any other suggestions for retrieving the information I need? I could create a pull request if no other solution exists.