martinjw / dbschemareader

Read database metadata (from SqlServer/Oracle/MySql/SQLite/PostgreSql/DB2 etc) into one simple model
Microsoft Public License
293 stars 128 forks source link

Performance lack retrieving table information of large database #47

Closed thomasjoscht closed 7 years ago

thomasjoscht commented 7 years ago

Hey Guys, I have a simple use case loading table information including table names, column names and data types of one database. One of my target databases is a PostgreSql with 900+ tables and 10k+ columns. I've tried:

Does a general peformance lack exist in large databases? Have you any other suggestion for retrieving my necessary information? I could create a pull request if no other solution exists.

martinjw commented 7 years ago

Yes, performance will be bad on large databases, We're loading a lot of different things under the covers - primary keys, foreign keys, indexes - which is a lot of work.

There's an incremental technique mentioned here: https://github.com/martinjw/dbschemareader/wiki/More-schema-reading#reading-incrementally The DatabaseReader.TableList() is just reading the table names, so that's really quick. In the document, I'm loading the columns, constraints and indexes table by table, which will be slow for you (it works really well in a UI when the user manually drills down). If you just use the internal ReaderAdapter to get columns, you won't have constraints- so no primary keys etc. So, yes, table names and columns are pretty quick, but when you want more information, the time starts to add up. I think I'd want at least pks and foreign keys for a minimal but quick read. What do you think?

thomasjoscht commented 7 years ago

Sorry, next post will be a little bit long but contains lot of info ;-)

I'v done a little deeper investigation on my test database on postrgesql. The following results could be a hint for time consumption but is not representative in all circumstances. My database consists of:

I created 4 profiling runs: ReadAll, ReadAll with owner, AllTables, AllTables with owner. Here are the results for ReadAll:

We see that time consuming is nearly only All Tables. Therefore all other could be ignored. Differences with owner is also big, which is clearly.

Next step is to analyze AllTables which will lead me to the following results:

In my case exists 2 parts which consumes nearly all time: Determine PK's+FK's and update all DataTable objects in foreach (in TableBuilder.Execute()).

In update DataTable objects the merge index processing which reads indexes per table with ReaderAdapter.Indexes(...) takes long time although no indexes exist.

In the end: I must eliminate PK's+FK's part and computing indexes because I don't need full conditioned informations.

thomasjoscht commented 7 years ago

Solution 1 - Options

My suggestion will contain 2 steps:

  1. First a new method ReadSchema() (or overload of ReadAll) with ReadSchemaOptions parameter. In ReadSchemaOptions every computing part could be turned on and off like

    • AllUsers=false
    • AllTables=true
    • AllViews=false
    • AllStoredProcedures=false
    • AllSequences=false
    • ReadTableOptions = new ReadTableOptions();
  2. Second a new overload for TableBuilder.Execute with ReadTableOptions parameter. In ReadTableOptions could also each part turned on and off like

    • [ReadTables=true]
    • Columns=true
    • IdentityColumns=false
    • CheckConstraints=false
    • PrimaryKeys=false
    • UniqueKeys=false
    • ForeignKeys=false
    • DefaultConstraints=false
    • Indexes=false
    • Triggers=false
    • TableDescriptions=false
    • ColumnDescriptions=false

Solution 2 - New methods

An alternative to my suggestion could be to determine all things out of lib. But this will be problematic because a deep knowledge of processing must be exist and necessary information must be available during public methods. My main problems were:

So in the moment I've extended the lib to address my needs with a custom method

public IList<DatabaseTable> TablesAndColumnsOnly()
{
    RaiseReadingProgress(SchemaObjectType.Tables);
    IList<DatabaseTable> tables;
    using (_readerAdapter.CreateConnection())
    {
        tables = _readerAdapter.Tables(null);
    }
    DatabaseSchema.Tables.Clear();
    DatabaseSchema.Tables.AddRange(tables);

    RaiseReadingProgress(SchemaObjectType.Columns);
    IList<DatabaseColumn> columns;
    using (_readerAdapter.CreateConnection())
    {
        columns = _readerAdapter.Columns(null);
    }

    foreach (var table in DatabaseSchema.Tables)
    {
        table.Columns.Clear();
        table.Columns.AddRange(
            columns.Where(x => string.Equals(x.TableName, table.Name, StringComparison.OrdinalIgnoreCase)
                                && string.Equals(x.SchemaOwner, table.SchemaOwner, StringComparison.OrdinalIgnoreCase)));
    }

    DataTypes();

    return DatabaseSchema.Tables;
}

This method takes only nearly 4 seconds without owner in my scenario described in comment before.

Solution 3 - Public ReaderAdapter

Another solution could be to make the ReaderAdapter of DatabaseReader public. In this case I could write an extension method which works like described in Solution 2. This could be theoretical the simplest modification of the lib. BUT: Problematic is ReaderAdapter class (and other parts) are internal and must be accessable also. Therefore this solution is not an option.

martinjw commented 7 years ago

Solution 1, with loads of options, seems like overkill. Solution 2, TablesAndColumnsOnly, is fine for your scenario. Solution 3, exposing internals, would make a larger, more complicated API surface.

If you want to put solution 2 into a PR, I'll take it.

thomasjoscht commented 7 years ago

Ok. I've created a PR for this. New method named "TablesQuickView".

PS: Currently my Visual Studio 2017 Enterprise Version 15.3.4 is crashing opening project DatabaseSchemaReader.csproj. It seems there is a problem with multi framework targeting. I'm not alone with this problem: see here Change TargetFrameworks property to only one (no matter which one) solves the problem temporarily.