schemacrawler / SchemaCrawler

Free database schema discovery and comprehension tool
http://www.schemacrawler.com/
Other
1.6k stars 199 forks source link

Slowness for retrieving DB2 columns for a single table, and question on MetadataRetrievalStrategy.metadata_all #165

Closed shantstepanian closed 6 years ago

shantstepanian commented 6 years ago

Environment:

SchemaCrawler version 14.16.04-java7 on Java 7 Running on Linux against DB2 10.5 and JDBC driver version 3.4.65 - 9.1

Please explain the issue briefly

Crawl for DB2 table metadata for a single table takes a long time as we are querying for all columns no matter what.

It is going via the TableColumnRetriever.retrieveColumnsFromMetadataForAllTables as we specify metadata_all for the default DB2 value (see here, this looks to be the only implementation using metadata_all).

My query is using the table inclusion rule for tables, so using metadata should be fine. In my local testing, I can just set the value to MetadataRetrievalStrategy.metadata, and it works fine.

Any reason why metadata_all was specified for DB2 in this case? I can work around this locally and submit a fix later, but I'd like to ensure that I didn't miss anything obvious

sualeh commented 6 years ago

I tested against a large DB2 schema, and found that MetadataRetrievalStrategy.metadata_all was faster. However, I can see that MetadataRetrievalStrategy.metadata would be faster for subsets of tables. I was under the impression that this can be changed using the config file. I will have to see if it is possible to override using the config file or code. Let me write some test code for that.

shantstepanian commented 6 years ago

It would probably be best to make this overrideable via code, as the right value to use depends on whether you are searching for all tables or just a subset. We ran into this issue for Oracle as well (using the data_dictionary MetadataRetrievalStrategy was slower for a small subset of tables on a large database).

As of now, we are fine as we are overriding the value appropriately when doing a query for a single table vs all tables, though having it done via the API would be helpful.

sualeh commented 6 years ago

I have added a unit test to confirm that these values can be overridden in the config file, and also updated the default config files with examples.