schemaspy / schemaspy

Database documentation built easy
http://schemaspy.org
GNU Lesser General Public License v3.0
3.16k stars 310 forks source link

SnowflakeDB support - catalog and schema only matching on schema #483

Open brianrepko opened 5 years ago

brianrepko commented 5 years ago

I'm working on SnowflakeDB support for SchemaSpy.

SnowflakeDB supports catalogs and schemas. While you can set these on the JDBC connection properties that does not limit the JDBC DatabaseMetaData until you've set the session with a USE command. My situation is that I have two databases - A and B - both with the same schema - PUBLIC. When I run SchemaSpy with -cat A and -s PUBLIC, then it searches for both A.PUBLIC and B.PUBLIC. I believe that there are some places in the code that are not using catalog.

I'm hoping to avoid creating all the SQL statements in the dbtype properties file but I know that that is my work-around.

npetzall commented 5 years ago

287 it seems possible to set db directly when connecting.

So for a snowflake db a similar setup as mysql should be possible.

I can go over the code, but just make sure your not extanding something that has custom queries.

Are you using 6.0.0 or 6.1.0-SNAPSHOT?

brianrepko commented 5 years ago

I'm using 6.0.0 - and just posted our setup in #482

brianrepko commented 5 years ago

was thinking that I might try extends mysql just to see

npetzall commented 5 years ago

Give the snapshot a go just in case. Some changes has been done since 6.0.0

I hope to have a look at the code in an hour or two.

npetzall commented 5 years ago

To get a more targeted scope could you give an example of things that show up that belongs to the other catalog?

npetzall commented 5 years ago

https://github.com/schemaspy/schemaspy/blob/master/src/main/java/org/schemaspy/service/DatabaseService.java#L492

Found this one place, gonna see how many tests fail when I change it.

npetzall commented 5 years ago

If it totally fail you can override two queries to avoid that block of code:

selectTablesSql and selectViewsSql https://schemaspy.readthedocs.io/en/latest/configuration/databaseType.html#sql-query-instead-of-databasemetadata

selectTablesSql= Fetch tables, expected columns: table_name, table_catalog, table_schema, table_comment, table_rows selectViewsSql= Fetch views, expected columns: view_name, view_catalog, view_schema, view_comment, view_definition

npetzall commented 5 years ago

I changed, ran tests and only got 1 set of unit tests that failed. I checked the test and I think the issue is handled by startup code that is skipped in the test.

There is a small issue, some jdbc doesn't allow '%' wildcard and SchemaSpy 6.x doesn't allow null as catalog.

From DatabaseMetaData

@param catalog a catalog name; must match the catalog name as it

  • is stored in the database; "" retrieves those without a catalog;
  • null means that the catalog name should not be used to narrow
  • the search

This could be changed in 7.x so for now I think we should go for just overriding code with the SQL queries, will only be two.

selectTablesSql=select table_name, table_catalog, table_schema, comment as table_comment, ROW_COUNT as table_rows from information_schema.tables where table_catalog = :catalog and table_schema = :schema
selectViewsSql=select TABLE_NAME as view_name, TABLE_CATALOG as view_catalog, TABLE_SCHEMA as view_schema, COMMENT as view_comment, VIEW_DEFINITION as view_definition from information_schema.views WHERE table_schema != 'INFORMATION_SCHEMA' and table_catalog = :catalog and table_schema = :schema

Might correct some case-ing since I haven't checked looked it up or have the ability to test.

brianrepko commented 5 years ago

An example of where this fails is when something exists in B.PUBLIC but not in A.PUBLIC.

DEBUG - Failed to fetch number of rows for 'B.PUBLIC.B_ONLY_DATA' using built-in query with 'count(*)'
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Object 'A.PUBLIC.B_ONLY_DATA' does not exist.
brianrepko commented 5 years ago

And yes - DatabaseService was the same place that I was looking at as a potential bug.

npetzall commented 5 years ago

Seems to be that the databasemeta data found a table in the other database, so when trying to populate that table it failes. So the overrides should prevent the table in the other database to be found

brianrepko commented 5 years ago

testing that next... and yes that is exactly what is happening - without the USE commands. If I could add the USE DATABASE and USE SCHEMA commands then the DatabaseMetaData returns the correct limited data

brianrepko commented 5 years ago

The following snowflake.properties file now works for me.

#
# see http://schemaspy.sourceforge.net/dbtypes.html
# for configuration / customization details
#

description=Snowflake
connectionSpec=jdbc:snowflake://<account>.snowflakecomputing.com/?role=<role>&warehouse=<warehouse>&db=<database>&schema=<schema>&passcodeInPassword=on
driver=net.snowflake.client.jdbc.SnowflakeDriver

# descriptions for above info
account=account or account and region
role=role
warehouse=warehouse
database=database
schema=schema

dbThreads=1

schemaSpec=(?!^INFORMATION_SCHEMA$).*

selectTablesSql=select table_name as "table_name", table_catalog as "table_catalog", table_schema as "table_schema", comment as "table_comment", row_count as "table_rows" from information_schema.tables where table_catalog = :catalog and table_schema = :schema

selectViewsSql=select table_name as "view_name", table_catalog as "view_catalog", table_schema as "view_schema", comment as "view_comment", view_definition as "view_definition" from information_schema.views where table_catalog = :catalog and table_schema = :schema
brianrepko commented 5 years ago

for doing multi-schema, one can remove the schema portion of the connection properties - really only need the database/catalog

npetzall commented 5 years ago

The multi-schema has always been an issue, I have some ideas on how to fix it in 7.x but we're not at 7.x yet.

Minor changes that would avoid duplicated command-line args.

remove database and replace with db db=<db>

I think you can replace schema with s schema=<s>

Simple test if that works and you can create a PR with the snowflake and get some glory.

brianrepko commented 5 years ago

I can replace the db=<db> but the schema=<s> gets messed up in the multi-schema setting. In the end, it is better to treat those as connection properties (they technically are all optional) and the SchemaSpy options <db> / <s> are for execution.

I'll submit a PR for snowflake support soon