Closed by pyrocks 5 years ago
And what if you run the command on a local copy of that database, please?
The problem is I do not have a local postgres installation - all of my DBs are on RDS.
I was brave enough to run the same command, but with loglevel=all.
It first runs a query to get the available types:
SELECT t.typname,t.oid FROM pg_catalog.pg_type t JOIN pg_catalog.pg_namespace n ON (t.typnamespace = n.oid) WHERE n.nspname != 'pg_toast'
Then it runs the following query in a loop, binding a different type every time:
SELECT typinput='array_in'::regproc, typtype
FROM pg_catalog.pg_type
LEFT JOIN (
  select ns.oid as nspoid, ns.nspname, r.r
  from pg_namespace as ns
  join (
    select s.r, (current_schemas(false))[s.r] as nspname
    from generate_series(1, array_upper(current_schemas(false), 1)) as s(r)
  ) as r using ( nspname )
) as sp ON sp.nspoid = typnamespace
WHERE typname = $1
ORDER BY sp.r, pg_type.oid DESC
LIMIT 1
FE=> Bind(stmt=null,portal=null,$1=<'regproc'>,type=VARCHAR)
...
FE=> Bind(stmt=null,portal=null,$1=<'tid'>,type=VARCHAR)
...
This looks inefficient. I am certain the info can be found in a single fetch and then parsed by code. I will take a look at how to fetch it all in one go.
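To make "a single fetch" concrete, here is a rough sketch of the idea: run the first query (the one that already lists all types) once, and answer every later name-to-oid lookup from memory. The `TypePrefetcher` class and its shape are my own illustration, not pgjdbc or SchemaCrawler code:

```java
import java.sql.*;
import java.util.*;

// Hypothetical one-shot type cache (sketch only; TypePrefetcher is not a
// pgjdbc class). Fetches every pg_type row in a single query, so later
// name -> oid lookups are in-memory instead of one SELECT per type name.
public class TypePrefetcher {
    private final Map<String, Long> oidByName = new HashMap<>();

    // Pure helper: index (typname, oid) pairs by name. Kept separate so the
    // mapping logic can be exercised without a live database.
    static Map<String, Long> index(List<Object[]> rows) {
        Map<String, Long> m = new HashMap<>();
        for (Object[] row : rows) {
            m.put((String) row[0], (Long) row[1]);
        }
        return m;
    }

    // One round trip to the server, then everything is cached locally.
    void load(Connection conn) throws SQLException {
        List<Object[]> rows = new ArrayList<>();
        String sql = "SELECT t.typname, t.oid FROM pg_catalog.pg_type t "
                   + "JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
                   + "WHERE n.nspname != 'pg_toast'";
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows.add(new Object[] { rs.getString(1), rs.getLong(2) });
            }
        }
        oidByName.putAll(index(rows));
    }

    // Returns null for an unknown type name; no query is issued.
    Long oidFor(String typeName) {
        return oidByName.get(typeName);
    }
}
```

A real fix would also have to preserve the schema-precedence ordering that the per-name query implements with its ORDER BY and LIMIT 1, which this sketch glosses over.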
For now, what you could do is grab an offline SchemaCrawler export and then parse it locally, just to make it possible for you to generate reports efficiently.
I'm not entirely sure what you're aiming for. My use case is RDS. I found a query running in a loop which, if it could be run once, would improve performance substantially in such high-latency environments. I don't see the point of running it locally; it would obviously be faster, but not necessarily more efficient.
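A back-of-envelope calculation shows why latency, rather than server work, dominates in this setup. Both inputs below are illustrative assumptions (roughly 180 rows in pg_catalog.pg_type and a 400 ms client-to-RDS round trip), not measurements from this issue:

```java
import java.util.Locale;

// Back-of-envelope cost of one query per type vs. a single batched query.
// Both inputs are illustrative assumptions, not measurements from this issue.
public class RoundTripCost {
    public static void main(String[] args) {
        int typeCount = 180;     // rough row count in pg_catalog.pg_type (assumption)
        double rttSeconds = 0.4; // assumed client-to-RDS round-trip latency

        double loopSeconds = typeCount * rttSeconds; // one round trip per type
        double batchSeconds = rttSeconds;            // one round trip total

        System.out.printf(Locale.ROOT, "loop: %.0f s, batched: %.1f s%n",
                loopSeconds, batchSeconds);
    }
}
```

With those assumed numbers the loop alone costs about 72 seconds, which is in the same ballpark as the 73 seconds the log in this issue reports for "Retrieving system column data types".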
@pyrocks - SchemaCrawler relies on the PostgreSQL JDBC driver to provide type information. You are right that the PostgreSQL JDBC driver is inefficient in how it obtains type information. You can find the queries it runs in PgDatabaseMetaData.java. It seems that the driver does some post-processing after it retrieves data from the database. You are better off submitting a patch for the driver.
@pyrocks - the inefficient loop takes place in PgDatabaseMetaData.java#L1138-L1139. That statement in turn calls TypeInfoCache.java#L204-L216, which is very inefficient.
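If a driver patch were attempted, one possible direction (purely a sketch; none of these names are pgjdbc API) would be to resolve all pending type names in a single `= ANY(?)` query instead of one statement per name, ignoring for the moment the schema-precedence ORDER BY in the real query, which a full patch would need to preserve:

```java
import java.sql.*;
import java.util.*;

// Sketch of batching many typname lookups into one statement.
// Hypothetical code, not part of pgjdbc.
public class BatchTypeLookup {
    // One query for any number of names, instead of N separate queries.
    static final String SQL =
        "SELECT typname, oid FROM pg_catalog.pg_type WHERE typname = ANY(?)";

    static Map<String, Long> lookup(Connection conn, Collection<String> names)
            throws SQLException {
        Map<String, Long> oids = new HashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // Bind all requested names as a single text[] parameter.
            ps.setArray(1, conn.createArrayOf("text", names.toArray()));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    oids.put(rs.getString(1), rs.getLong(2));
                }
            }
        }
        return oids;
    }
}
```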
@sualeh nice hearing from you again :) Thanks for the assistance - I had no idea the problem was in the JDBC driver. I will monitor the issue.
Glad to hear from you, @pyrocks! Yes, please follow the issue I opened with the PostgreSQL team: https://github.com/pgjdbc/pgjdbc/issues/1342
Wouldn't it be possible to implement a workaround in the Postgres-specific submodule of SchemaCrawler, i.e. manually querying those tables?
@conker84 - yes, that would have been the ideal way to go. I looked at the PostgreSQL JDBC driver, and it looks like PostgreSQL does not provide a clean set of data dictionary or information schema tables for this purpose.
Issue
Hi, it's me again :) Now on a new front - I'm trying to use SchemaCrawler with Postgres on RDS. For now, everything works fine, except that "Retrieving system column data types" takes more than a minute for a DB with just a single table (I'm still experimenting).
I don't know what to expect, really, but it seems like this shouldn't take so long (a few seconds at most is what I would expect). I'm not as experienced with Postgres as I am with Oracle, so it might take me a while to analyze what's going on in the DB.
Following is the command I used:
./schemacrawler.sh -server=postgresql -host=*** -port=*** -database=*** -schemas=*** -user=*** -infolevel=standard -command=schema -outputformat=pdf -outputfile=./entities.pdf -loglevel=info -noinfo
and here's the log:
...
Nov 18, 2018 9:48:24 AM schemacrawler.crawl.SchemaCrawler crawlSchemas
INFO: Retrieved 1 schemas
Nov 18, 2018 9:48:24 AM schemacrawler.crawl.SchemaCrawler crawlColumnDataTypes
INFO: Crawling column data types
Nov 18, 2018 9:48:24 AM schemacrawler.crawl.SchemaCrawler lambda$crawlColumnDataTypes$0
INFO: Retrieving system column data types
Nov 18, 2018 9:49:37 AM schemacrawler.crawl.SchemaCrawler lambda$crawlColumnDataTypes$1
INFO: Not retrieving user column data types, since this was not requested
Nov 18, 2018 9:49:37 AM schemacrawler.crawl.SchemaCrawler crawlColumnDataTypes
INFO: Total time taken for - 00:01:13.014 hours
-100.0% - 00:01:13.013 -
-0.0% - 00:00:00.001 -
...
Let me know what additional info I can provide.
Environment
SchemaCrawler 15.01.06 on Mac 10.13.6
DB is Postgres 10.4 on Amazon RDS, with the embedded driver
Thanks, Mor