qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.58k stars 3k forks source link

Add PostGIS layers is too slow on connecting to a database #31160

Closed FERRATON closed 3 years ago

FERRATON commented 5 years ago

QGIS 3.4 I create a new ticket because I don't know how to reopen a ticket with github... see https://github.com/qgis/QGIS/issues/19639 It seems to me that QGIS always scans all tables when connecting (see attachment) which can be very slow. in my case 20 minutes... it is impossible to stop the scan in the background, even if you close the dialog box or with the stop button. (I have no geometry of the GEOMETRY type). it is a remote server...(117 schemas of each a hundred tables).

2019-08-08_09h15_12

gioman commented 5 years ago

Do you have the "use estimated table metadata" option enabled in your connection parameters?

FERRATON commented 5 years ago

Yes, I have activated the options: Do not solve the type for unrestricted geometries (GEOMETRY) and Use the estimated metadata table. I think the solution mentioned in #19639 would be a nice solution: Tables and views should only be read if the schema is selected, but only the name of the table or view. When the table or view is selected, the additional information must be read.

FERRATON commented 5 years ago

Maybe a possibility to connect directly in DBManager without scanning the tables would solve the pb.

elpaso commented 5 years ago

@FERRATON, if I understand your issue correctly, I think that we cannot skip the scan for GEOMETRY tables (@jef-n correct me if I'm wrong), because they can contain different geometry types and/or SRIDs and we need to know them in order to extract the information about the tables.

You are in principle right that it could be done only when actually necessary, but it would require a deep refactoring of the provider code and it would also bring some delay when that information is actually required, worsening the user experience. In other words, I believe it's better to have an initial delay when loading a DB than a delay in the middle of the user interaction with the browser tree (when expanding the schema, that might be open anyway if the browser stores the open/close state of the tree).

I would recommend to avoid GEOMETRY tables and use separate tables for different geometry types.

FERRATON commented 5 years ago

I had understood not to use the GEOMETRY type. What do you think about the possibility of introducing a connection directly into Dbmanager that would have a different approach and that would privilege an approach that optimizes the initial connection time. This would give the user a choice.

github-actions[bot] commented 3 years ago

The QGIS project highly values your report and would love to see it addressed. However, this issue has been left in feedback mode for the last 14 days and is being automatically marked as "stale". If you would like to continue with this issue, please provide any missing information or answer any open questions. If you could resolve the issue yourself meanwhile, please leave a note for future readers with the same problem and close the issue. In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this issue. If there is no further activity on this issue, it will be closed in a week.

github-actions[bot] commented 3 years ago

While we hate to see this happen, this Issue has been automatically closed because it has not had any activity in the last 42 days despite being marked as feedback. If this issue should be reconsidered, please follow the guidelines in the previous comment and reopen this issue. Or, if you have any further questions, there are also further support channels that can help you.

thk686 commented 1 year ago

I stumbled on this trying to figure out why trying to connect to my postgis database is suddenly taking a very long time. QGIS is the only application I use where I regularly have to use force-quit/kill-9 and so on. I guess you can draw you own conclusions about that. The real answer here is anything that takes longer than 30 seconds or so should time out and give the user an option. Why are you not using async transactions? Very frustrating.

larsop commented 7 months ago

Sometimes listing tables takes an almost an hour on this system.

The sql sent to the database is this (I did format it here to make it more easy to read I used the original query on depesz links below.

SELECT l.f_table_name,l.f_table_schema,l.f_geometry_column,upper(l.type),l.srid,l.coord_dimension,c.relkind,obj_description(c.oid),array_agg(a.attname 
ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 1  
FROM geometry_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t 
WHERE c.relname=l.f_table_name AND 
l.f_table_schema=n.nspname AND 
NOT a.attisdropped AND 
a.attrelid=c.oid AND 
a.atttypid=t.oid AND 
a.attnum>0 AND 
n.oid=c.relnamespace AND 
has_schema_privilege(n.nspname,'usage') AND 
has_table_privilege(c.oid,'select') GROUP BY 1,2,3,4,5,6,7,c.oid,11 
UNION 
SELECT l.f_table_name,l.f_table_schema,l.f_geography_column,upper(l.type),l.srid,2,c.relkind,obj_description(c.oid),array_agg(a.attname 
ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 2  
FROM geography_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t 
WHERE c.relname=l.f_table_name AND 
l.f_table_schema=n.nspname AND 
NOT a.attisdropped AND 
a.attrelid=c.oid AND 
a.atttypid=t.oid AND 
a.attnum>0 AND 
n.oid=c.relnamespace AND 
has_schema_privilege(n.nspname,'usage') 
AND has_table_privilege(c.oid,'select') 
GROUP BY 1,2,3,4,5,6,7,c.oid,11
UNION
SELECT l."r_table_name",l."r_table_schema",l."r_raster_column",'RASTER',l.srid,2,c.relkind,obj_description(c.oid),array_agg(a.attname 
ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 5
FROM raster_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t
WHERE c.relname=l."r_table_name" AND
l."r_table_schema"=n.nspname AND
NOT a.attisdropped
AND a.attrelid=c.oid
AND a.atttypid=t.oid
AND a.attnum>0
AND n.oid=c.relnamespace
AND has_schema_privilege(n.nspname,'usage')
AND has_table_privilege(c.oid,'select')
GROUP BY 1,2,3,4,5,6,7,c.oid,11
ORDER BY 2,1,3

This database have an very many table so on other smaller databases this is not a problem. Then it runs in a couple of seconds.


PostgreSQL 12.6 (Ubuntu 12.6-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="120" GEOS="3.12.1-CAPI-1.18.1" SFCGAL="1.3.7" PROJ="7.2.1" GDAL="GDAL 3.2.1, released 2020/12/29" LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER
(1 row)

I have test with up 100 MB workmem and it's still slow.

Test with 20 MB workmem trace here

Test with 50 MB workmem trace here

Test with 100 MB workmem trace here

I also did a vacuum analyze this table list (some of them can not be vacuumed )

 vacuum analyze pg_aggregate;
 vacuum analyze pg_am;
 vacuum analyze pg_amop;
 vacuum analyze pg_amproc;
 vacuum analyze pg_attrdef;
 vacuum analyze pg_attribute;
 vacuum analyze pg_auth_members;
 vacuum analyze pg_authid;
 vacuum analyze pg_available_extension_versions;
 vacuum analyze pg_available_extensions;
 vacuum analyze pg_cast;
 vacuum analyze pg_class;
 vacuum analyze pg_collation;
 vacuum analyze pg_config;
 vacuum analyze pg_constraint;
 vacuum analyze pg_conversion;
 vacuum analyze pg_cursors;
 vacuum analyze pg_database;
 vacuum analyze pg_db_role_setting;
 vacuum analyze pg_default_acl;
 vacuum analyze pg_depend;
 vacuum analyze pg_description;
 vacuum analyze pg_enum;
 vacuum analyze pg_event_trigger;
 vacuum analyze pg_extension;
 vacuum analyze pg_file_settings;
 vacuum analyze pg_foreign_data_wrapper;
 vacuum analyze pg_foreign_server;
 vacuum analyze pg_foreign_table;
 vacuum analyze pg_group;
 vacuum analyze pg_hba_file_rules;
 vacuum analyze pg_index;
 vacuum analyze pg_indexes;
 vacuum analyze pg_inherits;
 vacuum analyze pg_init_privs;
 vacuum analyze pg_language;
 vacuum analyze pg_largeobject;
 vacuum analyze pg_largeobject_metadata;
 vacuum analyze pg_locks;
 vacuum analyze pg_matviews;
 vacuum analyze pg_namespace;
 vacuum analyze pg_opclass;
 vacuum analyze pg_operator;
 vacuum analyze pg_opfamily;
 vacuum analyze pg_partitioned_table;
 vacuum analyze pg_pltemplate;
 vacuum analyze pg_policies;
 vacuum analyze pg_policy;
 vacuum analyze pg_prepared_statements;
 vacuum analyze pg_prepared_xacts;
 vacuum analyze pg_proc;
 vacuum analyze pg_publication;
 vacuum analyze pg_publication_rel;
 vacuum analyze pg_publication_tables;
 vacuum analyze pg_range;
 vacuum analyze pg_replication_origin;
 vacuum analyze pg_replication_origin_status;
 vacuum analyze pg_replication_slots;
 vacuum analyze pg_rewrite;
 vacuum analyze pg_roles;
 vacuum analyze pg_rules;
 vacuum analyze pg_seclabel;
 vacuum analyze pg_seclabels;
 vacuum analyze pg_sequence;
 vacuum analyze pg_sequences;
 vacuum analyze pg_settings;
 vacuum analyze pg_shadow;
 vacuum analyze pg_shdepend;
 vacuum analyze pg_shdescription;
 vacuum analyze pg_shseclabel;
 vacuum analyze pg_stat_activity;
 vacuum analyze pg_stat_all_indexes;
 vacuum analyze pg_stat_all_tables;
 vacuum analyze pg_stat_archiver;
 vacuum analyze pg_stat_bgwriter;
 vacuum analyze pg_stat_database;
 vacuum analyze pg_stat_database_conflicts;
 vacuum analyze pg_stat_gssapi;
 vacuum analyze pg_stat_progress_cluster;
 vacuum analyze pg_stat_progress_create_index;
 vacuum analyze pg_stat_progress_vacuum;
 vacuum analyze pg_stat_replication;
 vacuum analyze pg_stat_ssl;
 vacuum analyze pg_stat_subscription;
 vacuum analyze pg_stat_sys_indexes;
 vacuum analyze pg_stat_sys_tables;
 vacuum analyze pg_stat_user_functions;
 vacuum analyze pg_stat_user_indexes;
 vacuum analyze pg_stat_user_tables;
 vacuum analyze pg_stat_wal_receiver;
 vacuum analyze pg_stat_xact_all_tables;
 vacuum analyze pg_stat_xact_sys_tables;
 vacuum analyze pg_stat_xact_user_functions;
 vacuum analyze pg_stat_xact_user_tables;
 vacuum analyze pg_statio_all_indexes;
 vacuum analyze pg_statio_all_sequences;
 vacuum analyze pg_statio_all_tables;
 vacuum analyze pg_statio_sys_indexes;
 vacuum analyze pg_statio_sys_sequences;
 vacuum analyze pg_statio_sys_tables;
 vacuum analyze pg_statio_user_indexes;
 vacuum analyze pg_statio_user_sequences;
 vacuum analyze pg_statio_user_tables;
 vacuum analyze pg_statistic;
 vacuum analyze pg_statistic_ext;
 vacuum analyze pg_statistic_ext_data;
 vacuum analyze pg_stats;
 vacuum analyze pg_stats_ext;
 vacuum analyze pg_subscription;
 vacuum analyze pg_subscription_rel;
 vacuum analyze pg_tables;
 vacuum analyze pg_tablespace;
 vacuum analyze pg_timezone_abbrevs;
 vacuum analyze pg_timezone_names;
 vacuum analyze pg_transform;
 vacuum analyze pg_trigger;
 vacuum analyze pg_ts_config;
 vacuum analyze pg_ts_config_map;
 vacuum analyze pg_ts_dict;
 vacuum analyze pg_ts_parser;
 vacuum analyze pg_ts_template;
 vacuum analyze pg_type;
 vacuum analyze pg_user;
 vacuum analyze pg_user_mapping;
 vacuum analyze pg_user_mappings;
 vacuum analyze pg_views;

After 15 minutes I just broke this query running with 100 MB workmem.

EXPLAIN ANALYZE SELECT l.f_table_name,l.f_table_schema,l.f_geometry_column,upper(l.type),l.srid,l.coord_dimension,c.relkind,obj_description(c.oid),array_agg(a.attname ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 1  FROM geometry_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t WHERE c.relname=l.f_table_name AND l.f_table_schema=n.nspname AND NOT a.attisdropped AND a.attrelid=c.oid AND a.atttypid=t.oid AND a.attnum>0 AND n.oid=c.relnamespace AND has_schema_privilege(n.nspname,'usage') AND has_table_privilege(c.oid,'select') GROUP BY 1,2,3,4,5,6,7,c.oid,11 UNION SELECT l.f_table_name,l.f_table_schema,l.f_geography_column,upper(l.type),l.srid,2,c.relkind,obj_description(c.oid),array_agg(a.attname ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 2  FROM geography_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t WHERE c.relname=l.f_table_name AND l.f_table_schema=n.nspname AND NOT a.attisdropped AND a.attrelid=c.oid AND a.atttypid=t.oid AND a.attnum>0 AND n.oid=c.relnamespace AND has_schema_privilege(n.nspname,'usage') AND has_table_privilege(c.oid,'select') GROUP BY 1,2,3,4,5,6,7,c.oid,11 UNION SELECT l."r_table_name",l."r_table_schema",l."r_raster_column",'RASTER',l.srid,2,c.relkind,obj_description(c.oid),array_agg(a.attname ORDER BY a.attnum), count(CASE WHEN t.typname IN ('geometry','geography','raster') THEN 1 ELSE NULL END) , 5  FROM raster_columns l,pg_class c,pg_namespace n,pg_attribute a,pg_type t WHERE c.relname=l."r_table_name" AND l."r_table_schema"=n.nspname AND NOT a.attisdropped AND a.attrelid=c.oid AND a.atttypid=t.oid AND a.attnum>0 AND n.oid=c.relnamespace AND has_schema_privilege(n.nspname,'usage') AND has_table_privilege(c.oid,'select') GROUP BY 1,2,3,4,5,6,7,c.oid,11 ORDER BY 2,1,3
;

I may test vacuum full later to night to check , that may work better, I think it has before.