PostGIS Data Loads Extemely Slow - Githubissues

qgib commented 6 years ago

Author Name: Mickey Dietrich (Mickey Dietrich) Original Redmine Issue: 18295 Affected QGIS version: 3.4.5 Redmine category:data_provider/postgis

I have added data from PostGIS in Version 2.18 and the one data set loads in about 2 seconds. With the new 3.0 version, it takes 2 minutes to load that same data layer. All my data sets load really slow in Version 3.0. Not sure what the difference is from 2.18 to 3.0, but it has really caused problems with using the newer version, since we utilize the PostGIS database for accessing most all of our data. Is there something I am missing to make it load faster like in 2.18 or is there something behind the scenes causing this problem? Thanks, Mickey

qgib commented 6 years ago

Author Name: Keith Jenkins (@kgjenkins)

I was able to test this using a point layer of just 1124 points in Mickey's PostGIS database, and can confirm his observations (fast in 2.x, but very slow in 3.0.0). I've connected to other remote PostGIS databases without any slowness in 3.0.0.

Some other observations that might help diagnose the problem:

Using DB Manager to run a simple "select * from curb_stops" query directly on the PostGIS layer is very fast -- the result table appears in DB Manager almost instantly. But then when checking the box to "load as new layer", it renders very slowly.

Using DB Manager to run the same query via QGIS Virtual Layers is very slow, even to show the result table.

qgib commented 6 years ago

Author Name: Nyall Dawson (@nyalldawson)

Can you check the postgres logs and see if there's any difference in the queries sent from 2.18 to 3.0?

status_id was changed from Open to Feedback

qgib commented 6 years ago

Author Name: Mickey Dietrich (Mickey Dietrich)

Do you know where I can access the postgres logs showing queries on Amazon Web Services RDS? I only see error logs. Thanks, Mickey

*I believe I have figured it out, but have to wait for reboot before I can get the sql statements.

qgib commented 6 years ago

Author Name: Giovanni Manghi (@gioman)

Nyall Dawson wrote:

Can you check the postgres logs and see if there's any difference in the queries sent from 2.18 to 3.0?

I have not checked the logs yet, but in the meantime I confirm the issue, the difference is huge: tested with 1000 points layer we go from almost instantaneous to several tens of seconds.

operating_system was changed from Windows 7 to

qgib commented 6 years ago

Author Name: Nyall Dawson (@nyalldawson)

We need logs to do any meaningful diagnose here.

qgib commented 6 years ago

Author Name: Mickey Dietrich (Mickey Dietrich)

I have to wait for Amazon Web Services to reboot the server for where my PostGIS instance is installed. Looks like I had to change DB parameters so I can see query logs. I think the reboot is set for tomorrow: Here was link I found about the DB parameters: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.PostgreSQL.html. Hopefully this works, otherwise I don't know how else to get the query logs. Every one of our staff are having the same issue and I have tried from home with the same results. QGIS 2.18 is fast and 3.0 is very slow. We were just about to make the switch for our staff over to QGIS from Esri, but need to figure out this issue first. Will keep my fingers crossed that the DB parameter thing works. Thanks for everyone's input so far, hopefully we will get things figured out.

qgib commented 6 years ago

Author Name: Giovanni Manghi (@gioman)

I cannot see this anymore, at least on master. Reopen of necessary.

status_id was changed from Feedback to Closed
resolution was changed from to fixed/implemented

qgib commented 5 years ago

Author Name: Regis Haubourg (@haubourg)

I've seen such issues today on 3.4.5. Opening a 2.18 project is fast. Opening it in QGIS3 takes up to 7 minutes (!).

We tested with --noplugins options, no success. Then we checked at the logs. One query was seen as isle, and the target table doesn't exists:

select count(*)

from information_schema.tables

where table_name = 'qgis_editor_widget_styles' ;

Running the same query runs in 1,7 s, which is slow but in fact the database is huge (170 schemas) and has a lot of tables. (maybe it's not the right query to use index efficiently) I failed to understand what is that "qgis_editor_widget_styles" table used for, even if I found the related PR an commits. I never encountered what use case triggers its creation. If this rings some bells to someone, I'd be happy to ear about it.

We didn't see anything clear on postgresql logs, still we need to investigate more seriously the log options.

On the client side, QGIS seems to mostly ... wait. So I suspect some timeout to occur in PG for each layer. To be followed.

status_id was changed from Closed to Open

qgib commented 5 years ago

Author Name: Giovanni Manghi (@gioman)

resolution was changed from fixed/implemented to
subject was changed from PostGIS Data Loads Extemely Slow in Version 3.0 to PostGIS Data Loads Extemely Slow
version was changed from 3.0.0 to 3.4.5

strk commented 5 years ago

Regis: do you mean qgis_editor_widget_styles contains records for many tables across 170 schemas ? Looking at the code dealing with that table I see many optimization opportunities, like avoiding querying it for every single table, but also avoiding the count(*)...

BTW, the code doesn't schema-qualify those tables (both qgis_editor_widget_styles and layer_styles) which I'm not sure is a good idea (and surely does NOT use an index, which is only defined on the tuple schema,table.

strk commented 5 years ago

I've just run a test creating the qgis_editor_widget_styles and populating it with 180224 records. The count(*) query for such table takes 1.012 ms for me, and it is followed by a query for each attribute of the table I add to the canvas. In my case I'm adding a PostGIS topology edge_data table, which has 9 fields, this results in 9 queries to the qgis_editor_widget_styles table, the slower of which takes 0.193 ms. This sums up to a worst case of under 3ms in total. QGIS logs are within the 2ms range:

2019-09-17 19:43:26.613 CEST [4361] strk@strk LOG:  duration: 1.012 ms  statement: SELECT COUNT(*) FROM information_schema.tables WHERE table_name=
'qgis_editor_widget_styles'
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.193 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'edge_id' LIMIT 1
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.078 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'start_node' LIMIT 1
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.075 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'end_node' LIMIT 1
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.072 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'next_left_edge' LIMIT 1
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.071 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'abs_next_left_edge' LIMIT 1
2019-09-17 19:43:26.614 CEST [4361] strk@strk LOG:  duration: 0.068 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'next_right_edge' LIMIT 1
2019-09-17 19:43:26.615 CEST [4361] strk@strk LOG:  duration: 0.065 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'abs_next_right_edge' LIMIT 1
2019-09-17 19:43:26.615 CEST [4361] strk@strk LOG:  duration: 0.065 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'left_face' LIMIT 1
2019-09-17 19:43:26.615 CEST [4361] strk@strk LOG:  duration: 0.103 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema_name = 'linz01' and table_name = 'edge_data' and field_name = 'right_face' LIMIT 1
2019-09-17 19:43:26.615 CEST [4361] strk@strk LOG:  duration: 0.165 ms  statement: SELECT count(*) FROM pg_inherits WHERE inhparent='"linz01"."edge_data"'::regclass

Now, this is with the qgis_editor_widget_styles table created as found in a code comment (having a primary key defined on schema/table/field).

Dropping the index on the qgis_editor_widget_styles changes the time of each (per-field) query to 45ms in worst case, summing up to 450ms in total (for 10 fields). This is still under 1/2 second, so far from the reported timing.

Any chance to get the definition of that qgis_editor_widget_styles table ? As the times I see so far are not justifying an optimization.

strk commented 5 years ago

I've added more records to the qgis_editor_widget_styles table, summing up to 360448 rows, and made sure none of the loaded table fields are found (to ensure select queries has worst case). The table still loads in under 1 second:

2019-09-17 19:51:17.381 CEST [5252] strk@strk LOG:  duration: 0.906 ms  statement: SELECT COUNT(*) FROM information_schema.tables WHERE table_name=
'qgis_editor_widget_styles'
2019-09-17 19:51:17.539 CEST [5252] strk@strk LOG:  duration: 158.370 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schem
a_name = 'linz01' and table_name = 'edge_data' and field_name = 'edge_id' LIMIT 1
2019-09-17 19:51:17.589 CEST [5252] strk@strk LOG:  duration: 50.269 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'start_node' LIMIT 1
2019-09-17 19:51:17.638 CEST [5252] strk@strk LOG:  duration: 48.586 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'end_node' LIMIT 1
2019-09-17 19:51:17.687 CEST [5252] strk@strk LOG:  duration: 48.443 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'next_left_edge' LIMIT 1
2019-09-17 19:51:17.735 CEST [5252] strk@strk LOG:  duration: 48.056 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'abs_next_left_edge' LIMIT 1
2019-09-17 19:51:17.783 CEST [5252] strk@strk LOG:  duration: 48.041 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'next_right_edge' LIMIT 1
2019-09-17 19:51:17.831 CEST [5252] strk@strk LOG:  duration: 47.837 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'abs_next_right_edge' LIMIT 1
2019-09-17 19:51:17.879 CEST [5252] strk@strk LOG:  duration: 47.642 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'left_face' LIMIT 1
2019-09-17 19:51:17.927 CEST [5252] strk@strk LOG:  duration: 48.457 ms  statement: SELECT type, config FROM qgis_editor_widget_styles WHERE schema
_name = 'linz01' and table_name = 'edge_data' and field_name = 'right_face' LIMIT 1

strk commented 5 years ago

A couple of small optimizations I submitted as part of #31858 and I can think of further optimizing the per-field query to take a single query (in my case we go from 9 x 50ms to 1 x 80ms) but I'm still not sure it's worth the trouble if there's no clear sign of things being slow

strk commented 5 years ago

I've implemented the optimization reducing queries to that table as well.

strk commented 5 years ago

@kgjenkins , @gioman, @haubourg can any of you try to reproduce with the code in #31858 ?

gioman commented 5 years ago

can any of you try to reproduce with the code in #31858 ?

@strk I cannot replicate the problem, is there a reliable way to do that?

strk commented 5 years ago

@gioman I'm not sure how to reproduce, @haubourg reported he was able. I guess it takes a layer with very many attributes and a database with very many tables...

ghost commented 3 years ago

This is still apparent on the latest release, Zurich. As a comparison there are no issues accessing the same database using UMN mapserver and OpenLayers, or DBeaver. I have added extra indexes on the most usual predicates, such as the spatial type column.

In addition if there is a database connection defined in the project where the database is not available, recognising this takes a very long time, typically this can be determined in seconds not minutes.

I suspect that the Qgis handling of database may be in the same thread as the main program, if this is not the case something else is at fault.

It is a great shame this has not been resolved over the past through years.

gioman commented 3 years ago

This is still apparent on the latest release, Zurich. As a comparison there are no issues accessing the same database using UMN mapserver and OpenLayers, or DBeaver. I have added extra indexes on the most usual predicates, such as the spatial type column.

In addition if there is a database connection defined in the project where the database is not available, recognising this takes a very long time, typically this can be determined in seconds not minutes.

I suspect that the Qgis handling of database may be in the same thread as the main program, if this is not the case something else is at fault.

@aimdev there have been some patches recently that should have been improved the situation, can you test master?

t is a great shame

Are you or your organization a contributor (code, funds, etc.)?

ghost commented 2 years ago

This is still apparent on the latest release, Zurich. As a comparison there are no issues accessing the same database using UMN mapserver and OpenLayers, or DBeaver. I have added extra indexes on the most usual predicates, such as the spatial type column. In addition if there is a database connection defined in the project where the database is not available, recognising this takes a very long time, typically this can be determined in seconds not minutes. I suspect that the Qgis handling of database may be in the same thread as the main program, if this is not the case something else is at fault.

@aimdev there have been some patches recently that should have been improved the situation, can you test master?

t is a great shame

Are you or your organization a contributor (code, funds, etc.)?

No, I am a data analyst trying to find a tool to design new maps. The latest issue running on macosx with an SDD, is painfully slow, and only by using a smaller dataset can I see anything within a reasonable time, approx 3 minutes. I standby my previous statement where other products so not have this latency issue, on the same hardware and database. For example displaying using wms the world's and then drilling down to street level can be achieved with 10 seconds.

haubourg commented 2 years ago

@aimdev are you sure this is the same issue? This one hase been resolved. QGIS loads data multithreaded, but for data formats or views that do not have metadata (ie estimated extent, Primary keys, etc..), QGIS will query a lot more informations than Mapserver, because it is designed to allow data editing.
There is a "trust datasource" option in the project properties to avoid QGIS doing too much of these queries and react more like Mapserver.
Please test and open a dedicated issue if this doesn't solve your issue.

And yes blaming / shaming doesn't help here.

ghost commented 2 years ago

I will try out this option. Is there a way to obtain the required metadata from postgresql or is it a manual operation.

Blaming & Shaming, I don't think so, constructive comments yes.

qgis / QGIS

PostGIS Data Loads Extemely Slow #26186