Closed: @peterschaer closed this issue 11 months ago.
@peterschaer I will have a look. We changed the behavior a little regarding server start-up and other things; this might be related.
From my first check I see: you are right, we have not changed this for a long time. We initialize availability on every request because then an administrator is able to switch off a municipality on the fly at runtime. We could load the availability on server start; we already do this for several things (Glossary, Disclaimers, etc.). But this would mean the availability can't be changed without a server restart. Otherwise I do not see any reason why this might be slower. Still investigating... @michmuel @voisardf Do you experience a similar performance decrease?
Just an update from my side: I upgraded our application to pyramid_oereb 2.0.0rc1 and Python 3.8 but unfortunately there was no improvement in performance. So the issue is still bothering me. Has anyone encountered a similar performance decrease with pyramid_oereb v2 compared to v1.x?
We run Oereb V2 in a different environment (OpenShift) than Oereb V1. Therefore we do not know exactly how much of the performance difference between the two versions is related to pyramid_oereb. We are currently optimizing the OpenShift settings in terms of performance and stability. At the moment Oereb V2 is certainly not running faster than V1.
@vvmruder and others: Is it possible that using a newer version of sqlalchemy had an effect on performance?
@michmuel it's not impossible that library upgrades have an influence, but I think it's much more likely that it's due to the V2 code changes. Profiling is needed to determine that.
@peterschaer I had a deeper look at this, but still can't find the real bottleneck. Would it be possible to get some data from you, to have a real-world use case?
@michmuel & @jwkaltz I think there is an issue in pyramid_oereb. I can't find it right now. I would suspect external libraries as a last option too :)
Of course. In which form/way should I transmit it to you? You can get our application here (branch v3): https://github.com/AGIBE/oereb_server/tree/v3 or as a Docker image: docker pull ghcr.io/agibe/oereb_server:3.0.0rc1
I did some profiling on my own. I followed the instructions in the SQLAlchemy docs and profiled the availability query in https://github.com/openoereb/pyramid_oereb/blob/af0e124f1fa20d6cca5df8e505da2346f58c8df0/pyramid_oereb/contrib/data_sources/standard/sources/plr.py#L122
This query is executed 31 times (once per topic). On my development machine a single execution takes 0.063s: 0.063s × 31 = 1.953s. That means this initial step alone takes roughly two seconds, and all the extract processing and querying is still to follow.
This is the profile output of a single availability query. I don't include the others; they all look the same:
availability_profile_2_0_0rc1.txt
At the end is the summary: 25713 function calls (24306 primitive calls) in 0.063 seconds
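For reference, the kind of cProfile measurement described above can be sketched like this, with an illustrative in-memory SQLite table standing in for the real availability model (all table and column names here are made up, not the actual pyramid_oereb schema):

```python
import cProfile
import io
import pstats

from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical stand-in for the availability table.
class Availability(Base):
    __tablename__ = "availability"
    fosnr = Column(Integer, primary_key=True)
    available = Column(Boolean)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

def profile_query(session):
    """Run the availability query under cProfile and return the call count."""
    profiler = cProfile.Profile()
    profiler.enable()
    session.query(Availability).all()
    profiler.disable()
    stats = pstats.Stats(profiler, stream=io.StringIO())
    stats.sort_stats("cumulative").print_stats(10)
    return stats.total_calls

with Session(engine) as session:
    calls = profile_query(session)
    print(f"{calls} function calls for one availability query")
```

Comparing this call count across library versions is exactly the signal used in this thread: the query result is identical, but the per-query Python overhead differs.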
I did the same thing on the same machine with the old version of our application (based on pyramid_oereb 1.8.0). I profiled the availability query in: https://github.com/openoereb/pyramid_oereb/blob/6666f02c7825a172761d872b8d1017d54087b40a/pyramid_oereb/standard/sources/plr.py#L74
This query is also executed 31 times (once per topic). But here it takes only 0.005s: 0.005s × 31 = 0.155s, which is acceptable.
This is the profile output of one single availability query.
availability_profile_1_8_0.txt
The summary: 851 function calls (826 primitive calls) in 0.005 seconds
=> much quicker, far fewer function calls
I still have no idea why this is like it is. But maybe somebody has an idea.
Great @peterschaer, thank you :) This would mean there is indeed a change in SQLAlchemy which decreases performance (as @michmuel suggested).
We could work around this, as already discussed, by querying the availabilities only on server start-up.
But now I'm concerned how this SQLAlchemy behavior affects pyramid_oereb in general.
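The start-up workaround discussed above could look roughly like this: a memoized loader that hits the database only once. This is my own hypothetical sketch (function names invented, not pyramid_oereb code), and it illustrates the trade-off mentioned earlier: availability changes would then require a server restart.

```python
# Module-level cache, filled on the first request after server start.
_AVAILABILITY_CACHE = None

def get_availabilities(loader):
    """Return the availability mapping, calling `loader` only once."""
    global _AVAILABILITY_CACHE
    if _AVAILABILITY_CACHE is None:
        _AVAILABILITY_CACHE = loader()
    return _AVAILABILITY_CACHE

# Demo with a fake database query that records how often it is called.
calls = []

def fake_db_query():
    calls.append(1)
    return {2701: True}  # invented fosnr -> available mapping

get_availabilities(fake_db_query)
get_availabilities(fake_db_query)
print(f"database queried {len(calls)} time(s)")  # hit only once
```

With 31 topics per extract, removing these per-request queries alone would save the roughly two seconds measured above.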
One possible lead may be the caching mechanism in SQLAlchemy (introduced in v1.4). This cache does not cache the query results, but only the compiled SQLAlchemy structures belonging to a query. Whether a statement was cached or not can be analysed in the logs (and in the debugtoolbar) when activating SQLAlchemy logging in
https://github.com/openoereb/pyramid_oereb/blob/af0e124f1fa20d6cca5df8e505da2346f58c8df0/pyramid_oereb/core/adapter.py#L32
by setting echo=True
For an extract in our application, 130 SQL statements are issued to the database. There are:

[cached since ...] => that's OK, those statements are using the cache. 13 of them are SELECT 1 statements from the health_check.

[no key ...] => these are mostly statements using a PostGIS function (especially ST_GeomFromWKB). They probably don't implement the caching mechanism.

[generated in ...] => these statements generate an entry in the cache, but that entry is never used. On each new request these statements are re-entered into the cache; they never read from it. The SQL statements for the availability fall into this category. That seems weird to me, as these statements are far from being complex.

This is the full SQLAlchemy log for a single extract: sqlalchemy_cache_log.pdf Maybe this can help.
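The three marker categories can be tallied from such an echo log with a few lines of Python. This is a small helper of my own; the log lines below are invented examples, not taken from the real log:

```python
from collections import Counter

# The three cache markers SQLAlchemy 1.4+ appends to logged statements.
MARKERS = ("[cached since", "[no key", "[generated in")

def count_cache_markers(log_lines):
    """Count how many logged statements fall into each cache category."""
    counts = Counter()
    for line in log_lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts

# Invented example lines mimicking the shape of an echo log.
log = [
    "SELECT 1 [cached since 12.3s ago]",
    "SELECT ST_GeomFromWKB(?) [no key]",
    "SELECT fosnr, available FROM availability [generated in 0.0005s]",
]
print(count_cache_markers(log))
```

Running this over the full per-request log is what yields the 49/13/68-style breakdowns reported later in this thread.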
I also did a profiling of the SELECT 1 statements in the health check in
https://github.com/openoereb/pyramid_oereb/blob/af0e124f1fa20d6cca5df8e505da2346f58c8df0/pyramid_oereb/core/sources/__init__.py#L83

The first execution of this statement has a summary of 2827 function calls (2751 primitive calls) in 0.064 seconds. This execution was surely uncached, as it was the first one after server startup. All subsequent executions of SELECT 1 have a summary of 246 function calls in 0.006 seconds; those executions will have used the cache. Using the cache is certainly beneficial, or put the other way round: not using the cache is bad for performance.
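The cached/uncached gap can be reproduced in isolation with a small sketch, assuming an in-memory SQLite engine in place of the real database; the compiled_cache=None execution option disables SQLAlchemy's statement cache for the comparison:

```python
import timeit

from sqlalchemy import create_engine, literal, select

engine = create_engine("sqlite://")
stmt = select(literal(1))  # stand-in for the health check's SELECT 1

# Cached path: warm the compiled-statement cache, then time repeated runs.
with engine.connect() as conn:
    conn.execute(stmt)
    t_cached = timeit.timeit(lambda: conn.execute(stmt), number=500)

# Uncached path: compiled_cache=None forces recompilation on every execution.
with engine.connect().execution_options(compiled_cache=None) as conn:
    t_uncached = timeit.timeit(lambda: conn.execute(stmt), number=500)

print(f"cached: {t_cached:.4f}s  uncached: {t_uncached:.4f}s for 500 runs")
```

The absolute numbers depend on the machine, but the uncached path pays the full statement-compilation cost on every execution, which matches the 0.064s-vs-0.006s observation above.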
@peterschaer could you maybe share some numbers here again, so we have a comparison of the improvement gained from the changes in #1540?
@peterschaer additional question / idea: you mentioned your production system is running pyramid_oereb 1.8.0. Could you try to compare that with a newer pyramid_oereb 1.x version? The upgrade from SQLAlchemy 1.3 to 1.4 happened just after pyramid_oereb 1.8.0. If you already experience the performance deterioration in the V1 system, then we can be pretty sure that SQLAlchemy (or the way pyramid_oereb uses it) is the culprit, and not the code changes in V2.
@vvmruder I will share some performance numbers tomorrow (today I had to take care of other issues). And after that I will try the idea raised by @jwkaltz and measure the performance with pyramid_oereb 1.9.2.
@jwkaltz Would it be possible to make a new release (2.0.1 or so) after merging #1544? That way it would be much easier for me to make a new release on our infrastructure and to reliably measure the performance gains from #1544 and #1540
@peterschaer a release 2.0.1 is now available: https://pypi.org/project/pyramid-oereb/2.0.1/ There is a problem with the documentation generation, so that part is not up to date (we have a dedicated ticket for it). But you can see the changes if you look directly at the file CHANGES.rst
In my original post I mentioned that creating a JSON extract for my reference parcel (EGRID CH173510804618) takes 3.5-4 seconds with pyramid_oereb 2.0.0b14 and 0.7-0.8 seconds with pyramid_oereb 1.8.0. After all the improvements (especially #1544 and #1540) the processing time is now down to about 1.8s with pyramid_oereb 2.0.1! This is certainly a big improvement.
There is still a certain performance gap between 1.8.0 and 2.0.1, which leaves the question of whether this issue should be closed or kept open. Maybe we can discuss this next week at our user group meeting?
@peterschaer could you try on your existing system with 1.8.1 or later, which includes the SQLAlchemy update? (See comment from April 6th for more details.) If you already experience the performance degradation with 1.8.1, then we know the cause is the SQLAlchemy update.
Yes, I was able to do this. I had to use version 1.9.2, because the update to SQLAlchemy 1.4 was only made there; before that, 1.3 was always used. With this constellation I do not experience any performance degradation!! The processing time is the same as with 1.8.0.
In an earlier post I examined the SQLAlchemy Cache mechanism by looking at the SQLAlchemy Logs (https://github.com/openoereb/pyramid_oereb/issues/1508#issuecomment-1077588821). For 2.0.0 I got:
49 cached statements ([cached since ...])
13 uncached statements ([no key ...])
68 generated statements ([generated in ...])
I did the same for 1.9.2 and got the following results:
155 cached statements ([cached since ...])
13 uncached statements ([no key ...])
0 (!) generated statements ([generated in ...])
I redid the whole thing for 2.0.1 and got:
sqlalchemy_cache_log_192.pdf sqlalchemy_cache_log_201.pdf
So let's ignore the 13 ST_GeomFromWKB statements, as they show the same behaviour in all versions. In 1.9.2 all other statements use the cache. In 2.0.0 there are 68 statements that do not use the cache; in 2.0.1 this number has decreased to 38 (thanks to #1540 and #1544). But there are still 38 uncached statements. My current guess is that if we can get these statements to use the cache, the performance would be about the same as in 1.9.2. But that's just a guess.
Hello,
I think there is a performance problem due to the default lazy loading of some objects. Too many SQL queries are made, where one single big query with joins would be a lot more efficient.
I've prepared a pull request for this : https://github.com/openoereb/pyramid_oereb/pull/1574
Could you please review it, and merge it if everything looks good for you? Thanks.
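The idea behind such a change can be sketched as follows. The model names here are illustrative, not the actual pyramid_oereb schema: with the default lazy loading, accessing each parent's collection triggers its own SELECT (the classic 1+N pattern), while joinedload fetches everything in a single JOIN.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()

# Invented stand-ins for a restriction and its geometries.
class Restriction(Base):
    __tablename__ = "restriction"
    id = Column(Integer, primary_key=True)
    geometries = relationship("Geometry")  # lazy="select" by default -> 1+N

class Geometry(Base):
    __tablename__ = "geometry"
    id = Column(Integer, primary_key=True)
    restriction_id = Column(Integer, ForeignKey("restriction.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Restriction(id=1, geometries=[Geometry(id=1), Geometry(id=2)]))
    session.commit()

    # One JOIN instead of one SELECT per parent plus one per collection:
    restrictions = (
        session.query(Restriction)
        .options(joinedload(Restriction.geometries))
        .all()
    )
    print(len(restrictions[0].geometries))
```

With hundreds of restriction rows per extract, collapsing the per-row lazy loads into one joined query removes exactly the per-query overhead this thread has been measuring.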
@svamaa please ask in the poll
We still have performance issues with the new implementation (2.0). Extracting (xml/json) was faster in the previous version 1.8.x. (Tessin)
From our point of view, the performance for smaller plots is somewhat slower than in v1.8, while larger plots are significantly faster than before. Therefore the performance is acceptable for us. However, if someone still has an idea to improve the performance, we are happy to help (testing, measuring, etc.).
Performance is ok for us (BL). However, good to hear if someone has an idea to increase the performance with little effort.
The performance of a GetEGRID-request may be improved (see #1680)
@svamaa Get latest situation update from user group and create a new ticket if needed
In our tests we are observing a noticeable performance difference between our current production environment (based on pyramid_oereb 1.8.0) and the new version based on pyramid_oereb 2.0.0b14. One may observe the difference here:
pyramid_oereb 1.8: https://www.oereb-test.apps.be.ch/extract/reduced/json/CH173510804618 pyramid_oereb 2.0: https://www.oereb2-test.apps.be.ch/extract/json?egrid=CH173510804618 (the DNS entry is not public yet, you can use https://159.144.217.124/extract/json?egrid=CH173510804618 instead but you have to ignore the security warnings about the certificate)
Both instances are running in the very same kubernetes cluster in our data center. They are connecting to the same PostgreSQL 12 database (but different schemas because of the changes made to the transferstruktur data model). The data is the same (the old transferstruktur was transformed into the new one). With pyramid_oereb 1.8 the JSON extract is created in under one second (700-800ms), with pyramid_oereb 2.0 the JSON extract is created in 3.5-4 seconds!
I did some researching and profiling (with debug-toolbar and print statements). I have the impression that the difference has to do with the database queries. E.g. just creating the processor https://github.com/openoereb/pyramid_oereb/blob/e291d8a5adccf6e7aa61b58d60f66f4d1e655d5c/pyramid_oereb/core/processor.py#L321 takes about 2.5 seconds. The slow part is querying the availabilities https://github.com/openoereb/pyramid_oereb/blob/e291d8a5adccf6e7aa61b58d60f66f4d1e655d5c/pyramid_oereb/contrib/data_sources/standard/sources/plr.py#L121 But this is a very simple query returning 338 rows in our case; I don't see anything to optimize here. The query takes about 0.07s, but it is executed 31 times = 2.1s. This part of the code hasn't changed between pyramid_oereb 1.8 and 2.0: the old version also queries the availabilities of all 31 topics, but it does so much quicker. I suppose there's some overhead on each query, and with more than 200 queries per extract this adds up. But I can't narrow it down. Maybe there is a change in one of the used libraries, or it has to do with the Python version (pyramid_oereb 1.8 is running on Python 3.6, pyramid_oereb 2.0 on Python 3.7).
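For reference, the print-based timing mentioned above can be done with a small helper like this. The helper is my own sketch, not pyramid_oereb code, and the wrapped call in the comment is only an example of how it would be used:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.4f}s")
    return result

# Hypothetical usage around a query call:
#   availabilities = timed("availability query", session.query(model).all)

# Demo with a cheap stand-in computation:
total = timed("demo sum", sum, range(1000))
print(total)
```

Sprinkling such wrappers around the individual queries is a quick way to see which of the 200+ statements per extract dominate, before reaching for a full profiler.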
I think I can exclude the network as a source of error: when running both the application and the database as Docker containers in my development environment I get the same results, and in that setup there is no network between application and database.
I'm using the standard model for all 31 topics we have.
I cannot explain this difference therefore I'm posting this here. Has anybody during their tests observed something similar?