Closed siepkes closed 10 months ago
We (@GeertZondervan and I) did some additional troubleshooting and thought it might be helpful to share our findings here.
Overall, I'm guessing from the use of LockOptions.SKIP_LOCKED
that the intention is to create a simple queue implementation in PostgreSQL (something similar to what is described here)?
The first thing I noticed is that the queue implementation will always retrieve all jobs. When implementing a simple queue with PostgreSQL's SKIP LOCKED,
you usually have a worker retrieve (and lock) a single job (i.e. row). Using SKIP LOCKED
in your SQL query ensures that a worker always gets the next available job that isn't being worked on. However, the current implementation retrieves all jobs, puts them in a local in-memory queue, and then starts working on that.
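For reference, such a single-job claim query typically looks like this (a sketch with made-up table and column names, not Tolgee's actual schema):

```java
/**
 * Sketch of the per-worker claim query discussed above. Each worker
 * selects and row-locks exactly one pending job; SKIP LOCKED makes
 * concurrent workers skip rows another transaction already holds, so
 * no two workers ever pick the same job. Table and column names here
 * are illustrative only.
 */
final class JobQueueSql {

    static final String CLAIM_NEXT_JOB =
        "SELECT id FROM job_queue "
            + "WHERE status = 'PENDING' "
            + "ORDER BY created_at "
            + "LIMIT 1 "
            + "FOR UPDATE SKIP LOCKED";

    private JobQueueSql() {
    }
}
```

The worker would run this inside a transaction, process the job, and update its status before committing, so the row lock is held for exactly the duration of the work.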
The main reason a high volume of jobs crashed the system (by running out of heap space) was that Hibernate was retrieving far too many fields (i.e. not just a job ID and job status) from the database and then tried to stuff everything into a result set. This is the point where the application runs out of heap space.
We thought we could easily fix this by having Hibernate fetch only the things it needed. While this did fix the heap space issue, it brought another issue to light: the application now locks up after boot. I think this is because the queue execution is manually triggered here on application startup:
This blocks the Spring Boot thread which fires application events (I don't think this thread is suitable for such long-running work). This also has another side effect: the job scheduler doesn't know about it, meaning that in 60 seconds the @Scheduled(fixedRate = 60000)
of BatchJobChunkExecutionQueue.kt
will kick in even if the manual invocation is still running. Using something like @Scheduled(fixedRate = 60000, initialDelay = 0)
might be more appropriate.
Hey! Thanks a lot for the investigation!
So if I understand it correctly, the issue is in the populateQueue() method, right? So what about if we didn't return an entity, but some kind of view containing only the fields we need? That might save some memory...
@Scheduled(fixedRate = 60000, initialDelay = 0)
Yep!
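For what it's worth, such a view could be a small projection DTO filled via a JPQL constructor expression, so Hibernate selects only the columns the queue needs instead of hydrating full entities (a sketch; the class and query below use made-up names, not Tolgee's actual code):

```java
/**
 * Illustrative projection DTO: only the fields the queue needs, so
 * large columns (e.g. the jsonb params/target) are never fetched.
 */
record QueueItemView(long executionId, long batchJobId) {

    // In a repository this would be filled with a constructor
    // expression (hypothetical JPQL, entity and package names made up):
    //
    //   select new com.example.QueueItemView(e.id, e.batchJob.id)
    //   from BatchJobChunkExecution e
    //   where e.status = :status
}
```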
So if I understand it correctly, the issue is in the populateQueue() method, right?
Yes.
So what about if we didn't return an entity, but some kind of view containing only the fields we need? That might save some memory...
Yes, that would probably prevent Tolgee from running out of heap space. The issue is that Hibernate currently joins in the table with the JSONB parameter field for every entry in the tolgee_batch_job_chunk_execution
table (which in our case had 5500+ entries). This leads to a large number of PostgreSQL JDBC tuple instances of about 1 MB apiece in our JVM heap.
Combined with not blocking the Spring Boot event thread by using @Scheduled(fixedRate = 60000, initialDelay = 0)
and not calling batchJobChunkExecutionQueue.populateQueue()
I think Tolgee should run normally even with a large queue.
Wouldn't it be better to revise the task population so the Tolgee server would both avoid retrieving unnecessary data (easily doable with a new map
in the Hibernate query, or a special view) and never fetch all the tasks, only part of them?
The server would for instance pick N jobs that are pending, mark them as "taken" (so other instances in a distributed context would not pick them too), and run these N jobs. Whenever the job queue reaches M jobs with M < N, the server would try to fetch more jobs to execute and start running them (M can be non-zero so the server can start fetching jobs before it runs out and ensure there's never a job exhaustion, so it keeps crunching through jobs).
Making sure the "taken" status automatically "expires" after a certain time (e.g. when a job could not be executed because the node crashed or something), so it can be picked back up later on again. Unless there's something I'm missing, this should let the jobs be nicely distributed without requiring explicit node communication and/or awareness of each other's existence (except when the number of jobs is < N, which should be rare and can be worked around by having servers that queue the jobs immediately mark them as taken if their own job queue is not full, relying on the load distribution to act as a distribution layer). This would also make the memory footprint of the job queue mostly nonexistent for a virtually infinitely large queue.
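A rough sketch of that refill strategy (hypothetical and in-memory; the fetcher stands in for a `SELECT ... FOR UPDATE SKIP LOCKED LIMIT ?` query that also marks the returned rows as taken):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of the refill strategy described above: keep at
 * most N jobs locally and ask the database for more once the local
 * queue drains to M or fewer (M < N), so the node never holds the
 * whole queue in memory and never fully runs dry between fetches.
 */
class PrefetchingQueue<T> {
    private final int highWater; // N: batch size fetched from the DB
    private final int lowWater;  // M: refill threshold, M < N
    private final IntFunction<List<T>> fetcher;
    private final Deque<T> local = new ArrayDeque<>();

    PrefetchingQueue(int highWater, int lowWater, IntFunction<List<T>> fetcher) {
        this.highWater = highWater;
        this.lowWater = lowWater;
        this.fetcher = fetcher;
    }

    /** Returns the next job, topping the local queue back up to N when it runs low. */
    T poll() {
        if (local.size() <= lowWater) {
            local.addAll(fetcher.apply(highWater - local.size()));
        }
        return local.poll();
    }
}
```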
The server would for instance pick N jobs that are pending, mark them as "taken" (so other instances in a distributed context would not pick them too), and run these N jobs. Whenever the job queue reaches M jobs with M < N, the server would try to fetch more jobs to execute and start running them (M can be non-zero so the server can start fetching jobs before it runs out and ensure there's never a job exhaustion, so it keeps crunching through jobs).
Sounds like a solid idea. It's what a lot of (Postgre)SQL job queue implementations do and what the SKIP LOCKED
SQL instruction is mostly used for.
This article, How to implement a database job queue using SKIP LOCKED, offers an example of how to implement such a thing with Java and Hibernate using SKIP LOCKED.
When implementing the feature, I was inspired by the skip locked approach, but I also combined it with fast state cache in redis and pub-sub in redis. So it's not very probable instances would try to work on the same chunks, since when something is removed from the local queue, it's propagated to other instances via pub/sub.
That said, I think fetching only the required data and removing the manual trigger on app start should be enough. Right?
That said, I think fetching only the required data and removing the manual trigger on app start should be enough. Right?
Yes, I think so!
Are there any volunteers who would like to fix it? 🙂
We already have a prototype fix so @GeertZondervan will create a PR!
@JanCizmar We created a prototype which prevents the loading of the entire object and ensures the job is only started from the scheduler (and not manually). While this fixes the initial problems it uncovers more problems.
For example, while the initial job retrieval (from the DB to the local cache) is now fast (it takes milliseconds to complete), when a second retrieval is started it takes about 6 minutes to complete. This is caused by Hibernate somehow, not by the DB. At this point the entire application freezes (the app works normally before this second retrieval). So it seems something else is also kicking off this code, and then suddenly it's long-running work blocking some main thread (some Kotlin coroutine executor thread perhaps?).
but I also combined it with fast state cache in redis and pub-sub in redis.
Could you explain what the benefit of the local cache is? It complicates the queue implementation a lot. The way I see it, no query is actually prevented, because when running multiple Tolgee nodes each node will still need to check that no other node is executing the task, meaning it needs to run a query at the start of the task anyway. So if that's the case, why not simply use SKIP LOCKED?
What makes the issue(s) hard to troubleshoot for us is that there is almost no documentation (neither on its conceptual design, nor on the implementation of that design). Without documentation or hands-on knowledge of the queue implementation, this rabbit hole seems a bit too deep for us.
Hey!
Thanks for the feedback. I'll try to provide some docs. To fix this fast, can you provide the prototype with some example data, so I can reproduce it quickly and possibly fix it?
The benefit of the implementation is to save database resources. With pure SKIP LOCKED it would either be unbalanced (a single node would handle a lot of chunks while other nodes would be sleeping) or it would execute too many queries against the database. Syncing the queues using Redis pub/sub and using Redis as the storage (in a multiple-node setup) seems like the optimal compromise. At the same time, I understand it's hard to read and complicated.
Hello @JanCizmar
The fix I've been working on can be found here
We have created a project with multiple languages and the following language settings:
Note the "Auto Translate Imported Items" setting, which causes the creation of the jobs.
Now when we import one of our XLIFF files with about 5000 trans-units, it will create about 5000 rows in the tolgee_batch_job_chunk_execution
table and a row in the tolgee_batch_job
table with an enormous value in the target
(jsonb) column.
I suspect this enormous target column is the reason for the original heap space issue, which is solved by the changes I made in my branch. Unfortunately we ran into the problems described by @siepkes in his previous post.
OK! Thanks a lot! I'll try to create a sample large XLIFF and will provide a fix soon!
Created this PR including your changes. I've generated a large XLIFF and now it works correctly for me; can you please check? I'll try to go a bit further and optimize the import & batch operations a bit more.
Hey! I've started to optimize the import process even more, which apparently led to a lot of issues. However, at this commit it's stable. So if you would like to test it with your data, use this one:
https://github.com/tolgee/tolgee-platform/pull/2025/commits/bb5a04773cba1da4bddfaea0fa45ea505ac2e5b2
So I enjoyed a pretty nice weekend Hibernate adventure. I went from trying to optimize Hibernate operations by keeping the persistence context small, to moving to StatelessSession and ignoring the context entirely, but what I found fastest in the end was executing the queries via the SQL connection's prepareStatement
and executeBatch
directly.
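As an illustration of that pattern, the buffering logic around addBatch()/executeBatch() can be sketched like this (a hypothetical, simplified sketch; the flush callback stands in for the actual JDBC executeBatch() round trip to PostgreSQL):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of batched writes: rows are buffered and flushed
 * in fixed-size groups, the way PreparedStatement.addBatch() followed
 * by executeBatch() would send them to the database in one round trip
 * instead of one statement per row.
 */
class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    /** Buffers a row, flushing automatically once the batch is full. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) flushNow();
    }

    /** Flushes any remaining buffered rows; call once after the last add(). */
    void flushNow() {
        if (!buffer.isEmpty()) {
            flush.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```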
I will continue by providing some reliable loading feedback in the UI and optimizing the UI so it's not so stressful for users to import data.
Hey! I've started to optimize the import process even more, which apparently led to a lot of issues. However, at this commit it's stable. So if you would like to test it with your data, use this one:
Hello!
I tried testing your PR, using 9a1017d as the latest commit.
Unfortunately this does not seem to fix the issue for us. Just like with my original PR, the heap space problem is fixed, but then it gets stuck on Hibernate doing a lot of work. The actual query in the populateQueue()
method returns fast, but then Hibernate does some loading of entities/values which is very slow and blocks the entire application.
2023-12-18T14:51:38.295+01:00 DEBUG 774581 --- [MessageBroker-2] i.t.batch.BatchJobChunkExecutionQueue : Running scheduled populate queue
2023-12-18T14:51:39.580+01:00 DEBUG 774581 --- [MessageBroker-2] tor$SharedEntityManagerInvocationHandler : Creating new EntityManager for shared EntityManager invocation
2023-12-18T14:51:49.853+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.sql.exec : Skipping reading Query result cache data: cache-enabled = false, cache-mode = NORMAL
2023-12-18T14:51:51.135+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Initializer list
2023-12-18T14:51:52.404+01:00 DEBUG 774581 --- [alina-utility-1] o.apache.catalina.session.ManagerBase : Start expire sessions StandardManager at 1702907511135 sessioncount 0
2023-12-18T14:51:53.620+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.SQL : select bjce1_0.id,bjce1_0.batch_job_id,bjce1_0.execute_after,bj1_0.job_character from tolgee_batch_job_chunk_execution bjce1_0 join tolgee_batch_job bj1_0 on bj1_0.id=bjce1_0.batch_job_id where bjce1_0.status=? order by bjce1_0.created_at asc nulls first,bjce1_0.execute_after asc nulls first,bjce1_0.id asc nulls first
2023-12-18T14:51:53.620+01:00 DEBUG 774581 --- [alina-utility-1] o.apache.catalina.session.ManagerBase : End expire sessions StandardManager processingTime 1270 expired sessions: 0
2023-12-18T14:51:55.633+01:00 DEBUG 774581 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=10, active=3, idle=7, waiting=0)
2023-12-18T14:51:55.634+01:00 DEBUG 774581 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Fill pool skipped, pool has sufficient level or currently being filled (queueDepth=0).
2023-12-18T14:52:01.507+01:00 DEBUG 774581 --- [nio-80-Acceptor] o.apache.tomcat.util.threads.LimitLatch : Counting up[http-nio-80-Acceptor] latch=1
2023-12-18T14:52:02.778+01:00 DEBUG 774581 --- [nio-80-Acceptor] o.apache.tomcat.util.threads.LimitLatch : Counting up[http-nio-80-Acceptor] latch=2
2023-12-18T14:52:07.225+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results.loading : Calling top-level assembler (0 / 1) : org.hibernate.sql.results.graph.instantiation.internal.DynamicInstantiationAssemblerConstructorImpl@7b0071c1
2023-12-18T14:52:07.225+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [0] - [1000419001]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [1] - [1000418001]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [2] - [null]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [3] - [SLOW]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results.loading : Calling top-level assembler (0 / 1) : org.hibernate.sql.results.graph.instantiation.internal.DynamicInstantiationAssemblerConstructorImpl@7b0071c1
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [0] - [1000419002]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [1] - [1000418001]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [2] - [null]
2023-12-18T14:52:07.226+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [3] - [SLOW]
...
...
...
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results.loading : Calling top-level assembler (0 / 1) : org.hibernate.sql.results.graph.instantiation.internal.DynamicInstantiationAssemblerConstructorImpl@7b0071c1
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [0] - [1000419327]
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [1] - [1000418001]
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [2] - [null]
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [3] - [SLOW]
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results.loading : Calling top-level assembler (0 / 1) : org.hibernate.sql.results.graph.instantiation.internal.DynamicInstantiationAssemblerConstructorImpl@7b0071c1
2023-12-18T14:54:48.508+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [0] - [1000419328]
2023-12-18T14:54:50.514+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [1] - [1000418001]
2023-12-18T14:54:51.777+01:00 DEBUG 774581 --- [MessageBroker-2] org.hibernate.orm.results : Extracted JDBC value [2] - [null]
...
...
For some reason the initial scheduled run does not cause any problems and the queue is filled quickly, but subsequent calls to populateQueue()
are very slow.
So I enjoyed a pretty nice weekend Hibernate adventure. I went from trying to optimize Hibernate operations by keeping the persistence context small, to moving to StatelessSession and ignoring the context entirely, but what I found fastest in the end was executing the queries via the SQL connection's
prepareStatement
and executeBatch
directly.
- Activity is now stored using the batch statements and doesn't go through Hibernate at all
- The same happens for new batch job creation
- I optimized the way the auto-translation job is created, so it doesn't create targets for the base language or unconfigured languages
- Generally, the import is much faster and more lightweight now
- I was testing it using the profiler included in IntelliJ IDEA and haven't encountered any suspicious memory allocation behaviour
- I was testing it with this appended generated large XLIFF file: translations.xliff.zip
I will continue by providing some reliable loading feedback in the UI and optimizing the UI so it's not so stressful for users to import data.
Sounds great! Looking forward to it.
Does it also happen with my XLIFF? Can you also share your language settings? Can you also share your configuration?
EDIT: Sorry, a lot of questions, but I really need to know 🙂
I don't think there is much going on in our settings. This is our application.properties
:
logging.level.root=INFO
server.port=80
tolgee.postgres-autostart.enabled=false
spring.datasource.url=jdbc:postgresql://postgresql-13-misc-1.inst.sp-prod.nl1.cns.serviceplanet.nl,postgresql-13-misc-2.inst.sp-prod.nl1.cns.serviceplanet.nl,postgresql-13-misc-3.inst.sp-prod.nl1.cns.serviceplanet.nl/postgres?targetServerType=primary&hostRecheckSeconds=5
spring.datasource.username=[REDACTED]
spring.datasource.password=[REDACTED]
tolgee.front-end-url=https://tolgee.serviceplanet.nl
tolgee.content-delivery.public-url-prefix=https://tolgee.serviceplanet.nl
tolgee.cache.enabled=true
tolgee.cache.caffeine-max-size=50000
tolgee.cache.use-redis=false
tolgee.authentication.enabled=true
# Do not allow self-registration.
tolgee.authentication.native-enabled=false
tolgee.authentication.registrations-allowed=false
tolgee.authentication.secured-image-retrieval=true
tolgee.authentication.initial-username=[REDACTED]
tolgee.authentication.initial-password=[REDACTED]
tolgee.authentication.jwt-secret=[REDACTED]
tolgee.user-can-create-organizations=false
tolgee.machine-translation.free-credits-amount=-1
tolgee.machine-translation.google.api-key=[REDACTED]
tolgee.machine-translation.google.default-primary=true
tolgee.machine-translation.google.default-enabled=true
# Google login
tolgee.authentication.google.client-id=[REDACTED]
tolgee.authentication.google.client-secret=[REDACTED]
tolgee.authentication.google.workspace-domain=serviceplanet.nl
tolgee.telemetry.enabled=false
tolgee.smtp.auth=false
tolgee.smtp.from=Tolgee <tolgee@serviceplanet.nl>
tolgee.smtp.host=[REDACTED]
tolgee.smtp.port=25
tolgee.smtp.ssl-enabled=false
#tolgee.smtp.username=user@company.com
#tolgee.smtp.password="omg/my/password"
Notice that we don't use Redis.
This is how we start Tolgee:
/opt/local/java/openjdk17/bin/java \
-Dnetworkaddress.cache.ttl=5 \
-Dnetworkaddress.cache.negative.ttl=5 \
-XX:ActiveProcessorCount=3 \
-XX:+UseShenandoahGC \
-Xms2G \
-Xmx2G \
-jar /opt/tolgee/tolgee.jar \
--spring.config.additional-location=/etc/tolgee/application.properties
Both your example XLIFF and our larger XLIFF still trigger the issue: the web interface becomes unresponsive and Tolgee takes about 7 minutes to load the data into Hibernate. We tested with Tolgee 6005b8c4671673864369f440de3a0f953d13c5be.
Thanks for the information. Anyway, I cannot reproduce it locally, so I really need some minimal steps to reproduce it locally on a fresh Tolgee run.
Alright let me get my ducks in a row first, I'll get back to you!
@JanCizmar Quick question: did you test with real machine translation and multiple languages (9+) in a project? Or are you testing with the mock machine translation?
I am testing it with mock. Base and 2 languages.
Alright, here we go; bear with me because there are quite a few steps ;-)
Assuming you already have a running PostgreSQL instance (this test was done with PostgreSQL 13), create a clean test database:
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
REVOKE ALL PRIVILEGES ON DATABASE postgres FROM tolgee;
DROP ROLE tolgee;
CREATE ROLE tolgee WITH LOGIN ENCRYPTED PASSWORD 'ZaKxwAXKNxd7CAYF7AoNhVr0WusIGUZuPw40';
GRANT ALL PRIVILEGES ON DATABASE postgres TO tolgee;
CREATE EXTENSION IF NOT EXISTS pg_trgm SCHEMA public;
CREATE EXTENSION IF NOT EXISTS unaccent WITH SCHEMA public;
Next we will build and run Tolgee. We will make a small modification to the Google machine translation provider: the provider no longer actually calls the Google API, it just sleeps for a second and returns the original string. This makes it a more realistic test than the mock one, since responses over the network aren't instant:
$ git clone https://github.com/tolgee/tolgee-platform.git tolgee-queue-test
$ cd tolgee-queue-test
$ git checkout v3.41.3
$ cat > ./mock-google.patch << EOF
---
.../providers/GoogleTranslationProvider.kt | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/backend/data/src/main/kotlin/io/tolgee/component/machineTranslation/providers/GoogleTranslationProvider.kt b/backend/data/src/main/kotlin/io/tolgee/component/machineTranslation/providers/GoogleTranslationProvider.kt
index 27be746c..f2df77c8 100644
--- a/backend/data/src/main/kotlin/io/tolgee/component/machineTranslation/providers/GoogleTranslationProvider.kt
+++ b/backend/data/src/main/kotlin/io/tolgee/component/machineTranslation/providers/GoogleTranslationProvider.kt
@@ -17,12 +17,8 @@ class GoogleTranslationProvider(
get() = !googleMachineTranslationProperties.apiKey.isNullOrEmpty()
override fun translateViaProvider(params: ProviderTranslateParams): MtValueProvider.MtResult {
- val result = translateService.translate(
- params.text,
- Translate.TranslateOption.sourceLanguage(params.sourceLanguageTag),
- Translate.TranslateOption.targetLanguage(params.targetLanguageTag),
- Translate.TranslateOption.format("text")
- ).translatedText
+ Thread.sleep(1000)
+ val result = params.text + " - Google (not) translated..."
return MtValueProvider.MtResult(
result,
params.text.length * 100
--
EOF
$ patch -p1 < ./mock-google.patch
$ cat > application.yaml <<EOF
spring:
jpa:
show-sql: true
datasource.url: "jdbc:postgresql://127.0.0.1:5432/postgres"
datasource.username: "tolgee"
datasource.password: "ZaKxwAXKNxd7CAYF7AoNhVr0WusIGUZuPw40"
tolgee:
postgres-autostart:
enabled: false
billing:
enabled: false
authentication:
enabled: true
needs-email-verification: true
registrations-allowed: true
initial-password: "admin"
cache:
enabled: true
use-redis: false
default-ttl: 120000
caffeine-max-size: 1000000
machine-translation:
free-credits-amount: -1
google:
api-key: mock
telemetry:
enabled: false
server:
port: 8080
error:
include-exception: true
include-stacktrace: always
EOF
$ rm -Rf build
$ ./gradlew packResources
$ java -Xmx2G -jar ./build/libs/tolgee.jar --spring.config.additional-location=./application.yaml
Tolgee now runs. Navigate to http://127.0.0.1:8080
and log in with admin
/ admin
.
Create a project called "Sandbox" and add the following languages:
ca-ES
de-DE
en-US
es-ES
eu-ES
fr-FR
gl-ES
nl-NL
pt-PT
Remove English (en)
, set the "Base Language" to Dutch (Netherlands)
and click "Save".
Go to "Languages". Open the settings for machine translations and ensure the following options are enabled:
Enable machine translation with primary provider
Enable translation memory
Auto translate imported items
Go to "Import" and select the large XLIFF test file (translations.xliff.zip) for import as Dutch (Netherlands). Import the file.
Wait a bit (couple of minutes).
The application will become unresponsive. For example logging in doesn't work anymore.
Interestingly enough, with the above reproduction steps Tolgee 3.41.3 eventually also runs out of heap space (even when running with -Xmx2G
):
Hibernate: select bj1_0.id,bj1_0.author_id,bj1_0.chunk_size,bj1_0.created_at,bj1_0.debounce_duration_in_ms,bj1_0.debounce_max_wait_time_in_ms,bj1_0.debouncing_key,bj1_0.hidden,bj1_0.job_character,bj1_0.last_debouncing_event,bj1_0.max_per_job_concurrency,bj1_0.params,bj1_0.project_id,bj1_0.status,bj1_0.target,bj1_0.total_chunks,bj1_0.total_items,bj1_0.type,bj1_0.updated_at from tolgee_batch_job bj1_0 where bj1_0.id in (?) and bj1_0.status in (?,?,?,?) and bj1_0.updated_at<?
Hibernate: select bj1_0.id,bj1_0.author_id,bj1_0.chunk_size,bj1_0.created_at,bj1_0.debounce_duration_in_ms,bj1_0.debounce_max_wait_time_in_ms,bj1_0.debouncing_key,bj1_0.hidden,bj1_0.job_character,bj1_0.last_debouncing_event,bj1_0.max_per_job_concurrency,bj1_0.params,bj1_0.project_id,bj1_0.status,bj1_0.target,bj1_0.total_chunks,bj1_0.total_items,bj1_0.type,bj1_0.updated_at from tolgee_batch_job bj1_0 join tolgee_batch_job_chunk_execution bjce1_0 on bjce1_0.batch_job_id=bj1_0.id where bj1_0.id in (?) group by bj1_0.id having max(bjce1_0.updated_at)<?
2023-12-20T12:28:42.489+01:00 WARN 46977 --- [essageBroker-28] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 0, SQLState: 53200
2023-12-20T12:28:42.489+01:00 ERROR 46977 --- [essageBroker-28] o.h.engine.jdbc.spi.SqlExceptionHelper : Ran out of memory retrieving query results.
2023-12-20T12:28:42.669+01:00 ERROR 46977 --- [essageBroker-28] o.s.s.s.TaskUtils$LoggingErrorHandler : Unexpected error occurred in scheduled task
org.springframework.orm.jpa.JpaSystemException: JDBC exception executing SQL [select bj1_0.id,bj1_0.author_id,bj1_0.chunk_size,bj1_0.created_at,bj1_0.debounce_duration_in_ms,bj1_0.debounce_max_wait_time_in_ms,bj1_0.debouncing_key,bj1_0.hidden,bj1_0.job_character,bj1_0.last_debouncing_event,bj1_0.max_per_job_concurrency,bj1_0.params,bj1_0.project_id,bj1_0.status,bj1_0.target,bj1_0.total_chunks,bj1_0.total_items,bj1_0.type,bj1_0.updated_at from tolgee_batch_job bj1_0 join tolgee_batch_job_chunk_execution bjce1_0 on bjce1_0.batch_job_id=bj1_0.id where bj1_0.id in (?) group by bj1_0.id having max(bjce1_0.updated_at)<?] [Ran out of memory retrieving query results.] [n/a]
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:320) ~[spring-orm-6.0.13.jar!/:6.0.13]
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.translateExceptionIfPossible(HibernateJpaDialect.java:229) ~[spring-orm-6.0.13.jar!/:6.0.13]
at org.springframework.orm.jpa.AbstractEntityManagerFactoryBean.translateExceptionIfPossible(AbstractEntityManagerFactoryBean.java:550) ~[spring-orm-6.0.13.jar!/:6.0.13]
at org.springframework.dao.support.ChainedPersistenceExceptionTranslator.translateExceptionIfPossible(ChainedPersistenceExceptionTranslator.java:61) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.dao.support.DataAccessUtils.translateIfNecessary(DataAccessUtils.java:243) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:152) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:135) ~[spring-data-jpa-3.1.5.jar!/:3.1.5]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.data.repository.core.support.MethodInvocationValidator.invoke(MethodInvocationValidator.java:94) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:244) ~[spring-aop-6.0.13.jar!/:6.0.13]
at jdk.proxy2/jdk.proxy2.$Proxy293.getStuckJobs(Unknown Source) ~[na:na]
at io.tolgee.batch.BatchJobService.getStuckJobs(BatchJobService.kt:401) ~[data-local.jar!/:na]
at jdk.internal.reflect.GeneratedMethodAccessor221.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:699) ~[spring-aop-6.0.13.jar!/:6.0.13]
at io.tolgee.batch.BatchJobService$$SpringCGLIB$$1.getStuckJobs(<generated>) ~[data-local.jar!/:na]
at io.tolgee.batch.ScheduledJobCleaner.handleStuckJobs(ScheduledJobCleaner.kt:33) ~[data-local.jar!/:na]
at io.tolgee.batch.ScheduledJobCleaner.cleanup(ScheduledJobCleaner.kt:28) ~[data-local.jar!/:na]
at jdk.internal.reflect.GeneratedMethodAccessor219.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-6.0.13.jar!/:6.0.13]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-6.0.13.jar!/:6.0.13]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:840) ~[na:na]
Caused by: org.hibernate.exception.GenericJDBCException: JDBC exception executing SQL [select bj1_0.id,bj1_0.author_id,bj1_0.chunk_size,bj1_0.created_at,bj1_0.debounce_duration_in_ms,bj1_0.debounce_max_wait_time_in_ms,bj1_0.debouncing_key,bj1_0.hidden,bj1_0.job_character,bj1_0.last_debouncing_event,bj1_0.max_per_job_concurrency,bj1_0.params,bj1_0.project_id,bj1_0.status,bj1_0.target,bj1_0.total_chunks,bj1_0.total_items,bj1_0.type,bj1_0.updated_at from tolgee_batch_job bj1_0 join tolgee_batch_job_chunk_execution bjce1_0 on bjce1_0.batch_job_id=bj1_0.id where bj1_0.id in (?) group by bj1_0.id having max(bjce1_0.updated_at)<?] [Ran out of memory retrieving query results.] [n/a]
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:63) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:108) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:94) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.executeQuery(DeferredResultSetAccess.java:258) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.getResultSet(DeferredResultSetAccess.java:164) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.advanceNext(JdbcValuesResultSetImpl.java:218) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.processNext(JdbcValuesResultSetImpl.java:98) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.jdbc.internal.AbstractJdbcValues.next(AbstractJdbcValues.java:19) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.internal.RowProcessingStateStandardImpl.next(RowProcessingStateStandardImpl.java:66) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.spi.ListResultsConsumer.consume(ListResultsConsumer.java:180) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.results.spi.ListResultsConsumer.consume(ListResultsConsumer.java:33) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.doExecuteQuery(JdbcSelectExecutorStandardImpl.java:361) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.executeQuery(JdbcSelectExecutorStandardImpl.java:168) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.list(JdbcSelectExecutorStandardImpl.java:93) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.sql.exec.spi.JdbcSelectExecutor.list(JdbcSelectExecutor.java:31) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.lambda$new$0(ConcreteSqmSelectQueryPlan.java:109) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.withCacheableSqmInterpretation(ConcreteSqmSelectQueryPlan.java:305) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.performList(ConcreteSqmSelectQueryPlan.java:246) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.sqm.internal.QuerySqmImpl.doList(QuerySqmImpl.java:509) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.spi.AbstractSelectionQuery.list(AbstractSelectionQuery.java:427) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at org.hibernate.query.Query.getResultList(Query.java:120) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
at jdk.internal.reflect.GeneratedMethodAccessor166.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.orm.jpa.SharedEntityManagerCreator$DeferredQueryInvocationHandler.invoke(SharedEntityManagerCreator.java:405) ~[spring-orm-6.0.13.jar!/:6.0.13]
at jdk.proxy2/jdk.proxy2.$Proxy332.getResultList(Unknown Source) ~[na:na]
at org.springframework.data.jpa.repository.query.JpaQueryExecution$CollectionExecution.doExecute(JpaQueryExecution.java:129) ~[spring-data-jpa-3.1.5.jar!/:3.1.5]
at org.springframework.data.jpa.repository.query.JpaQueryExecution.execute(JpaQueryExecution.java:92) ~[spring-data-jpa-3.1.5.jar!/:3.1.5]
at org.springframework.data.jpa.repository.query.AbstractJpaQuery.doExecute(AbstractJpaQuery.java:149) ~[spring-data-jpa-3.1.5.jar!/:3.1.5]
at org.springframework.data.jpa.repository.query.AbstractJpaQuery.execute(AbstractJpaQuery.java:137) ~[spring-data-jpa-3.1.5.jar!/:3.1.5]
at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:136) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:120) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:164) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:143) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:72) ~[spring-data-commons-3.1.5.jar!/:3.1.5]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:391) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-6.0.13.jar!/:6.0.13]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:184) ~[spring-aop-6.0.13.jar!/:6.0.13]
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-6.0.13.jar!/:6.0.13]
... 29 common frames omitted
Caused by: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2372) ~[postgresql-42.6.0.jar!/:42.6.0]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:368) ~[postgresql-42.6.0.jar!/:42.6.0]
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:498) ~[postgresql-42.6.0.jar!/:42.6.0]
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:415) ~[postgresql-42.6.0.jar!/:42.6.0]
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190) ~[postgresql-42.6.0.jar!/:42.6.0]
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134) ~[postgresql-42.6.0.jar!/:42.6.0]
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) ~[HikariCP-5.0.1.jar!/:na]
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) ~[HikariCP-5.0.1.jar!/:na]
at org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.executeQuery(DeferredResultSetAccess.java:240) ~[hibernate-core-6.3.1.Final.jar!/:6.3.1.Final]
... 67 common frames omitted
Caused by: java.lang.OutOfMemoryError: Java heap space
This is what it looks like in Eclipse MAT:
This is the stack trace from Eclipse MAT:
DefaultDispatcher-worker-1
at sun.nio.ch.SocketDispatcher.read0(Ljava/io/FileDescriptor;JI)I (SocketDispatcher.java(Native Method))
at sun.nio.ch.SocketDispatcher.read(Ljava/io/FileDescriptor;JI)I (SocketDispatcher.java:47)
at sun.nio.ch.NioSocketImpl.tryRead(Ljava/io/FileDescriptor;[BII)I (NioSocketImpl.java:266)
at sun.nio.ch.NioSocketImpl.implRead([BII)I (NioSocketImpl.java:317)
at sun.nio.ch.NioSocketImpl.read([BII)I (NioSocketImpl.java:355)
at sun.nio.ch.NioSocketImpl$1.read([BII)I (NioSocketImpl.java:808)
at java.net.Socket$SocketInputStream.read([BII)I (Socket.java:966)
at java.io.InputStream.skip(J)J (InputStream.java:547)
at org.postgresql.core.VisibleBufferedInputStream.skip(J)J (VisibleBufferedInputStream.java:278)
at org.postgresql.core.PGStream.skip(I)V (PGStream.java:669)
at org.postgresql.core.PGStream.receiveTupleV3()Lorg/postgresql/core/Tuple; (PGStream.java:621)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(Lorg/postgresql/core/ResultHandler;IZ)V (QueryExecutorImpl.java:2368)
at org.postgresql.core.v3.QueryExecutorImpl.execute(Lorg/postgresql/core/Query;Lorg/postgresql/core/ParameterList;Lorg/postgresql/core/ResultHandler;IIIZ)V (QueryExecutorImpl.java:368)
at org.postgresql.jdbc.PgStatement.executeInternal(Lorg/postgresql/core/CachedQuery;Lorg/postgresql/core/ParameterList;I)V (PgStatement.java:498)
at org.postgresql.jdbc.PgStatement.execute(Lorg/postgresql/core/CachedQuery;Lorg/postgresql/core/ParameterList;I)V (PgStatement.java:415)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(I)Z (PgPreparedStatement.java:190)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery()Ljava/sql/ResultSet; (PgPreparedStatement.java:134)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery()Ljava/sql/ResultSet; (ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery()Ljava/sql/ResultSet; (HikariProxyPreparedStatement.java)
at org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.executeQuery()V (DeferredResultSetAccess.java:240)
at org.hibernate.sql.results.jdbc.internal.DeferredResultSetAccess.getResultSet()Ljava/sql/ResultSet; (DeferredResultSetAccess.java:164)
at org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.advanceNext()Z (JdbcValuesResultSetImpl.java:218)
at org.hibernate.sql.results.jdbc.internal.JdbcValuesResultSetImpl.processNext(Lorg/hibernate/sql/results/jdbc/spi/RowProcessingState;)Z (JdbcValuesResultSetImpl.java:98)
at org.hibernate.sql.results.jdbc.internal.AbstractJdbcValues.next(Lorg/hibernate/sql/results/jdbc/spi/RowProcessingState;)Z (AbstractJdbcValues.java:19)
at org.hibernate.sql.results.internal.RowProcessingStateStandardImpl.next()Z (RowProcessingStateStandardImpl.java:66)
at org.hibernate.sql.results.spi.ListResultsConsumer.consume(Lorg/hibernate/sql/results/jdbc/spi/JdbcValues;Lorg/hibernate/engine/spi/SharedSessionContractImplementor;Lorg/hibernate/sql/results/jdbc/spi/JdbcValuesSourceProcessingOptions;Lorg/hibernate/sql/results/jdbc/internal/JdbcValuesSourceProcessingStateStandardImpl;Lorg/hibernate/sql/results/internal/RowProcessingStateStandardImpl;Lorg/hibernate/sql/results/spi/RowReader;)Ljava/util/List; (ListResultsConsumer.java:200)
at org.hibernate.sql.results.spi.ListResultsConsumer.consume(Lorg/hibernate/sql/results/jdbc/spi/JdbcValues;Lorg/hibernate/engine/spi/SharedSessionContractImplementor;Lorg/hibernate/sql/results/jdbc/spi/JdbcValuesSourceProcessingOptions;Lorg/hibernate/sql/results/jdbc/internal/JdbcValuesSourceProcessingStateStandardImpl;Lorg/hibernate/sql/results/internal/RowProcessingStateStandardImpl;Lorg/hibernate/sql/results/spi/RowReader;)Ljava/lang/Object; (ListResultsConsumer.java:33)
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.doExecuteQuery(Lorg/hibernate/sql/exec/spi/JdbcOperationQuerySelect;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;Lorg/hibernate/sql/exec/spi/ExecutionContext;Lorg/hibernate/sql/results/spi/RowTransformer;Ljava/lang/Class;Ljava/util/function/Function;Lorg/hibernate/sql/results/spi/ResultsConsumer;)Ljava/lang/Object; (JdbcSelectExecutorStandardImpl.java:361)
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.executeQuery(Lorg/hibernate/sql/exec/spi/JdbcOperationQuerySelect;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;Lorg/hibernate/sql/exec/spi/ExecutionContext;Lorg/hibernate/sql/results/spi/RowTransformer;Ljava/lang/Class;Ljava/util/function/Function;Lorg/hibernate/sql/results/spi/ResultsConsumer;)Ljava/lang/Object; (JdbcSelectExecutorStandardImpl.java:168)
at org.hibernate.sql.exec.internal.JdbcSelectExecutorStandardImpl.list(Lorg/hibernate/sql/exec/spi/JdbcOperationQuerySelect;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;Lorg/hibernate/sql/exec/spi/ExecutionContext;Lorg/hibernate/sql/results/spi/RowTransformer;Ljava/lang/Class;Lorg/hibernate/sql/results/spi/ListResultsConsumer$UniqueSemantic;)Ljava/util/List; (JdbcSelectExecutorStandardImpl.java:93)
at org.hibernate.sql.exec.spi.JdbcSelectExecutor.list(Lorg/hibernate/sql/exec/spi/JdbcOperationQuerySelect;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;Lorg/hibernate/sql/exec/spi/ExecutionContext;Lorg/hibernate/sql/results/spi/RowTransformer;Lorg/hibernate/sql/results/spi/ListResultsConsumer$UniqueSemantic;)Ljava/util/List; (JdbcSelectExecutor.java:31)
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.lambda$new$0(Ljava/lang/String;Lorg/hibernate/sql/results/spi/ListResultsConsumer$UniqueSemantic;Lorg/hibernate/query/sqm/internal/DomainParameterXref;Ljava/lang/Void;Lorg/hibernate/query/spi/DomainQueryExecutionContext;Lorg/hibernate/query/sqm/internal/ConcreteSqmSelectQueryPlan$CacheableSqmInterpretation;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;)Ljava/util/List; (ConcreteSqmSelectQueryPlan.java:109)
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan$$Lambda$2279+0x00007f9cd91c66a0.interpret(Ljava/lang/Object;Lorg/hibernate/query/spi/DomainQueryExecutionContext;Lorg/hibernate/query/sqm/internal/ConcreteSqmSelectQueryPlan$CacheableSqmInterpretation;Lorg/hibernate/sql/exec/spi/JdbcParameterBindings;)Ljava/lang/Object; ()
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.withCacheableSqmInterpretation(Lorg/hibernate/query/spi/DomainQueryExecutionContext;Ljava/lang/Object;Lorg/hibernate/query/sqm/internal/ConcreteSqmSelectQueryPlan$SqmInterpreter;)Ljava/lang/Object; (ConcreteSqmSelectQueryPlan.java:305)
at org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.performList(Lorg/hibernate/query/spi/DomainQueryExecutionContext;)Ljava/util/List; (ConcreteSqmSelectQueryPlan.java:246)
at org.hibernate.query.sqm.internal.QuerySqmImpl.doList()Ljava/util/List; (QuerySqmImpl.java:509)
at org.hibernate.query.spi.AbstractSelectionQuery.list()Ljava/util/List; (AbstractSelectionQuery.java:427)
at org.hibernate.query.Query.getResultList()Ljava/util/List; (Query.java:120)
at jdk.internal.reflect.GeneratedMethodAccessor166.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; ()
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:568)
at org.springframework.orm.jpa.SharedEntityManagerCreator$DeferredQueryInvocationHandler.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (SharedEntityManagerCreator.java:405)
at jdk.proxy2.$Proxy332.getResultList()Ljava/util/List; ()
at io.tolgee.batch.BatchJobService.getAllUnlockedChunksForJobs(Ljava/lang/Iterable;)Ljava/util/List; (BatchJobService.kt:308)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (NativeMethodAccessorImpl.java(Native Method))
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (NativeMethodAccessorImpl.java:77)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:568)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (AopUtils.java:343)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;Lorg/springframework/cglib/proxy/MethodProxy;)Ljava/lang/Object; (CglibAopProxy.java:699)
at io.tolgee.batch.BatchJobService$$SpringCGLIB$$1.getAllUnlockedChunksForJobs(Ljava/lang/Iterable;)Ljava/util/List; (<generated>)
at io.tolgee.batch.BatchJobProjectLockingManager.getInitialJobId(J)Ljava/lang/Long; (BatchJobProjectLockingManager.kt:118)
at io.tolgee.batch.BatchJobProjectLockingManager.computeFnBody(Lio/tolgee/batch/data/BatchJobDto;Ljava/lang/Long;)J (BatchJobProjectLockingManager.kt:99)
at io.tolgee.batch.BatchJobProjectLockingManager.access$computeFnBody(Lio/tolgee/batch/BatchJobProjectLockingManager;Lio/tolgee/batch/data/BatchJobDto;Ljava/lang/Long;)J (BatchJobProjectLockingManager.kt:19)
at io.tolgee.batch.BatchJobProjectLockingManager$tryLockLocal$computed$1.invoke(Ljava/lang/Long;Ljava/lang/Long;)Ljava/lang/Long; (BatchJobProjectLockingManager.kt:81)
at io.tolgee.batch.BatchJobProjectLockingManager$tryLockLocal$computed$1.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (BatchJobProjectLockingManager.kt:80)
at io.tolgee.batch.BatchJobProjectLockingManager.tryLockLocal$lambda$2(Lkotlin/jvm/functions/Function2;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Long; (BatchJobProjectLockingManager.kt:80)
at io.tolgee.batch.BatchJobProjectLockingManager$$Lambda$2805+0x00007f9cd93af8e8.apply(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; ()
at java.util.concurrent.ConcurrentHashMap.compute(Ljava/lang/Object;Ljava/util/function/BiFunction;)Ljava/lang/Object; (ConcurrentHashMap.java:1916)
at io.tolgee.batch.BatchJobProjectLockingManager.tryLockLocal(Lio/tolgee/batch/data/BatchJobDto;)Z (BatchJobProjectLockingManager.kt:80)
at io.tolgee.batch.BatchJobProjectLockingManager.tryLockJobForProject(Lio/tolgee/batch/data/BatchJobDto;)Z (BatchJobProjectLockingManager.kt:42)
at io.tolgee.batch.BatchJobProjectLockingManager.canRunBatchJobOfExecution(J)Z (BatchJobProjectLockingManager.kt:34)
at io.tolgee.batch.BatchJobConcurrentLauncher.handleItem(Lkotlinx/coroutines/CoroutineScope;Lio/tolgee/batch/data/ExecutionQueueItem;Lkotlin/jvm/functions/Function2;)Z (BatchJobConcurrentLauncher.kt:186)
at io.tolgee.batch.BatchJobConcurrentLauncher.access$handleItem(Lio/tolgee/batch/BatchJobConcurrentLauncher;Lkotlinx/coroutines/CoroutineScope;Lio/tolgee/batch/data/ExecutionQueueItem;Lkotlin/jvm/functions/Function2;)Z (BatchJobConcurrentLauncher.kt:25)
at io.tolgee.batch.BatchJobConcurrentLauncher$run$1$1.invoke()Ljava/lang/Boolean; (BatchJobConcurrentLauncher.kt:122)
at io.tolgee.batch.BatchJobConcurrentLauncher$run$1$1.invoke()Ljava/lang/Object; (BatchJobConcurrentLauncher.kt:104)
at io.tolgee.batch.BatchJobConcurrentLauncher.repeatForever(Lkotlin/jvm/functions/Function0;)V (BatchJobConcurrentLauncher.kt:82)
at io.tolgee.batch.BatchJobConcurrentLauncher$run$1.invokeSuspend(Ljava/lang/Object;)Ljava/lang/Object; (BatchJobConcurrentLauncher.kt:104)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(Ljava/lang/Object;)V (ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run()V (DispatchedTask.kt:106)
at kotlinx.coroutines.internal.LimitedDispatcher.run()V (LimitedDispatcher.kt:42)
at kotlinx.coroutines.scheduling.TaskImpl.run()V (Tasks.kt:95)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(Lkotlinx/coroutines/scheduling/Task;)V (CoroutineScheduler.kt:570)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(Lkotlinx/coroutines/scheduling/Task;)V (CoroutineScheduler.kt:750)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker()V (CoroutineScheduler.kt:677)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run()V (CoroutineScheduler.kt:664)
OK, I've got the fix. There were multiple places where we were fetching chunk executions together with the eagerly fetched batch job. Since the batch job has a large `target`, fetching it 5000 times makes the result set quite big. Thanks for reporting, guys! I hope this really fixes it for good.
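To illustrate the shape of that fix, here is a minimal sketch (not the actual Tolgee code; all names like `ChunkRef` and `project` are hypothetical): instead of loading the full chunk-execution entity with its attached batch job, project only the identifiers and status the queue needs into a small value object.

```java
import java.util.List;

public class ChunkProjectionSketch {
    // Shape of what Hibernate was loading: each chunk execution dragging in the
    // whole batch job, whose `target` can hold thousands of entries.
    record BatchJob(long id, List<Long> target) {}
    record ChunkExecution(long id, String status, BatchJob job) {}

    // Lightweight view holding only what the scheduler needs (hypothetical name).
    record ChunkRef(long executionId, long jobId, String status) {}

    // Maps loaded executions to the small projection, dropping the heavy payload.
    static List<ChunkRef> project(List<ChunkExecution> executions) {
        return executions.stream()
                .map(e -> new ChunkRef(e.id(), e.job().id(), e.status()))
                .toList();
    }
}
```

In JPA the same idea would typically be expressed as a constructor-expression query (`select new ...ChunkRef(e.id, e.batchJob.id, e.status) from ...`) so the large columns are never transferred at all.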
@JanCizmar Thanks for your efforts in fixing this issue and happy holidays! :fireworks:
@JanCizmar Just wanted to report that we do not encounter any problems with the latest release. Thanks for fixing the issue.
Great news!
Tolgee runs out of heap space when importing a couple of thousand translation keys. I think this is the same issue as described in #1930.
I think this is caused by Tolgee loading its entire job queue into memory, including entries it is not currently processing (i.e. the ones which are still queued). I haven't investigated further yet, but for some reason these job queue entries are fairly large in memory (1 MB+). This means that when importing a translation file with, for example, 3000 entries, Tolgee will run out of heap space even when running with a 2 GB heap. Once the job queue is saturated, Tolgee will never recover without manual intervention, because on every reboot it will load the job queue and run out of heap space again.
During testing we did not have these issues with the Docker image. I have a hunch this is related to machine translation (which is not enabled in the Docker image), but I haven't investigated that yet.
After taking a heap dump and importing it into the Eclipse Memory Analyzer, one can see that the heap consists almost entirely of PostgreSQL tuples:
This is the stacktrace associated with these objects:
This is what the job queue looks like:
This originates from the Tolgee job queue code:
https://github.com/tolgee/tolgee-platform/blob/fa9d12497a0cba57b70cc781ee68cfd6f70cc9f1/backend/data/src/main/kotlin/io/tolgee/batch/BatchJobChunkExecutionQueue.kt#L48-L68
A possible fix would be to have Tolgee retrieve small batches (of, for example, 10 or 20 entries) instead of the entire queue.
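The batching idea can be sketched as follows (an assumption about a possible fix, not the actual Tolgee implementation; `BatchSource` and `drain` are hypothetical names). The worker repeatedly fetches a bounded batch and processes it, so peak memory is limited to one batch rather than the whole backlog:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchedQueueDrain {
    interface BatchSource<T> {
        // Stand-in for a repository query with a row limit; in PostgreSQL this
        // would typically be "SELECT ... FOR UPDATE SKIP LOCKED LIMIT ?" so
        // concurrent workers skip rows another worker has already claimed.
        List<T> fetchBatch(int limit);
    }

    // Fetches and processes at most `batchSize` items per round until the
    // source is empty; returns the total number of items processed.
    static <T> int drain(BatchSource<T> source, int batchSize, Consumer<T> worker) {
        int processed = 0;
        List<T> batch = source.fetchBatch(batchSize);
        while (!batch.isEmpty()) {
            for (T item : batch) {
                worker.accept(item);
                processed++;
            }
            batch = source.fetchBatch(batchSize);
        }
        return processed;
    }
}
```

Each `fetchBatch` call would run in its own transaction so the `SKIP LOCKED` row locks are released as soon as a batch is done, letting other workers pick up the remaining rows.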