Closed quantranhong1999 closed 1 year ago
I personally could not reproduce the EmailSubmission/set error on TMail preprod env, but see the error logs DriverTimeoutException
upon mail processing, leading to mail is not actually sent (despite EmailSubmission/set succeeded).
My very first theory: something could be blocking in CachedBlobStore
on the Cassandra driver thread which leads to DriverTimeoutException
.
There seems a lot of InputStream usage (which is blocking), I suspect some of InputStream usage was not correctly subscribed on the BLOCKING_CALL_WRAPPER
thread.
TODO: try to debug in unit tests to see.
Nothing suspicious with CachedBlobStoreTest
.
-> I will try to put all Cassandra queries being executed on Schedulers.parallel()
thread and performance test now to see if any obvious .block
etc is being used.
-> I will try to put all Cassandra queries being executed on Schedulers.parallel() thread and performance test now to see if any obvious .block etc is being used.
FYI I opened https://datastax-oss.atlassian.net/jira/software/c/projects/JAVA/issues/JAVA-3121 to be able to customize the driver thread factory. It would enable to extend the NonBlocking interface from the Reactor project and thus enable rejection of blocking calls made on the Cassandra driver thread (with a bit of tunning).
No answer so far...
So seems no obvious .block
/ .blockLast
... are used.
Now I will use BlockHound to scan more tiny "heavy computing" code around CassandraBlobStoreCache
/ CassandraMessageFastViewProjection
/ EmailSubmissionSetMethodContract
/ IdentitySetContract
.
Well at least I spotted a blocking call on CachedBlobStore's cassandra thread: https://github.com/apache/james-project/pull/1725
I am a bit surprised because the UUID.randomUUID()
has a part that blocks as well, while we seem to use it a lot on either JMAP netty threads or Cassandra thread.
Caused by: reactor.blockhound.BlockingOperationError: Blocking call! java.io.FileInputStream#readBytes
at java.base/java.io.FileInputStream.readBytes(FileInputStream.java)
at java.base/java.io.FileInputStream.read(FileInputStream.java:276)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132)
at java.base/sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:425)
at java.base/sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:528)
at java.base/sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:547)
at java.base/sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:221)
at java.base/java.security.SecureRandom.nextBytes(SecureRandom.java:758)
at java.base/java.util.UUID.randomUUID(UUID.java:151)
at org.apache.james.events.Event$EventId.random(Event.java:37)
at org.apache.james.mailbox.store.event.EventFactory$RequireEventId.randomEventId(EventFactory.java:67)
I am not sure it is worth to put the boundedElastic
thread just for UUID.randomUUID()
, should be fast enough IMO?
Maybe it is a mistake but I chose to ignore that one.
Yes it is a blocking call, but it is shitty damn super fast. Fast enough to get a neglictible impact. (exercise: measure it with a JMH mini benchmark :-p see if you can make me change my mind)
I did not manage to come up with a better architecture for generating UIDs and random numbers that performing these blocking calls.
So I admit I did close my eyes until the community puts togeter something like of a rector-random
library...
WHY
Env: TMail Release 0.7.1-rc1 on preprod When sending a mail with Cassandra mail queue view enabled, we have the errors:
From CNB (likely with mailqueue view disabled), error upon EmailSubmission/set:
From TMail preprod (with mail queue view enabled), error upon mail processing:
Seems something is blocking on the Cassandra driver thread, upon the mailqueue view being enabled.
DoD
Investigate and fix the issue.
Notes
Mean while, creating identity is fine (reported by Tung).