linagora / tmail-backend

GNU Affero General Public License v3.0

POC Event bus notifications using Postgres Notify/Listen #1104

Closed (quantranhong1999 closed 1 week ago)

quantranhong1999 commented 2 weeks ago

State: passes the contract tests.

TODO: add the Guice bindings, build a Docker image, and run a performance test.

However, Postgres NOTIFY/LISTEN has a scaling issue: each subscriber uses one dedicated connection, so the number of topics is hard to scale. For example, a few hundred mailboxes would exceed Postgres's default maximum of 100 connections (not to mention the connection pool's own maximum).
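To make the arithmetic concrete, here is a minimal sketch of the connection budget. The class and method names are hypothetical and not part of tmail-backend; the only external fact assumed is that Postgres's default `max_connections` is 100.

```java
// Hypothetical helper, not part of tmail-backend: compares how many Postgres
// connections LISTEN subscribers would hold under two designs.
final class ListenConnectionBudget {
    // Default value of max_connections in a stock Postgres configuration.
    static final int DEFAULT_MAX_CONNECTIONS = 100;

    // Design A: one dedicated connection per subscribed topic.
    static int connectionsPerTopic(int topics) {
        return topics;
    }

    // Design B: one shared listener connection per event bus instance,
    // regardless of how many topics that instance listens to.
    static int connectionsPerEventBus(int eventBusInstances) {
        return eventBusInstances;
    }

    // True if listener connections plus all other connections (pool, admin, ...)
    // stay within the server limit.
    static boolean fits(int listenerConnections, int otherConnections, int maxConnections) {
        return listenerConnections + otherConnections <= maxConnections;
    }

    public static void main(String[] args) {
        // 300 mailboxes, one connection each: over the default limit on its own.
        System.out.println(fits(connectionsPerTopic(300), 0, DEFAULT_MAX_CONNECTIONS));
        // 3 server nodes with one listener connection each, plus a 20-connection pool.
        System.out.println(fits(connectionsPerEventBus(3), 20, DEFAULT_MAX_CONNECTIONS));
    }
}
```

The figures 300, 3, and 20 are illustrative only; they are not measurements from this POC.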

Would you like to continue with the performance test @chibenwa ?

chibenwa commented 2 weeks ago

Would you like to continue with the performance test @chibenwa ?

Does this translate to "one pg connection per event bus"?

If so, it is not that bad IMO...

quantranhong1999 commented 2 weeks ago

Does this translate to "one pg connection per event bus"?

Yes. Indeed it is not that bad.
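As an illustration of "one pg connection per event bus", the sketch below models, purely in memory, how a single listener connection could fan notifications out to many local listeners by channel name. It is not the POC's code: `SharedListenerConnection` and its methods are hypothetical, and the points where a real implementation would issue `LISTEN` / `UNLISTEN` are only marked in comments.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory model of one shared listener connection per event bus.
final class SharedListenerConnection {
    private final Map<String, List<Consumer<String>>> listenersByChannel = new ConcurrentHashMap<>();

    // Adds a local listener. Returns true when it is the first one on the channel,
    // which is when a real implementation would issue LISTEN on the shared connection.
    boolean register(String channel, Consumer<String> listener) {
        boolean[] first = {false};
        listenersByChannel.compute(channel, (key, existing) -> {
            List<Consumer<String>> list = existing == null ? new CopyOnWriteArrayList<>() : existing;
            first[0] = list.isEmpty();
            list.add(listener);
            return list;
        });
        return first[0];
    }

    // Removes a local listener. Returns true when the channel has no listeners left,
    // which is when a real implementation would issue UNLISTEN.
    boolean unregister(String channel, Consumer<String> listener) {
        boolean[] last = {false};
        listenersByChannel.computeIfPresent(channel, (key, list) -> {
            if (list.remove(listener) && list.isEmpty()) {
                last[0] = true;
                return null; // drop the now-empty channel entry
            }
            return list;
        });
        return last[0];
    }

    // Called once per notification read from the shared connection.
    // Returns the number of local listeners that received the payload.
    int onNotification(String channel, String payload) {
        List<Consumer<String>> list = listenersByChannel.getOrDefault(channel, List.of());
        list.forEach(listener -> listener.accept(payload));
        return list.size();
    }

    int subscribedChannels() {
        return listenersByChannel.size();
    }

    public static void main(String[] args) {
        SharedListenerConnection connection = new SharedListenerConnection();
        connection.register("mailbox-1", payload -> System.out.println("listener A: " + payload));
        connection.register("mailbox-1", payload -> System.out.println("listener B: " + payload));
        connection.onNotification("mailbox-1", "message added");
    }
}
```

With this shape, the number of Postgres connections depends on the number of event bus instances, not on the number of registered keys.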

chibenwa commented 2 weeks ago

Would you like to continue with the performance test @chibenwa ?

As explained in the ticket, I would rather we not spend too much time on this POC, but I am fairly interested in the outcome, so I am OK with one man-day of testing :-p

quantranhong1999 commented 1 week ago

Overall, the performance is good, not much different from that of the full RabbitMQ event bus.

IMAP performance

image

JMAP performance

image

Metrics

postgres_register{quantile="0.5",} 0.001835007
postgres_register{quantile="0.75",} 0.003358719
postgres_register{quantile="0.95",} 0.318767103
postgres_register{quantile="0.98",} 0.406847487
postgres_register{quantile="0.99",} 0.47395635100000005
postgres_register{quantile="0.999",} 0.7381975030000001
postgres_register_count 21083.0

postgres_dispatch{quantile="0.5",} 0.002932735
postgres_dispatch{quantile="0.75",} 0.004292607
postgres_dispatch{quantile="0.95",} 0.011993087000000001
postgres_dispatch{quantile="0.98",} 0.041156607000000005
postgres_dispatch{quantile="0.99",} 0.381681663
postgres_dispatch{quantile="0.999",} 0.809500671
postgres_dispatch_count 228676.0

postgres_unregister{quantile="0.5",} 0.0017121270000000001
postgres_unregister{quantile="0.75",} 0.0023756790000000003
postgres_unregister{quantile="0.95",} 0.007962623
postgres_unregister{quantile="0.98",} 0.022151167000000003
postgres_unregister{quantile="0.99",} 0.041156607000000005
postgres_unregister{quantile="0.999",} 0.33344716700000004
postgres_unregister_count 21083.0
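For reading the figures above, a small sketch that parses one quantile line and converts it to milliseconds. It assumes the values are in seconds, which is the usual Prometheus convention but is not stated in this thread; `QuantileSample` is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for summary lines in the format shown above.
final class QuantileSample {
    // Matches e.g.: postgres_dispatch{quantile="0.99",} 0.381681663
    private static final Pattern LINE =
        Pattern.compile("^(\\w+)\\{quantile=\"([0-9.]+)\",?\\}\\s+([0-9.eE+-]+)$");

    final String metric;
    final double quantile;
    final double seconds; // assumption: the exported unit is seconds

    private QuantileSample(String metric, double quantile, double seconds) {
        this.metric = metric;
        this.quantile = quantile;
        this.seconds = seconds;
    }

    static QuantileSample parse(String line) {
        Matcher matcher = LINE.matcher(line.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a quantile sample: " + line);
        }
        return new QuantileSample(
            matcher.group(1),
            Double.parseDouble(matcher.group(2)),
            Double.parseDouble(matcher.group(3)));
    }

    double millis() {
        return seconds * 1000.0;
    }

    public static void main(String[] args) {
        String[] lines = {
            "postgres_register{quantile=\"0.99\",} 0.47395635100000005",
            "postgres_dispatch{quantile=\"0.99\",} 0.381681663",
            "postgres_unregister{quantile=\"0.99\",} 0.041156607000000005",
        };
        for (String line : lines) {
            QuantileSample sample = parse(line);
            System.out.println(String.format(Locale.ROOT, "%s q=%s: %.1f ms",
                sample.metric, sample.quantile, sample.millis()));
        }
    }
}
```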

It is not as fast as Redis, but OK IMO.

chibenwa commented 1 week ago

p99 hurts IMO (half a second!)

Might be OK for very small deployments. Deployments for which you might not need several James servers in the first place?

Anyway, thanks a lot @quantranhong1999 for conducting this experiment. This was instructive.

quantranhong1999 commented 1 week ago

@chibenwa seems to be in favor of that work?

IMO this is only worth merging if we can move all our queues (mail queues, for example) fully to Postgres. With only the event bus notifications using Postgres while the mail queues still rely on RabbitMQ, I do not see how it would benefit small deployments (more complicated and fragmented, with no better performance in exchange?).

chibenwa commented 1 week ago

IMO this is only worth merging if we can move all our queues (mail queues, for example) fully to Postgres. With only the event bus notifications using Postgres while the mail queues still rely on RabbitMQ, I do not see how it would benefit small deployments (more complicated and fragmented, with no better performance in exchange?).

100% agree.

Arsnael commented 1 week ago

Alright, let's stop the experiment here then.

Thanks @quantranhong1999 for taking the time and patience to demonstrate this case :)