okumin / akka-persistence-sql-async

A journal and snapshot store plugin for akka-persistence using RDBMS.
Apache License 2.0

Metadata and Journal not being written atomically #15

Closed ColinScott closed 8 years ago

ColinScott commented 8 years ago

I've encountered an issue where entries are being written to the metadata table but no corresponding entry is made in the journal table. I'm not sure why the journal write is failing specifically but this seems to happen under (not particularly high) load when I start pushing our data into the system. Individual writes seem to complete successfully.

My initial look at the code suggests that nothing ever calls begin/commit/rollback on TxAsyncDBSession, so no transaction is actually created, but I'm not familiar enough with the libraries in question to determine whether this is the case.

This is occurring with Akka 2.4.2 using Sharding against a PostgreSQL instance running locally in a Docker container.

The errors I'm getting look like:

Persistence failure when replaying events for persistenceId [Item-1234]. Last known sequence number [0] akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.

The write failures are also circuit breaker timeouts, the cause of which I haven't yet determined. Regardless it seems that on failure the database is left in an inconsistent state.

okumin commented 8 years ago

@ColinScott

Transactions are committed in this line, so it is expected that metadata and journal are written atomically.

Could you check some logs to confirm that the transactions are committed correctly? I'm not very familiar with PostgreSQL, but it may have commit logs similar to MySQL's binlog.

ColinScott commented 8 years ago

I'll see if I can build a minimal replication case and get some logs, although I'm not a PostgreSQL expert.

ColinScott commented 8 years ago

I've created a project that replicates the issue here: https://github.com/ColinScott/persist-test

I'm hosting the PostgreSQL instance in a docker image based on the official PostgreSQL image. You can run this with docker run --name some-postgres -e POSTGRES_PASSWORD=qazxsw21 -p 5432:5432 -d abstractcode/akka-postgres:9.4.6.

My first test run for this resulted in 7460 entries in the persistence_metadata table but none in the persistence_journal table. I didn't find any useful logs in the docker container running the database after this test (or indeed any PostgreSQL logs at all).

okumin commented 8 years ago

@ColinScott

Thanks for the minimal reproduction; it helped me identify the cause.

About inconsistency

After it is created, a PersistentActor first replays its existing snapshot and journal entries. With akka-persistence-sql-async, that replay inserts a row into the metadata table if the actor's metadata has not yet been created.

https://github.com/okumin/akka-persistence-sql-async/blob/master/core/src/main/scala/akka/persistence/snapshot/sqlasync/SQLAsyncSnapshotStore.scala#L15
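A toy in-memory model may make this easier to see. It is only a sketch of the behaviour described above, not the plugin's code or schema: ToyStore, keyFor and the two collections are hypothetical names standing in for the metadata and journal tables.

```scala
// Toy in-memory model of the behaviour described above.
// NOT the plugin's real code or schema; all names here are hypothetical.
final class ToyStore {
  // persistenceId -> surrogate key, standing in for the metadata table
  private var metadata = Map.empty[String, Long]
  // (surrogate key, sequence number), standing in for the journal table
  private var journal = Vector.empty[(Long, Long)]

  // Looks up the key for an id, inserting a metadata row on first sight.
  private def keyFor(persistenceId: String): Long =
    metadata.getOrElse(persistenceId, {
      val key = metadata.size + 1L
      metadata += persistenceId -> key
      key
    })

  // Replay only reads journal rows, but resolving the key may create metadata.
  def replay(persistenceId: String): Seq[Long] = {
    val key = keyFor(persistenceId)
    journal.collect { case (`key`, seqNr) => seqNr }
  }

  // A successful write appends a journal row under the same key.
  def write(persistenceId: String, seqNr: Long): Unit =
    journal :+= (keyFor(persistenceId) -> seqNr)

  def metadataRows: Int = metadata.size
  def journalRows: Int = journal.size
}
```

In this model, replaying three new ids without any successful write leaves three metadata rows and no journal rows, which is the same shape as the counts reported earlier in the thread.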

That produces something that looks like an inconsistency, but it is not one. So you can still run com.abstractcode.persisttest.TestMain with a few Items (e.g. 1 to 100) against the records created under high load.

The TCK tests use internal APIs directly, so I have observed transaction boundaries in such abnormal environments.

About errors

These errors are related to the underlying libraries, so the following are guesses.

postgresql-async seems to collapse and become unavailable once its buffer overflows. The size of that buffer is set by the wait-queue-capacity parameter.

https://github.com/okumin/akka-persistence-sql-async/blob/master/core/src/test/resources/postgresql-application.conf#L11

However, the buffer will still overflow eventually if more queries are issued than PostgreSQL can handle, so you need to be careful under high load.
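As a minimal sketch, the parameter could be overridden from code with Typesafe Config. Two things here are assumptions rather than facts from this thread: that the plugin reads its settings from a block named akka-persistence-sql-async, and the value 10000, which is purely illustrative. PluginConfig is a hypothetical name.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object PluginConfig {
  // Assumption: plugin settings live under `akka-persistence-sql-async`.
  // The value 10000 is illustrative only, not a recommendation.
  val overrides: Config = ConfigFactory.parseString(
    """
      |akka-persistence-sql-async {
      |  wait-queue-capacity = 10000
      |}
    """.stripMargin)

  // Explicit overrides win; everything else falls back to application.conf.
  val config: Config = overrides.withFallback(ConfigFactory.load())
}

// Usage (sketch): ActorSystem("persist-test", PluginConfig.config)
```

A larger queue only delays the overflow; it does not remove the need to limit how fast queries are issued.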

akka-stream looks like a good fit for this problem.
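One way this could look, as a sketch only: akka-stream's mapAsync caps the number of writes in flight, so the load generator cannot run ahead of the database. BoundedWrites and writeAll are hypothetical names, and the write function stands in for whatever sends a command to an Item and completes once the event has been persisted.

```scala
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{ExecutionContext, Future}

object BoundedWrites {
  // Hypothetical helper: `write` is assumed to complete its Future only
  // after the corresponding event has been persisted.
  def writeAll(ids: Range, parallelism: Int)(write: Int => Future[Unit])
              (implicit mat: Materializer, ec: ExecutionContext): Future[Unit] =
    Source(ids)
      // At most `parallelism` writes are outstanding at any time; the
      // source is not asked for the next id until a slot frees up.
      .mapAsync(parallelism)(write)
      .runWith(Sink.ignore)
      .map(_ => ())
}

// Usage (sketch), given an implicit ActorSystem and ActorMaterializer:
//   BoundedWrites.writeAll(1 to 100, parallelism = 4)(id => sendToItem(id))
// where sendToItem is a hypothetical function returning Future[Unit].
```

The parallelism value would need to be chosen so that outstanding queries stay well below the connection pool's wait queue capacity.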

ColinScott commented 8 years ago

Thanks for looking at this.