scullxbones / akka-persistence-mongo

Implementation of akka-persistence storage plugins for mongodb
Apache License 2.0

Sporadic write failure when using persistAsync #22

Closed marcuslinke closed 9 years ago

marcuslinke commented 9 years ago

@scullxbones When using persistAsync to persist events, I sporadically get the following failure message, and the actor stops after that. A supervised restart of the actor then leads to a ClassCastException in the receiveCommand logic where persistAsync is called. The strange thing is that it occurs after millions of events of the same type have been processed. Any idea what's going on here?

WriteMessageFailure(PersistentImpl(HistStoryViewsEvent(Mon Dec 29 00:00:00 CET 2014,2854247,pm,2),32779138,de.story,false,List(),Actor[akka://rabbit-akka-stream/deadLetters]),akka.pattern.CircuitBreakerOpenException: Circuit Breaker is open; calls are failing fast,3)
WriteMessageFailure(PersistentImpl(HistStoryViewsEvent(Mon Dec 29 00:00:00 CET 2014,2856725,pm,1),32779941,de.story,false,List(),Actor[akka://rabbit-akka-stream/deadLetters]),akka.pattern.CircuitBreakerOpenException: Circuit Breaker is open; calls are failing fast,8)

2015-04-15 09:06:20,601 ERROR - restart Actor[akka://rabbit-akka-stream/user/$b/StoryDeCommandsProcessor#573242443]
java.lang.ClassCastException: de.na.stats.rabbit.flow.RabbitMessageImpl cannot be cast to de.na.stats.domain.Processor$Event
    at de.na.stats.domain.story.StoryCommandsProcessor$$anonfun$1$$anonfun$applyOrElse$7.apply(StoryCommandsProcessor.scala:125) ~[stats.stats-0.0.1-SNAPSHOT.jar:0.0.1-SNAPSHOT]
    at akka.persistence.Eventsourced$$anon$2.aroundReceive(Eventsourced.scala:72) ~[com.typesafe.akka.akka-persistence-experimental_2.10-2.3.9.jar:na]
    at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369) ~[com.typesafe.akka.akka-persistence-experimental_2.10-2.3.9.jar:na]
    at de.na.stats.domain.story.StoryCommandsProcessor.aroundReceive(StoryCommandsProcessor.scala:35) ~[stats.stats-0.0.1-SNAPSHOT.jar:0.0.1-SNAPSHOT]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) ~[com.typesafe.akka.akka-actor_2.10-2.3.9.jar:na]

scullxbones commented 9 years ago

Hi!

It's hard to say without seeing more code. You do have an open circuit breaker, which means that Mongo is erroring out or timing out, so that's one thing to look at.

marcuslinke commented 9 years ago

Thanks for your response. I will investigate in this direction. Closing for now.

marcuslinke commented 9 years ago

@scullxbones Maybe the default wtimeout of 3s is too optimistic for heavy-load scenarios? Would tuning this parameter help with this error?

I've seen the following in the mongo log. I guess this will block further writes, right?

[DataFileSync] flushing mmaps took 18433ms for 53 files

scullxbones commented 9 years ago

Hmm, not sure. 18s is an eternity, but I don't know if that particular flush is blocking. Have you looked at the metrics that are exposed (https://github.com/scullxbones/akka-persistence-mongo#metrics)? Those should tell you if and where you're having a problem, assuming it's timing-related.

marcuslinke commented 9 years ago

It seems that under high load mongo is flooded and journal writes then time out. This is probably because our mongo runs in a virtual machine with an attached SAN, where storage latency is unpredictable. After configuring

akka.contrib.persistence.mongodb.mongo.journal-wtimeout = 10s
akka.contrib.persistence.mongodb.mongo.breaker.maxTries = 0
akka.contrib.persistence.mongodb.mongo.breaker.timeout.call = 10s

everything seems fine now, but I think a better concept is needed for dealing with high message loads.

The message source is RabbitMQ, and currently I don't know how to implement dynamic flow control with it. Thanks for your input anyway!
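
One possible direction for the flow-control question is to cap the number of journal writes that are still unconfirmed and to stop pulling new messages while that cap is reached. The following is only a minimal sketch under that assumption, using nothing but the Scala standard library. The class name `InFlightWindow` and its methods are hypothetical and are not part of akka-persistence, this plugin, or the RabbitMQ client.

```scala
// Hypothetical helper (not an Akka or plugin API): tracks how many
// journal writes have been started but not yet confirmed.
final class InFlightWindow(maxInFlight: Int) {
  require(maxInFlight > 0, "maxInFlight must be positive")

  private var inFlight = 0

  /** Reserves a slot and returns true if another write may be started. */
  def tryAcquire(): Boolean = synchronized {
    if (inFlight < maxInFlight) {
      inFlight += 1
      true
    } else {
      false
    }
  }

  /** Frees a slot once the journal has confirmed or rejected a write. */
  def release(): Unit = synchronized {
    if (inFlight > 0) inFlight -= 1
  }

  /** Number of writes currently awaiting confirmation. */
  def current: Int = synchronized(inFlight)
}
```

In a persistent actor, `tryAcquire()` would be called before `persistAsync` and `release()` inside its handler; when `tryAcquire()` returns false, the actor would have to hold back or reject further commands. On the RabbitMQ side, a comparable bound can probably be had by setting a prefetch count with `Channel.basicQos` and acknowledging each delivery manually only after its event has been persisted, so the broker stops delivering once that many messages are unacknowledged. Whether this fits the stream setup described above is an assumption.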