scullxbones / akka-persistence-mongo

Implementation of akka-persistence storage plugins for mongodb
Apache License 2.0
103 stars 55 forks

All journals are being queried when recovering from snapshot #117

Closed by jamesdam 8 years ago

jamesdam commented 8 years ago

We are seeing this strange mongo query when recovering from snapshots:

{  
   pid:"80-844",
   to:{  
      $gte:1
   },
   from:{  
      $lte:101568
   }
}

The query takes a long time to finish when the number of events is large.

I suspect it is because the function maxSequenceNr (of both RxMongoJournaller and CasbahPersistenceJournaller) does not take the "from" field into consideration.

Also, should we add a limit 1 to the query?

scullxbones commented 8 years ago

Hi @jamesdam, thanks for the report!

Can you give me a minimal reproducing case? In other words, some code to help me reproduce the problem.

Can you also confirm that snapshots exist in the snapshot collection for this particular pid? Akka persistence should only go to the journal for events beyond the latest snapshot, so if the code is running this query on recovery, that suggests no snapshot exists for 80-844.
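To illustrate why a missing snapshot makes the reported query expensive, here is a minimal pure-Python sketch. It assumes the query is an interval-overlap filter over batched journal documents, each carrying "pid", "from" and "to" fields as in the report. The names JournalDoc and replay_candidates are hypothetical and are not part of the plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalDoc:
    """Hypothetical model of one journal document (a batch of events)."""
    pid: str
    from_seq: int  # stands in for the "from" field in the query
    to_seq: int    # stands in for the "to" field in the query


def replay_candidates(docs, pid, lo, hi):
    """Return docs for pid whose [from, to] range overlaps [lo, hi].

    Mirrors the shape of the reported filter:
    {pid: ..., to: {$gte: lo}, from: {$lte: hi}}
    """
    return [
        d for d in docs
        if d.pid == pid and d.to_seq >= lo and d.from_seq <= hi
    ]


# Hypothetical journal: 10 batches of 100 events each for one persistence id.
journal = [JournalDoc("80-844", i * 100 + 1, (i + 1) * 100) for i in range(10)]

# No snapshot: replay starts at sequence number 1, so every batch matches.
without_snapshot = replay_candidates(journal, "80-844", 1, 1000)

# Snapshot at sequence number 950: replay starts at 951, so one batch matches.
with_snapshot = replay_candidates(journal, "80-844", 951, 1000)

print(len(without_snapshot), len(with_snapshot))  # prints "10 1"
```

A lower bound of 1 on "to", as in the reported query, is what a replay from the very first event would look like under this assumption.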

> I suspect it is because the function maxSequenceNr (of both RxMongoJournaller and CasbahPersistenceJournaller) does not take the "from" field into consideration.

That's OK because the max will be "to" by definition. There are indexes supporting this query; in fact, I want to say the query should be fully covered by an index.

> Also, should we add a limit 1 to the query?

I see we're doing a headOption, so I suppose limit 1 may help - but I wouldn't expect it to help too much.
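The reasoning about the maximum can be sketched in plain Python. This is only an illustration under stated assumptions: documents are modelled as dicts with "pid", "from" and "to" keys, every document satisfies from <= to, and the helper max_sequence_nr is hypothetical rather than the plugin's actual implementation. Returning 0 when nothing matches is also an assumption of this sketch.

```python
def max_sequence_nr(docs, pid):
    """Highest sequence number for pid, using only the "to" field."""
    matching = [d for d in docs if d["pid"] == pid]
    # Sort newest first, then keep a single document: the in-memory
    # analogue of sorting by "to" descending with limit 1 / headOption.
    newest_first = sorted(matching, key=lambda d: d["to"], reverse=True)
    head = newest_first[:1]
    return head[0]["to"] if head else 0


# Hypothetical journal documents, deliberately out of order.
journal = [
    {"pid": "80-844", "from": 101, "to": 200},
    {"pid": "80-844", "from": 1, "to": 100},
    {"pid": "80-844", "from": 201, "to": 300},
    {"pid": "other", "from": 1, "to": 999},
]

highest = max_sequence_nr(journal, "80-844")

# Because from <= to in every document, looking at "from" as well
# cannot produce a larger value than looking at "to" alone.
highest_with_from = max(
    max(d["from"], d["to"]) for d in journal if d["pid"] == "80-844"
)

print(highest, highest_with_from)  # prints "300 300"
```

In this model, taking the head of the sorted result and limiting to one document give the same answer; a limit would only change how much work is done to get there.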

jamesdam commented 8 years ago

Hi @scullxbones, thanks for helping!

Sorry, I forgot that there is an index for the query. After digging deeper into the service log, I found this:

Failed to persist event type [xxxx] with sequence number [101569] for persistenceId [80-844].
akka.pattern.CircuitBreakerOpenException: Circuit Breaker is open; calls are failing fast

Given this and many other replay failures at the same time, I think it was just a network issue. The cluster is working fine now, so I will close this.

Thank you very much for your help!

scullxbones commented 8 years ago

Sure thing, happy to help. Let me know if anything else comes up.