kclay / rethink-scala

Scala Driver for RethinkDB

IndexOutOfBoundsException when iterating over async result cursor #42

Open gmethvin opened 9 years ago

gmethvin commented 9 years ago

I get the following error when iterating over a result cursor:

java.lang.IndexOutOfBoundsException: 996
    at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43) ~[scala-library-2.11.6.jar:na]
    at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:48) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.RethinkIterator.next(DefaultCursor.scala:29) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.Iterator$class.foreach(Iterator.scala:750) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.RethinkIterator.foreach(DefaultCursor.scala:9) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.DefaultCursor.foreach(DefaultCursor.scala:94) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.DefaultCursor.map(DefaultCursor.scala:94) ~[core_2.11-0.4.7.jar:0.4.7]

My code looks something like:

emailsTable.filter(f => (f \ "userId") === userId).map(f => (f \ "messageId").string).run.map { ids: Seq[String] =>
  val messageIds = ids.map(MessageId(_)).toSet
  // ...
}

It seems like basically anything that iterates over the result cursor (map, toSet, etc.) has the potential to cause this error for me. I'm also using the async connection, so do we know that https://github.com/kclay/rethink-scala/blob/master/core/src/main/scala/com/rethinkscala/net/DefaultCursor.scala#L26 will complete before indexing into the chunks array? It's hard to tell from the code.
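[Editor's note] The race being suspected here can be sketched in isolation. This is a hypothetical illustration, not the driver's actual `DefaultCursor` code: an iterator over chunked results must not index into its chunk buffer until the fetch that populates that chunk has completed. `ChunkedIterator`, `fetchChunk`, and `totalSize` are all invented names for the sketch.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Sketch: fetchChunk(offset) asynchronously returns the next batch of rows
// starting at `offset`; totalSize is the known size of the result set.
// next() waits for the fetch to complete before indexing into `chunks` --
// skipping that wait is the kind of race that would surface as an
// IndexOutOfBoundsException like the one in the stack trace above.
class ChunkedIterator[A](fetchChunk: Int => Future[Seq[A]], totalSize: Int)
    extends Iterator[A] {

  private val chunks = ArrayBuffer.empty[A]
  private var index = 0

  def hasNext: Boolean = index < totalSize

  def next(): A = {
    // Block until the chunk containing `index` has arrived.
    // (Assumes fetchChunk always returns a non-empty batch while rows remain.)
    while (index >= chunks.length) {
      chunks ++= Await.result(fetchChunk(chunks.length), 5.seconds)
    }
    val value = chunks(index)
    index += 1
    value
  }
}
```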

gmethvin commented 9 years ago

I'm noticing this on basically any large table I have, and I've found that sometimes adding log statements will prevent this from happening. So my guess is there's some kind of race condition.

kclay commented 9 years ago

Still looking into this one.

gmethvin commented 9 years ago

Also, am I correct that the result cursor blocks to get more results from the database, even with the async API? It'd be ideal to have an API that returns an Enumerator instead.
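[Editor's note] One non-blocking shape such an API could take, sketched without committing to Play's Enumerator specifically. The names (`AsyncCursor`, `nextPage`, `foldAll`) are illustrative and not part of the driver: pages are exposed as `Future`s and consumed by an asynchronous fold instead of a blocking `Iterator.next()`.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical async cursor: nextPage() yields the next batch, or None
// once the server has no more chunks. foldAll consumes every page without
// ever blocking a thread.
trait AsyncCursor[A] {
  def nextPage(): Future[Option[Seq[A]]]

  def foldAll[B](zero: B)(f: (B, A) => B)(implicit ec: ExecutionContext): Future[B] =
    nextPage().flatMap {
      case Some(page) => foldAll(page.foldLeft(zero)(f))(f)
      case None       => Future.successful(zero)
    }
}
```

An adapter like this could also back a Play `Enumerator` or a Reactive Streams `Publisher` without changing the driver's wire handling.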

kclay commented 9 years ago

In some cases RethinkDB will chunk up your results. Let's say you have a table with 1k rows but each row is, say, 5kb; then RethinkDB would return all 1k rows at once.

Now let's say you have the same number of rows but each row is 15kb. RethinkDB would then chunk up the results, so the first fetch returns 100 rows, the next fetch another 100 rows, and so on. This is where the custom Iterator comes into play in the driver: it fetches those remaining rows, since RethinkDB didn't return all the results on the initial fetch. From asking in the RethinkDB IRC channel, this is the expected behavior in these cases, and the official drivers handle it. So there is a bug in how the async driver works (you are the second person to hit this same issue, and both were with async). In blocking mode this error doesn't seem to happen, so it may well be a race condition.
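[Editor's note] If the async transport appends chunks from a network thread while user code iterates on another, one common way to avoid the reader ever observing a buffer mid-append is to hand completed chunks through a thread-safe queue. This is a hedged sketch of that pattern, not the driver's actual fix; `ChunkQueue`, `offerChunk`, and `takeChunk` are invented names.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Sketch: the network callback puts each completed chunk on the queue;
// the iterator takes whole chunks off it. The queue's internal locking
// means the reader never indexes into a structure another thread is
// resizing, which is the suspected cause of the IndexOutOfBoundsException.
final class ChunkQueue[A] {
  private val queue = new LinkedBlockingQueue[Seq[A]]()

  // Called from the network thread when a chunk arrives.
  def offerChunk(chunk: Seq[A]): Unit = queue.put(chunk)

  // Called from the consuming iterator; blocks until a chunk is available.
  def takeChunk(): Seq[A] = queue.take()
}
```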

As for the use of an Enumerator: could you take that request, as well as any other Play-related requests, and create a "Play Support" issue for it? Include all the features you would like to have in the driver for Play, along with some use cases. I know you asked for play-json support, but I do have some issues with trying to support Group/Ungroup and Datetime serialization, since RethinkDB wraps these objects in a nested object with some metadata.

gmethvin commented 9 years ago

@kclay Using iteratees isn't really about "Play support" to me. The point is that I'm able to enumerate the data asynchronously. It doesn't matter that much to me if you use Play's iteratee library or some other iteratee or reactive stream implementation.

kclay commented 9 years ago

@gmethvin can you provide a test case for this? I'm having a hard time reproducing it. I know the bug is there but can't create a case to show it.

gmethvin commented 9 years ago

@kclay I haven't had time to write a complete test case for this. It only happens for me when there are on the order of 10000 elements in the result set, and even then only occasionally. I'm not sure exactly what the trick is to triggering the bug.