mongodb-haskell / mongodb

MongoDB driver for Haskell
http://hackage.haskell.org/package/mongoDB
Apache License 2.0
170 stars 44 forks source link

Issue with OP_MSG when cursor_id overflow 32bits #153

Open pierreMizrahi opened 4 months ago

pierreMizrahi commented 4 months ago

On a database where cursors overflow 32bits, responses that are big enough to trigger the "MoreToCome" mechanism fail with a 'fromJust' Error.

I think that there is a slight misinterpretation of the mongodb OP_MSG doc in the current implementation of the library :

https://github.com/mongodb-haskell/mongodb/blob/163e5c6b199705704e15f85e1e18bccb6522ac3d/Database/MongoDB/Internal/Protocol.hs#L312

id', which is the gets casted as an Int32, whereas the specification says it is an Int64.

https://github.com/mongodb/specifications/blob/39add29e0a3b6721688845874383878b90811a12/source/change-streams/change-streams.md?plain=1#L319

The confusion might come from the fact that this is believed to be a ResponseId (Int32), on which the compatibility with the RequestId is checked, whereas it is actually a CursorId (Int64).

What happens is that this Int64 fails at being casted as an Int32, returning Nothing, and this fails unsafely because of the 'fromJust'.

I don't think it is necessary (nor possible) to check each batch of document. Note that the responseId is actually checked from the header (line 292).

Here is a very minimal patch that solves the issue for me: just remove the id check altogether:

diff --git a/Database/MongoDB/Internal/Protocol.hs b/Database/MongoDB/Internal/Protocol.hs
index dbf7630..08758ab 100644
--- a/Database/MongoDB/Internal/Protocol.hs
+++ b/Database/MongoDB/Internal/Protocol.hs
@@ -309,8 +309,7 @@ callOpMsg pipe request flagBit params = do
                         let (docs, docs') =
                               ( fromJust $ cast $ valueAt "nextBatch" doc :: [Document]
                               , fromJust $ cast $ valueAt "nextBatch" doc' :: [Document])
-                            id' = fromJust $ cast $ valueAt "id" doc' :: Int32
-                        (rt, check id' (rt, rep{ sections = docs' ++ docs })) -- todo: avoid (++)
+                        (rt, rep{sections= docs' ++ docs}) -- todo: avoid (++)
                         -- Since we use this to process moreToCome messages, we
                         -- know that there will be a nextBatch key in the document
                       _ ->  error "Impossible"

Please note that I haven't tried to make a simple reproducer outside of my production database, and I don't really expect it to be that easy.