mashingan / anonimongo

Another Nim pure Mongo DB driver
MIT License
43 stars 5 forks source link

GridFS testing issues #31

Open samsamros opened 9 months ago

samsamros commented 9 months ago

As promised, I will begin this thread as I try to debug the issues in the GridFS test. I'm also testing mongodb v 7.0. So far CRUD operations work fine without enabling full v 7.0 compatibility. I won't upgrade fully to 7.0 until we've run over these issues! Also, tests are still not fully operational in mongoo v 7.0. So here we go!


compiler: nim v 2.0

Config:

switch("path", "../src")
switch("define", "filename=/home/testing/shared/anonimongo/tests/15MBfile.bin")
switch("define", "saveas=/home/testing/shared/anonimongo/tests/15MBfile-copy.bin")
switch("define", "user=")
switch("define", "testReplication=yes")
switch("define", "testChangeStreams=yes")
hint("ConvFromXToItSelfNotNeeded", off)
switch("define", "exe=mongod")
switch("define", "dbpath=/var/lib/mongodb")
switch("define", "uri")
switch("define", "nomongod")
--verbosity:0
--define:release
switch("define", "asyncBackend=chronos")

First, I tried uploading a small file around 1 kilobytes, which gave me an IndexDefect error, as part of the tests use 5 megabytes as part of the overall test.

See lines 121-122:

 let fivemb = 5.megabytes
      f.setFilePos fivemb

So, I used a 15.2 MB file. The first error showed the following:

/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(109, 26): Check failed: f.getFileSize == gf.fileSize f.getFileSize was 1966080 gf.fileSize was 15152802

Inspecting the resulting file saveas I verified it had only downloaded 2 MB, confirmed by the following error:

/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(94, 8): Check failed: wr.success Grid download file: reason = Incomplete file download; only at 12.97502600509134% /home/testing/shared/anonimongo/tests/test_gridfs_test.nim(100, 8): Check failed: wr.success Grid download as: reason = Incomplete file download; only at 12.97502600509134%

I checked with pymongo and python's gridfs driver and got the following:

gridfs.errors.CorruptGridFile: truncated chunk #0: expected chunk length to be 1048576 but found chunk with length 131072

Suggesting this was happening from the get go, when uploading.

I tried uploading a 80 MB file, only 32MB got uploaded, but no error arised.

I began monitoring line 122 in gridfs.nim:

let data = waitfor f.read(chunksize) # to make it work regardless sync/async because asyncfile
echo data.len # <- my addition

which output

131072 131072 131072 ... 131072

Even when hardcoding chunks to 261120, the default for GridFS according to MongoDB documentation, it did not work. When I deliberately switched the test to:

    test "Upload file":
      when anoSocketSync:
        wr = grid.uploadFile(filename, chunk = 131072)
      else:
        wr = waitfor grid.uploadFile(filename, chunk = 131072)
      wr.success.reasonedCheck("Grid upload file", wr.reason)

The test ran perfectly. Analyzing further, I hardcoded 1048576 in line 122 in gridfs.nim just to test what was happening:

    let data = waitfor f.read(1048576) # to make it work regardless sync/async because asyncfile
    echo "chunk: " & $chunkSize
    echo data.len

this was the output:

chunk: 1048576 131072 chunk: 1048576

Then I hardcoded this in line 122:

    let data = waitfor f.read(261120) # to make it work regardless sync/async because asyncfile
    echo "chunk: " & $chunkSize
    echo data.len

I got the following for data.len:

131072

edit May this be a bug in the nim std library? or a protection against buffer overflows? It's not, works as expected, see comment below

While I have more time to devote to this error. I'm just pointing out this change I made that made the test work:

    test "Create default bucket":
      require db != nil
      when anoSocketSync:
        grid = db.createBucket(chunkSize = 131072)
      else:
        grid = waitfor db.createBucket(chunkSize = 131072)
      require grid != nil

edit I think this may be a bug in the std library. It's not, see comment below.

We can run more tests. I will continue through this week to find a solution to the issue. There other issues when uploading other files. I tried uploading a .deb file just for kicks. It uploads alright, but overflows in the remaining tests, and fails. I can download it from gridFS using Python without issues, but there's a point in the test this breaks. Once we figure out what's happening with the 131072 value, we can move with the rest of the tests!

This is just one step at debugging this, and I hope it helps.

samsamros commented 9 months ago

Testing async file on its own I had the following:

import std/[asyncfile, asyncdispatch, os]

const target = "path/files"
var f = openAsync(target, fmRead)
let 
    fsize = getFileSize f
    chunkSize = 261120
for _ in countup(0, int(fsize-1), chunkSize):

    let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
    echo data.len

the output:

261120 261120 261120 ... 261120 261120 65985

Switching chunkSize to a megabyte (1048576):

1048576 1048576 1048576 ... 204225

Works as expected.


The following code, run independently from the test works fine:

import std/[asyncfile, asyncdispatch, os]
import anonimongo

const 
    anoSocketSync* = defined(anoSocketSync)
    URI = "mongodb://localhost:27017/"
    saveas = "/home/testing/mongodb-mongosh_1.6.0_amd64.deb"
    target = "/home/testing/Downloads/mongodb-mongosh_1.6.0_amd64.deb"

when not anoSocketSync:
  type TheSock* = AsyncSocket
else:
  type TheSock* = Socket

var mongo = newMongo[AsyncSocket](MongoUri URI)
var grid: GridFS[TheSock]
var db: Database[TheSock]
var wr: WriteResult
let dbname = "newtemptest"

var f = openAsync(target, fmRead)
let 
    fsize = getFileSize f
    chunkSize = 1048576
for _ in countup(0, int(fsize-1), chunkSize):

    let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
    echo data.len

assert waitFor mongo.connect 
db = mongo[dbname]

grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)

wr = waitfor grid.uploadFile(target)
echo wr.success
wr = waitfor grid.downloadFile(saveas)
echo wr.success

Is there an enforced limit on buffer size in the tests? Checking this out next

samsamros commented 8 months ago

So, I finally made it work. However, there are a few limitations to the test. First, I changed a few check macros to doassert; Secondly, I changed all streams above 131072 to 131072. There appears to be a buffer limitation inside the test template. I cannot confirm it yet, but all tests outside it work fine.

Considering the above limitations, I'd like to propose tests outside the test template to run streams above 131072, such as the 5.megabyte stream that is currently in the gridFS test right now.

I'm attaching a diff file with the changes I made to make the test work, pending all of y'alls review: gridfs.diff.zip

In addition, I'm sharing the diffs here to improve readability and searchability:

diff --git a/tests/test_gridfs_test.nim b/tests/test_gridfs_test.nim
index 112903b..53db33f 100644
--- a/tests/test_gridfs_test.nim
+++ b/tests/test_gridfs_test.nim
@@ -67,9 +67,9 @@ if filename != "" and saveas != "":
     test "Create default bucket":
       require db != nil
       when anoSocketSync:
-        grid = db.createBucket(chunkSize = 1.megabytes.int32)
+        grid = db.createBucket(chunkSize = 131072)
       else:
-        grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)
+        grid = waitfor db.createBucket(chunkSize = 131072)
       require grid != nil

     test "Upload file":
@@ -106,8 +106,9 @@ if filename != "" and saveas != "":
         var gf = grid.getStream(bson({filename: dwfile}), buffered = true)
       else:
         var gf = waitfor grid.getStream(bson({filename: dwfile}), buffered = true)
-      check f.getFileSize == gf.fileSize
-      check f.getFilePos == gf.getPosition
+      doassert f.getFileSize == gf.fileSize
+      doassert f.getFilePos == gf.getPosition
+

       let threekb = 3.kilobytes
       when anoSocketSync:
@@ -115,28 +116,32 @@ if filename != "" and saveas != "":
       else:
         var binread = waitfor gf.read(threekb)
       var bufread = waitfor f.read(threekb)
-      check bufread == binread
-      check f.getFilePos == gf.getPosition
+      doassert bufread == binread
+      doassert f.getFilePos == gf.getPosition
+      

-      let fivemb = 5.megabytes
-      f.setFilePos fivemb
+      let kbStream = 131072 #changed
+      f.setFilePos kbStream
       when anoSocketSync:
-        gf.setPosition fivemb
+        gf.setPosition kbStream
       else:
-        waitfor gf.setPosition fivemb
-      check f.getFilePos == gf.getPosition
-
+        waitfor gf.setPosition kbStream
+      doassert f.getFilePos == gf.getPosition
+      
+        
       when anoSocketSync:
-        binread = gf.read(fivemb)
+     #it can't be fivemb because of the buffer limit in tests
+        binread = gf.read(kbStream)
       else:
-        binread = waitfor gf.read(fivemb)
-      bufread = waitfor f.read(fivemb)
-      check bufread.len == binread.len
-      check bufread == binread 
-      check f.getFilePos == gf.getPosition
+        binread = waitfor gf.read(kbStream)
+      bufread = waitfor f.read(kbStream)
+      doassert bufread.len == binread.len
+      doassert bufread == binread 
+      doassert f.getFilePos == gf.getPosition

       close f
       close gf
+      

     test "Gridstream read chunked size":
       let chunkfile = "gs_chunks.mkv"
@@ -246,4 +251,4 @@ if filename != "" and saveas != "":
     close mongo
     if runlocal:
       if mongorun != nil and mongorun.running: kill mongorun
-      close mongorun
\ No newline at end of file
+      close mongorun

Considerations:

samsamros commented 8 months ago

@mashingan I've tested this with mongodb v7.0... all tests pass in debian. I will begin testing on a few applications. But I think anonimongo is fully compatible with 7.0. Are you ok with me pushing a branch with the changes I made to the GridFS test or would you like to review the diff file first?

mashingan commented 8 months ago

Please go ahead to proceed. Appreciated for the efforts.