jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/

Iterator causes recursive gc invocation when called from RScript #259

Closed koheiw closed 6 months ago

koheiw commented 6 months ago

mongolite crashes on my Linux machine when a script is run with Rscript (it works when I run the same code interactively in the R console).

*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
 1: mongo_cursor_next_page(cur, size = 1)
 2: it$one()
 3: eval(ei, envir)
 4: eval(ei, envir)
 5: withVisible(eval(ei, envir))
 6: source("export.R", local = TRUE, echo = FALSE)

The error is triggered by the iterator. I am using an iterator to retrieve each document as a list (see https://github.com/jeroen/mongolite/issues/236).

      # type, from and to are defined earlier in the script
      query <- sprintf('{
          "type": "%s",
          "date": {"$gte": {"$date": "%sT00:00:00Z"},
                   "$lte": {"$date": "%sT23:59:59Z"}}
      }', type, from, to)
      # find() always returns "_id"; it is used below to fetch each document
      res <- con$find(query, fields = '{"guid": 1}')

      # one list slot per matching document, named by its ObjectId
      lis <- rep(list(NULL), nrow(res))
      names(lis) <- res[["_id"]]
      for (oid in names(lis)) {
          it <- con$iterate(sprintf('{"_id": {"$oid": "%s"}}', oid),
                            fields = '{"date": 1, "cik": 1, "text": 1, "section": 1}')
          doc <- it$one() # ERROR
          if (!is.null(doc))
              lis[[oid]] <- unlist_mongo(doc) # unlist_mongo() is my own helper
      }

Rscript -e "packageVersion('mongolite')"
[1] ‘2.7.3’

koheiw commented 6 months ago

I noticed that this happens when I call mongo() within future.apply::future_lapply(). This might be an issue in future.apply's parallelization infrastructure (e.g. dead child processes) rather than in mongolite.

P.S. I would like to know the best practice for establishing multiple connections to MongoDB from R.

jeroen commented 6 months ago

You don't need to do anything special to establish multiple connections; the driver is specifically designed to handle this. You can keep multiple database connections open by calling mongo() several times and let the driver handle the pooling.

However, I don't think it is a good idea to run multiple database queries at the same time. If you really want to do this, you probably need to make sure that you create and close the connection inside the worker. Copying connections from the parent to the worker is probably a bad idea (perhaps that was the original cause of your troubles).
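
As a rough illustration of creating and closing the connection inside the worker, here is a minimal sketch rather than a tested fix for this issue. The helper name `fetch_one` and its `connect` argument are hypothetical, as are the collection, database and URL in the commented usage. It only relies on `mongo()`, `$iterate()`, `$one()` and `$disconnect()` from mongolite.

```r
# Hypothetical helper: fetch a single document by ObjectId using a
# connection that is opened and closed inside the calling process, so no
# connection object is ever copied from the parent to a worker.
# `connect` is an assumption of this sketch: a zero-argument function that
# returns a fresh mongolite connection.
fetch_one <- function(oid, connect) {
    con <- connect()                        # open the connection in this process
    on.exit(con$disconnect(), add = TRUE)   # close it again when the function exits
    it <- con$iterate(sprintf('{"_id": {"$oid": "%s"}}', oid),
                      fields = '{"date": 1, "cik": 1, "text": 1, "section": 1}')
    doc <- it$one()                         # NULL when nothing matches
    rm(it)                                  # drop the cursor before disconnecting
    doc
}

# Usage sketch (needs a running MongoDB; all names below are placeholders):
# connect <- function() mongolite::mongo(collection = "filings", db = "test",
#                                        url = "mongodb://localhost")
# docs <- future.apply::future_lapply(oids, fetch_one, connect = connect)
```

Passing a function that creates the connection, instead of the connection itself, means each worker builds its own client. Opening one connection per document is wasteful, so in practice the worker would probably handle a batch of ids per connection.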