jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/

The handler parameter from find() is unclear or might need more documentation #236

Open ColinFay opened 2 years ago

ColinFay commented 2 years ago

It's unclear how find(handler = ) works.

The documentation says:

Retrieve fields from records matching query. Default handler will return all data as a single dataframe.

This seems to imply that supplying another handler would make it possible to return something other than a single dataframe.

Digging into the code, mongo_stream_in does the following:

    cb <- if (is.null(handler)) {
        out <- new.env()
        function(x) {
            if (length(x)) {
                count <<- count + length(x)
                out[[as.character(count)]] <<- x
            }
        }
    }
    else {
        function(x) {
            handler(post_process(x))
            count <<- count + length(x)
        }
    }

meaning that handler() is always called on the result of post_process(), which is defined as:

> mongolite:::post_process
function (x) 
{
    df <- as.data.frame(jsonlite:::simplify(x))
    df
}

So the handler will always receive a dataframe, or a list of dataframes?

That raises a follow-up question. The end of mongo_stream_in is:

    if (is.null(handler)) {
        if (verbose) 
            cat("\r Imported", count, "records. Simplifying into dataframe...\n")
        out <- as.list(out, sorted = FALSE)
        post_process(unlist(out[order(as.numeric(names(out)))], 
            FALSE, FALSE))
    }
    else {
        invisible()
    }

So if a handler is defined, nothing is returned?
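
Read together, the two excerpts above suggest that a handler is called once per page, for its side effects only, and that each call receives a dataframe. A minimal sketch of that reading follows. The helper name collect_pages is hypothetical, and con is assumed to be a connection created with mongolite::mongo(); the query, pagesize and handler arguments are the ones find() already exposes.

```r
# Hypothetical helper: keep every page that find() passes to the handler.
# Assumes `con` is a connection created with mongolite::mongo().
collect_pages <- function(con, query = '{}', pagesize = 1000) {
  pages <- list()
  con$find(
    query = query,
    pagesize = pagesize,
    # Called once per page; per post_process(), `df` is already a data frame.
    handler = function(df) {
      pages[[length(pages) + 1L]] <<- df
    }
  )
  # With a handler set, find() itself returns invisible NULL,
  # so the collected pages have to be returned explicitly.
  pages
}
```

Calling collect_pages(con) should therefore give a list of dataframes, one per page, which means a handler does not avoid the dataframe conversion.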

My use case is the following:

con <- mongolite::mongo()
con$drop()
con$insert(
    list( y = data.frame(x = 1) )
)
# I want this to return list( y = data.frame(x = 1) )
con$find() 
con$insert(
    list( z = data.frame(x = 1) )
)
# I don't want this to return a data.frame
con$find()

To sum up, I don't want the automatic dataframe conversion. How can that be achieved with find()?

jeroen commented 2 years ago

Indeed, I think find() assumes data frames. If you don't want that, you may want to try iterate(): https://jeroen.github.io/mongolite/query-data.html#iterating
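
As a rough illustration of the iterate() route, here is a minimal sketch that collects each document as a plain R list. The helper name find_as_lists is hypothetical, and con is assumed to be a connection created with mongolite::mongo(); iterate() and the iterator's one() method are the ones described on the page linked above.

```r
# Hypothetical helper: return every matching document as a plain R list,
# skipping the data frame simplification that find() applies.
# Assumes `con` is a connection created with mongolite::mongo().
find_as_lists <- function(con, query = '{}', fields = '{"_id": 0}') {
  it <- con$iterate(query = query, fields = fields)
  docs <- list()
  # it$one() yields the next document as a list, or NULL once the cursor is exhausted.
  while (!is.null(doc <- it$one())) {
    docs[[length(docs) + 1L]] <- doc
  }
  docs
}
```

With the use case above, docs <- find_as_lists(con) followed by str(docs[[1]]) shows the stored structure. Nested values most likely come back as nested lists rather than as rebuilt data.frame objects, so any conversion back is left to the caller.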

ColinFay commented 2 years ago

Thanks a lot, that definitely matches what I wanted to do :)

I still feel the documentation is a little unclear about what handler does and how it works, so do you want to keep this issue open?