jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/

Insert issue with numeric rownames #111

Closed jcaude closed 6 years ago

jcaude commented 6 years ago

I use the field '_row' as my main query key. It works well with alphanumeric values, but when the values are integers it doesn't do what I expect (even if I convert them to character). For example:

m$insert(data.frame(.value="A", row.names="100"))
m$find()

will return the following data-frame

  _value
1      A

The JSON content in the collection is:

{ "_id" : ObjectId("59ce0e5c21684a943776335e"), "_value" : "A" }

But if I insert a list instead of a data-frame, it works as expected:

m$insert(list(.value="A", .row="101"))

gives

{ "_id" : ObjectId("59ce0fa721684a943776335f"), "_value" : [ "A" ], "_row" : [ "101" ] }

But in many cases I have a mix of list and data-frame insertions. The workaround is to set the _row value manually with an update query. For example, in the first case I can do:

m$update(query = '{"_id" : {"$oid":"59ce0e5c21684a943776335e"}}',update = '{"$set": {"_row":"101"}}')

I consider this an issue... well, at least for me.

Jean-Christophe.

btw: you did a great job, mongolite is very easy to use.. THX!

jeroen commented 6 years ago

You shouldn't store data in the data frame row names, they are only for internal use in R.

Why can't you use data.frame(.value="A", .row="100")? Or anything else but row.names?

jcaude commented 6 years ago

Well, row names are not only internal, since you can access them and use them for subsetting, which is very important. In my particular case I wrote a kind of "generic hash bag" for my users, with memory, db, ... backends. My code is agnostic here: I don't want to deal with the bag content. But that's not a big deal since there is a workaround. The point I was raising is an inconsistency: if row.names are set as characters they are stored properly, but not if they are set as numeric values (i.e. not a 1:n sequence).

jeroen commented 6 years ago

You may be able to force the rownames using m$insert(x, rownames = TRUE). Can you try this?

I still strongly recommend using an actual column instead of the row names. You can filter on any column just as easily, or do more advanced operations.
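A minimal sketch of the "actual column" approach, using base R only. The helper name rownames_to_column and the column name .row are hypothetical choices for illustration, not part of mongolite. The commented-out calls assume an existing collection handle m, as in the examples above, and assume a leading dot in a field name is stored as an underscore, going by the output shown earlier in this thread.

```r
# Hypothetical helper (not part of mongolite): copy the row names into a
# regular column so they are stored like any other field.
rownames_to_column <- function(df, column = ".row") {
  stopifnot(is.data.frame(df), !(column %in% names(df)))
  out <- data.frame(rownames(df), df,
                    stringsAsFactors = FALSE, check.names = FALSE)
  names(out)[1] <- column
  rownames(out) <- NULL  # reset to the default 1:n sequence
  out
}

df <- data.frame(.value = "A", row.names = "100", stringsAsFactors = FALSE)
keyed <- rownames_to_column(df)
print(keyed)

# With a live collection handle `m` (assumed, not created here):
# m$insert(keyed)
# m$find('{"_row": "100"}')
```

Because the key is now an ordinary column, it no longer matters whether the row names looked like numbers or like text.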

jeroen commented 6 years ago

Closing this issue due to inactivity.

To summarize: the default behavior of toJSON is to include the row names in the output only when they are not just numbers, but actual names. You can force inserting row names with m$insert(x, rownames = TRUE) if you want to always write row names to the collection.

Obviously mongo collections don't have row names, so they'll be stored as a regular field.
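The difference can be inspected without a database by calling jsonlite's toJSON directly. This is only a sketch: it assumes the jsonlite package is installed and that mongolite passes the rownames argument through to toJSON as described above. The exact output may vary between versions, so none is shown.

```r
library(jsonlite)

df <- data.frame(.value = "A", row.names = "100", stringsAsFactors = FALSE)

# Default: row names made up only of digits are treated as an index
# and left out of the JSON.
json_default <- toJSON(df)

# rownames = TRUE asks for the row names to be written regardless.
json_forced <- toJSON(df, rownames = TRUE)

print(json_default)
print(json_forced)
```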