jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/
286 stars 65 forks source link

Syntax for mongolite R package with group on a $year field #150

Closed jeemer32 closed 6 years ago

jeemer32 commented 6 years ago

Hello Community,

I have an Issue, that I would like to discuss with you.

I have data in the following structure:

{ 
    "_id" : "6489", 
    "post_date" : ISODate("2014-12-20T17:00:00.000+0000"), 
    "post_modified" : ISODate("2014-06-28T14:33:57.000+0000")
}

I group the number of IDs by year. The following aggregation code works in the shell and gives me the correct result:

db.mongodb.aggregate(
    [

          { "$group" : { 
                "_id" : { "post_date" :{ $year: "$post_date" } }, 
                "COUNT(_id)" : { "$sum" : NumberInt(1) } 
                } 
          },
        { 
            "$project" : {
                "_id" : NumberInt(0), 
                "COUNT(_id)" : "$COUNT(_id)", 
                "post_date" : "$_id.post_date"
            }
        }, 

        { 
            "$sort" : {
                "post_date" : NumberInt(-1)
            }
        },
     {$limit: 11}
    ]
);

But in R with Mongolite I can't complete the aggregation. I don't know exactly how to place the $year. This is my current code, which doesn't work yet:

 aggregation <- mongodb_connection$aggregate(
  '[
  {"$group" : {"_id":"$post_date":{"$year": "$post_date"},"count": {"$sum":1}}},
  {"$sort": {"post_date":-1}},
  {"$limit":11}]')

Does anyone have an idea or find the bug in my code? Is that possible?

jeroen commented 6 years ago

I think you need to use "mongo extended json", see chapter 3.6 for how to select a date: https://jeroen.github.io/mongolite/

jeemer32 commented 6 years ago

Thank you for your answer, @jeroen . Actually I don't want to select a date, but extract the year from the timestamp and group the ID's by year. At the end I want to have a table in which I can see the number of id's (entries) per year.

The working part of the code is as follows and gives me the following output:

aggregation <- mongodb_connection$aggregate(
  '[
  {"$group" : {"_id":"$post_date","count": {"$sum":1} }},
  {"$sort": {"post_date":-1}},
  {"$limit":11}]')

Output:

post_date | count(_id)
-- | --
02.05.2018   03:05 | 1
02.05.2018   11:14 | 1
02.05.2018   10:02 | 1
02.05.2018   10:31 | 8

Now I do not want a grouping according to the entire timestamp, but only according to year. This is what the table should look like:

post_date | Count(_id)
-- | --
2018 | 1402
2017 | 4996
2016 | 7765
2015 | 5653

For that I have to change $group aggregation in Shell to this and that worked:

{ "$group" : { 
                "_id" : { "post_date" :{ $year: "$post_date" } }, 
                "COUNT(_id)" : { "$sum" : NumberInt(1) } 
                } 
          },

My problem is that I don't know exactly how to use the $year in Mongolite in the $group pipeline to get the same result as in Shell.

jeroen commented 6 years ago

I think you just need to wrap "$year" in double quotes? This seems to work for me:

mongodb_connection$aggregate('[
  {"$group" : {"_id":{ "post_date" :{ "$year": "$post_date" } },"count": {"$sum":1} }},
  {"$sort": {"post_date":-1}},
  {"$limit":11}]')

Only the sort doesn't seem to work 🤔

jeemer32 commented 6 years ago

That worked wonderfully. Thank you very much @jeroen.