mongodb / laravel-mongodb

A MongoDB based Eloquent model and Query builder for Laravel (Moloquent)
https://www.mongodb.com/docs/drivers/php/laravel-mongodb/
MIT License
7k stars 1.43k forks source link

Order of projection in pipeline stages affecting final query #2339

Open fernando2amigos opened 2 years ago

fernando2amigos commented 2 years ago

Description:

The order of stages in the resulting "aggregate" pipeline alters the final result

Steps to reproduce

Make any Laravel model an store the sample document in any collection Dump the executed queries. (dump(DB::connection('mongodb')->getQueryLog());) Observe difference between the right query returning results and the one generated by the package

Expected behaviour

To not assume the "project" to be the last stage in the pipeline

Actual behaviour

It's setting the "project" stage as the last stage in the aggregation pipeline

This is a sample document in my collection:

{ _id: ObjectId("61eb0466d0b35355633c37b5"),
  entityType: 'organization',
  eventId: '004a30cd-b809-44d4-XXXX-4f865e290000',
  organizerId: 'ecd480ce-9918-YYYY-8126-4bafa5520000',
  accountId: 'f95746ab-adf8-4917-ZZZZ-b4beee780000',
  name: 'My Cool Company LLC.',
  totalVisits: 32,
  visitors: 
   [ { id: '3262edf6-1a66-45ef-0000-e7d2f7fb4d2a',
       name: { firstName: 'John', lastName: 'Doe' } },
     { id: 'f65423ad-b919-4433-XXXX-f59f9770c8d0',
       name: { firstName: 'JANE', lastName: 'DOE' } },
     { id: 'e19f6f58-b3fc-4ebb-ZZZZ-fe92313875d1',
       name: { firstName: 'Neil', lastName: 'Young' } },
     { id: 'c18d4808-b398-41c5-XXXX-e5fba4fc0000',
       name: { firstName: 'Ariel', lastName: 'Ortega' } } ],
  hasLeft: false,
  createdAt: 2021-08-24T20:50:52.000Z }

What I need to do is to get the SUM of the items in each "visitors" array of every matching document, so if the only matching document would be this one, then the result of the aggregation would be 4p.

I've manually written this query which yields the expected results:

db.registrations.aggregate([
  {
    "$match":{
      "entityType":"organization"
    }
  },{
    "$project":{
      "uniqueCount":{
        "$size":{
          "$cond":[{"$isArray":"$visitors"},"$visitors",[]]
        }
      }
    }
  },
  {
    "$group":{
      "aggregate":{
        "$sum":"$uniqueCount"
      },
      "_id":null
    }
  },
])

And this is the one written by the package: (not the swapped places between $group and $project)

db.registrations.aggregate([
  {
    "$match":{
      "entityType":"organization"
    }
  },
  {
    "$group":{
      "aggregate":{
        "$sum":"$uniqueCount"
      },
      "_id":null
    }
  },
  {
    "$project":{
      "uniqueCount":{
        "$size":{
          "$cond":[{"$isArray":"$visitors"},"$visitors",[]]
        }
      }
    }
  }
  ],
{"typeMap":{"root":"array","document":"array"}}
)

This is the Model's code:

OrganizationRegistrationTracker::
    where('entityType', 'organization')
    ->project([
        'uniqueCount' => [
            '$size' => [
                '$cond' => [
                    [ '$isArray' => '$visitors' ],
                    '$visitors',
                    []
                ]
            ]
        ]
    ])
    ->sum('uniqueCount');

Maybe I'm missing something. In that case, let me know and please add it to documentation.

Thanks in advance!

fernando2amigos commented 2 years ago

Just as a proof of concept, I've altered the order the pipeline is built in Builder.php and I got the expected results. image

fernando2amigos commented 2 years ago

Hey guys, I wanted to make a change and contribute but beside cloning the repo and building the containers I couldn't make the "tests" container run nor install dependencies. I know this isn't the correct place to discuss this, but the CONTRIBUTING file hasn't any technical directions onto this matter. If you could give some help I'd be glad to drop my two cents.

zhouyanqin commented 1 year ago

I've run into the same problem like you. However, I find a solution by using raw() function. Here is an example: $cursor = DB::connection('mongodb')->table('documents')->raw(function ($collection) use ($type, $by) { return $collection->aggregate([ [ '$match' => [ '$and' => [ ['deleted_at' => null], ], ], ], [ '$group' => [ '_id' => [ '$dateToString' => [ 'format' => '%Y-%m', 'date' => '$created_at', ], ], 'count' => ['$sum' => 1], ], ], [ '$sort' => [ 'count' => -1, ], ], [ '$limit' => 12, ], ]); }); } You can also use DB::connection('mongodb')->raw() to get a mongodb connection directly. Hope this helps.

PS: I don't think changing the original code would be a good idea, because in most situation, the original lib works just fine, and what we need here is really rare.

Don't know how to get the code block working correctly, sorry.

fernando2amigos commented 1 year ago

Hey, I'm not that anymore in that project, but yes I ended up building methods that constructed the array pipelines and using a ton of raw() queries.

Regarding the code block, you should use three back-ticks to open a whole block and not just formatting inline (backtick)(backtick)(backtick)php ... (backtick)(backtick)(backtick)

 This is 
     and indented {
         [piece of => code]
     }