huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
241 stars 80 forks source link

Review relationships v2 graph query index usage #5798

Closed LaszloKecskes closed 1 year ago

LaszloKecskes commented 1 year ago

There was a concern that the long, nested aggregation pipelines generated by the relationship v2 graph queries may not using indices after the first match step. The task is to review the query index usage.

LaszloKecskes commented 1 year ago

An example, two hop search generates the following query:

Query
  [{
    $match: {
     $expr: {
      $and: [
       {
        $eq: [
         '$sharedId',
         '4tqw19i6xka'
        ]
       },
       {
        $eq: [
         '$language',
         'en'
        ]
       }
      ]
     }
    }
   }, {
    $addFields: {
     visited: []
    }
   }, {
    $lookup: {
     as: 'traversal-0',
     from: 'relationships',
     'let': {
      sharedId: '$sharedId',
      visited: '$visited'
     },
     pipeline: [
      {
       $match: {
        $expr: {
         $and: [
          {
           $eq: [
            '$$sharedId',
            '$from.entity'
           ]
          },
          {
           $not: [
            {
             $in: [
              '$_id',
              '$$visited'
             ]
            }
           ]
          },
          {
           $in: [
            '$type',
            [
             new ObjectId('6458b1136738b7d98d46b28b')
            ]
           ]
          }
         ]
        }
       }
      },
      {
       $addFields: {
        visited: {
         $concatArrays: [
          '$$visited',
          [
           '$_id'
          ]
         ]
        }
       }
      },
      {
       $lookup: {
        as: 'traversal-0',
        from: 'entities',
        'let': {
         to: '$to',
         visited: '$visited'
        },
        pipeline: [
         {
          $match: {
           $expr: {
            $and: [
             {
              $eq: [
               '$$to.entity',
               '$sharedId'
              ]
             },
             {
              $eq: [
               '$language',
               'en'
              ]
             },
             {
              $in: [
               '$template',
               [
                new ObjectId('5bfbb1a0471dd0fc16ada146')
               ]
              ]
             }
            ]
           }
          }
         },
         {
          $addFields: {
           visited: '$$visited'
          }
         },
         {
          $lookup: {
           as: 'traversal-0',
           from: 'relationships',
           'let': {
            sharedId: '$sharedId',
            visited: '$visited'
           },
           pipeline: [
            {
             $match: {
              $expr: {
               $and: [
                {
                 $eq: [
                  '$$sharedId',
                  '$from.entity'
                 ]
                },
                {
                 $not: [
                  {
                   $in: [
                    '$_id',
                    '$$visited'
                   ]
                  }
                 ]
                },
                {
                 $in: [
                  '$type',
                  [
                   new ObjectId('6458b12d6738b7d98d46c111')
                  ]
                 ]
                }
               ]
              }
             }
            },
            {
             $addFields: {
              visited: {
               $concatArrays: [
                '$$visited',
                [
                 '$_id'
                ]
               ]
              }
             }
            },
            {
             $lookup: {
              as: 'traversal-0',
              from: 'entities',
              'let': {
               to: '$to',
               visited: '$visited'
              },
              pipeline: [
               {
                $match: {
                 $expr: {
                  $and: [
                   {
                    $eq: [
                     '$$to.entity',
                     '$sharedId'
                    ]
                   },
                   {
                    $eq: [
                     '$language',
                     'en'
                    ]
                   },
                   {
                    $in: [
                     '$template',
                     [
                      new ObjectId('6458ae816738b7d98d46ab06')
                     ]
                    ]
                   }
                  ]
                 }
                }
               },
               {
                $addFields: {
                 visited: '$$visited'
                }
               },
               {
                $unset: [
                 'visited'
                ]
               },
               {
                $unset: 'traversal'
               }
              ]
             }
            },
            {
             $set: {
              traversal: {
               $concatArrays: [
                '$traversal-0'
               ]
              }
             }
            },
            {
             $unset: [
              'traversal-0',
              'visited'
             ]
            },
            {
             $project: {
              type: 1,
              traversal: 1
             }
            },
            {
             $unwind: '$traversal'
            }
           ]
          }
         },
         {
          $set: {
           traversal: {
            $concatArrays: [
             '$traversal-0'
            ]
           }
          }
         },
         {
          $unset: [
           'traversal-0',
           'visited'
          ]
         },
         {
          $project: {
           sharedId: 1,
           title: 1,
           traversal: 1
          }
         },
         {
          $unwind: '$traversal'
         }
        ]
       }
      },
      {
       $set: {
        traversal: {
         $concatArrays: [
          '$traversal-0'
         ]
        }
       }
      },
      {
       $unset: [
        'traversal-0',
        'visited'
       ]
      },
      {
       $project: {
        type: 1,
        traversal: 1
       }
      },
      {
       $unwind: '$traversal'
      }
     ]
    }
   }, {
    $set: {
     traversal: {
      $concatArrays: [
       '$traversal-0'
      ]
     }
    }
   }, {
    $unset: [
     'traversal-0',
     'visited'
    ]
   }, {
    $project: {
     sharedId: 1,
     title: 1,
     traversal: 1
    }
   }, {
    $unwind: '$traversal'
   }]
  

There were three attempts at analyzing the query index usages.

First, we were trying to run MongoDB's explain functionality in-code, the same place the query is generated. It turns out that explains are not available in multi-document transactions, which our new transaction system now uses (see more at the documentation).

As the second try, we have lifted out the query into a separate script, and run the explain command there. Explain results were only available for the first stage of the pipeline, but not for further stages. On further research, it might be because explain results for lookup stages are only available since MongoDB 5.0 (documentation).

Finally, we have imported the pipeline into Mongo Compass. Compass is not able to provide more explain functionality, since it is just parsing the usual explain result, but it has a convenient UI for checking index usage statistics per collection. Through that, we can confirm that an isolated execution of the query properly increments the indices responsible for the nested lookups.

While this does not alleviate the need for stress testing when we reach a more mature stage, the result is positive and no further actions are needed for now.