rueckstiess / mtools

A collection of scripts to set up MongoDB test environments and parse and visualize MongoDB log files.
Apache License 2.0
1.88k stars 397 forks

mloginfo: Improving mloginfo to include displaying pattern for aggregate operation #861

Open PrasannaSM opened 2 years ago

PrasannaSM commented 2 years ago

Running mloginfo with the --queries option on a MongoDB log file reports None as the pattern for aggregate operations. The pattern property is defined as follows:

    @property
    def pattern(self):
        """Extract query pattern from operations."""
        if not self._pattern:

            # trigger evaluation of operation
            if (self.operation in ['query', 'getmore', 'update', 'remove'] or
                    self.command in ['count', 'findandmodify']):
                self._pattern = self._find_pattern('query: ')
                # Fallback check for q: variation (eg "remove" command in 3.6+)
                if self._pattern is None:
                    self._pattern = self._find_pattern('q: ')
            elif self.command == 'find':
                self._pattern = self._find_pattern('filter: ')
        return self._pattern

The snippet above has no case for the aggregate command, so its pattern is never populated. As a result, mloginfo cannot serve as a single place where the complete summary (in table form) is available.
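For illustration, here is a minimal, standalone sketch of how the pipeline text might be pulled out of an aggregate log line before any pattern is derived from it. It does not use mtools internals: the function name `extract_pipeline_text`, the `pipeline: ` marker, and the sample log line are all assumptions, and the bracket matching deliberately ignores brackets that appear inside quoted strings.

```python
def extract_pipeline_text(line, marker="pipeline: "):
    """Return the bracketed pipeline text that follows `marker`, or None.

    Hypothetical helper, not part of mtools. It assumes the pipeline is
    logged as `pipeline: [ ... ]` and does not handle brackets that occur
    inside quoted string values.
    """
    start = line.find(marker)
    if start == -1:
        return None
    start += len(marker)
    if start >= len(line) or line[start] != "[":
        return None

    depth = 0
    for pos in range(start, len(line)):
        ch = line[pos]
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
            if depth == 0:
                # Found the bracket that closes the pipeline array.
                return line[start:pos + 1]
    return None  # unbalanced or truncated log line


# Assumed sample line; real log lines may differ between server versions.
sample = (
    'command test_db.test_coll2 command: aggregate { aggregate: "test_coll2", '
    'pipeline: [ { $match: { field1: "a" } }, { $unwind: "$items" } ], '
    'cursor: {} } 252ms'
)
print(extract_pipeline_text(sample))
```

A branch such as `elif self.command == 'aggregate':` in the property could then call something along these lines, but whether the existing `_find_pattern` helper copes with an array argument is not confirmed here.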

Expected behavior

namespace                  operation    pattern         count    min (ms)    max (ms)    95%-ile (ms)    sum (ms)    mean (ms)    allowDiskUse
test_db.test_coll1         find        {"field1": 1, "field2": 1, "field3": 1, "field4": 1, "field5": 1}          1         470         470           470.0         470        470.0    None
test_db.test_coll2         aggregate         [{"$match": {"field1": 1}}, {"$unwind": 1}, {"$match": {"field2": {"$ne": 1}, "field3": 1, "field4": 1}}, {"$group": {"Count": {"$sum": 1}, "_id": 1}}]         1         252         252           252.0         252        252.0    None
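
The aggregate pattern in the expected output can be read as the pipeline with every leaf value replaced by 1 and keys sorted. A minimal sketch of that reduction follows, assuming the pipeline has already been parsed into Python lists and dicts. The names `shape` and `pipeline_pattern` and the sample values are hypothetical; this is not how mtools computes patterns internally.

```python
import json


def shape(value):
    """Replace every leaf value with 1, keeping field names and operators."""
    if isinstance(value, dict):
        return {key: shape(sub) for key, sub in value.items()}
    if isinstance(value, list):
        return [shape(item) for item in value]
    return 1


def pipeline_pattern(pipeline):
    """Serialize the reduced pipeline with sorted keys for stable grouping."""
    return json.dumps(shape(pipeline), sort_keys=True)


# Assumed example pipeline; the literal values are made up for illustration.
pipeline = [
    {"$match": {"field1": "abc"}},
    {"$unwind": "$items"},
    {"$match": {"field2": {"$ne": None}, "field3": 5, "field4": True}},
    {"$group": {"_id": "$field1", "Count": {"$sum": 1}}},
]
print(pipeline_pattern(pipeline))
```

Sorting keys means two pipelines that differ only in field order within a stage collapse to the same pattern, while stage order is preserved because it changes the pipeline's meaning.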

Actual/current behavior

namespace                  operation    pattern         count    min (ms)    max (ms)    95%-ile (ms)    sum (ms)    mean (ms)    allowDiskUse
test_db.test_coll1         find         {"field1": 1, "field2": 1, "field3": 1, "field4": 1, "field5": 1}         1         470         470           470.0         470        470.0    None
test_db.test_coll2         aggregate         None         1         252         252           252.0         252        252.0    None

stennie commented 2 years ago

Hi @PrasannaSM,

I looked into this previously; unfortunately, logged aggregation pipelines did not seem well suited to a concise summary of the query patterns being executed, per https://github.com/rueckstiess/mtools/issues/338#issuecomment-568435401. That comment also includes some suggestions on how to investigate slow aggregation queries.

Using None for the aggregation pattern was intentional, as the output becomes extremely difficult to reduce and read with longer aggregations.

Aside from index usage in initial pipeline stages that fetch data, most of the processing time for an aggregation pipeline will typically be spent on data manipulation rather than queries.

Regards, Stennie

PrasannaSM commented 2 years ago

Thanks @stennie

I get where you're coming from. In that case, could we add an optional argument that shows the aggregate pattern?

    mloginfo mongo.log --queries --show-aggregate-pattern

The pattern would be displayed only if --show-aggregate-pattern is provided; otherwise it would be shown as None (the current behavior).

The readability issue can be addressed if the user writes the output to a file instead of viewing it as a table.
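
As a rough sketch of the proposal, an opt-in switch could look like the following with argparse. Note that `--show-aggregate-pattern` is only the flag suggested in this thread and does not exist in mloginfo; the parser, the `displayed_pattern` helper, and the `mloginfo-sketch` program name are likewise hypothetical stand-ins rather than mtools code.

```python
import argparse


def build_parser():
    """Build a stand-in parser; not the real mloginfo argument parser."""
    parser = argparse.ArgumentParser(prog="mloginfo-sketch")
    parser.add_argument("logfile")
    parser.add_argument("--queries", action="store_true",
                        help="show the query pattern summary")
    # Proposed flag from this issue; off by default to keep current behavior.
    parser.add_argument("--show-aggregate-pattern", action="store_true",
                        help="show patterns for aggregate operations")
    return parser


def displayed_pattern(operation, pattern, show_aggregate_pattern):
    """Hide aggregate patterns unless the user explicitly opted in."""
    if operation == "aggregate" and not show_aggregate_pattern:
        return None
    return pattern


args = build_parser().parse_args(
    ["mongo.log", "--queries", "--show-aggregate-pattern"]
)
print(displayed_pattern("aggregate", '[{"$match": {"field1": 1}}]',
                        args.show_aggregate_pattern))
```

Because the flag defaults to off, existing invocations would keep printing None for aggregate rows, and the longer output could be redirected to a file with ordinary shell redirection.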