metabase / metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:
https://metabase.com

Metabase dashboard with Mongodb queries running very slow #36465

Closed happy0088 closed 10 months ago

happy0088 commented 10 months ago

Hello,

We are using Metabase version 0.47.9 with MongoDB as the data source. We have created a dashboard containing two number widgets that retrieve the distinct count from a single row. The query generated by Metabase results in a COLLSCAN (full collection scan), since the final query has its group-by key set to None.

Additionally, we have a graph and a table view on the dashboard that perform a group-by on a column and sum a couple of columns. These queries are also slow and result in COLLSCAN operations.

We are using MongoDB Atlas, and the resource usage increases significantly, sometimes more than 5 times, when accessing the Metabase dashboard. Many times, the widgets time out, displaying a delta sign with an error.

We are trying to understand why finding the count of unique values in a column takes such a long time. We have also tried enabling the cache with a MAX CACHE ENTRY SIZE of 204800, but this hasn't resolved the issue.

The MongoDB collection size is approximately 28,000 records.

A few of the query patterns:

1. [{"$group": {"_id": 1, "count": {"$addToSet": 1}}}, {"$sort": {"_id": 1}}, {"$project": {"_id": 1, "count": {"$size": 1}}}]

2. [{"$group": {"_id": {"PName": 1}, "sum": {"$sum": 1}, "sum_2": {"$sum": 1}}}, {"$sort": {"_id": 1}}, {"$project": {"PName": 1, "_id": 1, "sum": 1, "sum_2": 1}}]

3. [{"$group": {"_id": {"PGroup": 1}, "sum": {"$sum": 1}, "sum_2": {"$sum": 1}}}, {"$sort": {"_id": 1}}, {"$project": {"PGroup": 1, "_id": 1, "sum": 1, "sum_2": 1}}]
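As a reading aid for patterns 2 and 3, here is a minimal pure-Python sketch of what a group-by with two sums computes. It does not talk to MongoDB. The thread does not say which columns are summed, so `criticalCount` and `highCount` are placeholders borrowed from the sample document posted later in the thread, and the function name `group_and_sum` is hypothetical.

```python
def group_and_sum(docs, group_field, first, second):
    """Stand-in for $group on one field with two $sum accumulators,
    followed by a $sort on the group key."""
    totals = {}
    for doc in docs:
        key = doc[group_field]  # assumption: every document has the group field
        s1, s2 = totals.get(key, (0, 0))
        # A missing numeric field is treated as 0 in this sketch.
        totals[key] = (s1 + doc.get(first, 0), s2 + doc.get(second, 0))
    return [
        {group_field: key, "sum": sums[0], "sum_2": sums[1]}
        for key, sums in sorted(totals.items())
    ]


# Hypothetical rows; only the field names come from the thread.
rows = [
    {"PGroup": "a", "criticalCount": 1, "highCount": 2},
    {"PGroup": "b", "criticalCount": 0, "highCount": 5},
    {"PGroup": "a", "criticalCount": 3, "highCount": 0},
]
result = group_and_sum(rows, "PGroup", "criticalCount", "highCount")
```

The loop visits every row once, which mirrors why a group-by with no filter in front of it tends to read the whole collection.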

Actual query fired on MongoDB (note the '_id': None in the $group stage):
[{'$group': {'_id': None, 'count': {'$addToSet': '$PName'}}}, {'$sort': {'_id': 1}}, {'$project': {'_id': False, 'count': {'$size': '$count'}}}]
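To make explicit what this pipeline computes, here is a minimal pure-Python sketch that needs no database. It assumes the group key is `_id` (matching the later `$sort` and `$project` stages) and that documents lacking the field contribute nothing to the set; the function name `distinct_count` and the sample rows are hypothetical.

```python
# The pipeline as posted in the issue, written as a Python list.
posted_pipeline = [
    {"$group": {"_id": None, "count": {"$addToSet": "$PName"}}},
    {"$sort": {"_id": 1}},
    {"$project": {"_id": False, "count": {"$size": "$count"}}},
]


def distinct_count(docs, field):
    """Pure-Python stand-in for the posted pipeline; no database involved."""
    seen = set()              # plays the role of the $addToSet accumulator
    for doc in docs:          # "_id": None places every document in one group
        if field in doc:      # assumption: documents without the field add nothing
            seen.add(doc[field])
    return len(seen)          # the final $project applies $size to the set


# Hypothetical rows, loosely shaped like the document shared in this thread.
docs = [
    {"PName": "alpha", "PGroup": "g1"},
    {"PName": "beta", "PGroup": "g1"},
    {"PName": "alpha", "PGroup": "g2"},
]
print(distinct_count(docs, "PName"))  # 2
```

Because there is no filter stage before the `$group`, the single group has to absorb every document, which is consistent with the collection scan described above. Whether a differently shaped pipeline (for example grouping on `$PName` and then counting the groups) would be planned differently is something to verify with explain output rather than assume.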

We have created indexes on the collections, but it's not helping.
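One way to confirm whether an index is actually used is to look at the winning plan in the explain output for the query. The following is a minimal sketch, assuming the explain result has already been fetched as a Python dict (for example through the `explain` database command). The helper names and the `sample_explain` fragment are hypothetical and hand-written for illustration; they are not output from the collection discussed here.

```python
def winning_plan_stages(node, inside=False):
    """Yield each "stage" name found under any "winningPlan" key.

    Rejected plans are skipped so that a scan the planner did not
    choose is not reported.
    """
    if isinstance(node, dict):
        if inside and isinstance(node.get("stage"), str):
            yield node["stage"]
        for key, value in node.items():
            if key == "rejectedPlans":
                continue
            yield from winning_plan_stages(value, inside or key == "winningPlan")
    elif isinstance(node, list):
        for item in node:
            yield from winning_plan_stages(item, inside)


def uses_collection_scan(explain_output):
    """True if the chosen plan contains a COLLSCAN stage."""
    return "COLLSCAN" in set(winning_plan_stages(explain_output))


# Hypothetical explain fragment, shaped like an aggregation explain result.
sample_explain = {
    "stages": [
        {
            "$cursor": {
                "queryPlanner": {
                    "winningPlan": {
                        "stage": "PROJECTION_SIMPLE",
                        "inputStage": {"stage": "COLLSCAN"},
                    },
                    "rejectedPlans": [],
                }
            }
        },
        {"$group": {"_id": None}},
    ]
}
print(uses_collection_scan(sample_explain))  # True
```

If the winning plan shows an index scan stage instead, the index is being picked up and the slowness likely lies elsewhere.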

To Reproduce

Steps to reproduce the behavior:

  1. Create a dashboard with the above-mentioned widgets, using MongoDB Atlas as the source.

Expected behavior

The dashboard should load faster. The queries Metabase generates for MongoDB should be optimised.

Severity

This is annoying: it is blocking all users and blocking our usage of Metabase entirely.

Additional context

We are running Metabase on an EC2 instance and MongoDB on Atlas.

Metabase Diagnostic Info

{
  "browser-info": {
    "language": "en-GB",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.21+9",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.21",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.21+9",
    "os.name": "Linux",
    "os.version": "4.14.320-242.534.amzn2.x86_64",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": ["postgres", "mongo"],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {"name": "PostgreSQL", "version": "12.14"},
      "jdbc-driver": {"name": "PostgreSQL JDBC Driver", "version": "42.5.4"}
    },
    "run-mode": "prod",
    "version": {"date": "2023-12-01", "tag": "v0.47.9", "branch": "?", "hash": "d05b06e"},
    "settings": {"report-timezone": null}
  }
}

paoliniluis commented 10 months ago

Hi @happy0088, you mention that the repro steps are to create a dashboard with the widgets described above, using MongoDB Atlas as the source. As you can imagine, we don't have the data sources to make those cards. Can you give us the data sources, or provide reproduction steps using the sample data?

Otherwise, this will be impossible to reproduce.

paoliniluis commented 10 months ago

@happy0088 we need details about this

paoliniluis commented 10 months ago

Closing this, as it's not helping to debug the performance issue in MongoDB

happy0088 commented 9 months ago

Hi, sorry for the delayed response. Below is one of the sample documents:

{
  "_id" : ObjectId("655cXXXXXXXX"),
  "PName" : "xxxxxxxxx",
  "PPath" : "xxxxxxxxx",
  "PId" : xxxxxxxxx,
  "Url" : "xxxxxxxxx",
  "SBranch" : "myBranch",
  "CBranch" : "myBranch",
  "PId" : 1067249139,
  "PAT" : "2023-11-09T19:38:07.857Z",
  "PiAT" : "2023-11-09T19:38:16.474Z",
  "SURL" : "xxxxxxxxx",
  "SStatus" : "myStage",
  "vulnerabilities" : [ ],
  "criticalCount" : 0,
  "highCount" : 0,
  "lowCount" : 0,
  "mediumCount" : 0,
  "totalCount" : 0,
  "unknownCount" : 0,
  "PGroup" : "myGroup"
}

I am using the Metabase connector for MongoDB, which connects to MongoDB Atlas using a connection string.

Let me know if any other details are required.