jpwahle / cs-insights-backend

API server of the cs-insights project. It is the main component for storing data and for accessing an external data-analysis endpoint. It uses a MongoDB instance to store everything and queries the cs-insights-prediction-endpoint to get machine learning results.
https://jpwahle.github.io/cs-insights-backend/
MIT License

Optimize queries #25

Closed: trannel closed this issue 2 years ago

trannel commented 2 years ago

Is your feature request related to a problem? Please describe. Once https://github.com/ag-gipp/NLP-Land-backend/issues/24 is solved, some queries might no longer perform well. The /info, /quartiles, and /topk endpoints take very long to respond for authors (around 1 minute). The problem is that MongoDB has to:

  1. $unwind 5 million papers into 10+ million items
  2. $group those into 2.7 million groups
  3. $sort those 2.7 million authors (without an index)
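To make the cost of these three stages concrete, here is a minimal sketch of what such an author aggregation could look like as a pymongo-style pipeline (a list of stage documents), plus a plain-Python simulation that reports the size after each stage. This is an assumption for illustration, not the project's actual query: the field name `authors`, the `papers` counter, and the helper `simulate` are hypothetical.

```python
from collections import Counter

# Hypothetical author aggregation in the shape described in the issue.
# The field name "authors" is an assumption, not taken from the project.
AUTHOR_PIPELINE = [
    {"$unwind": "$authors"},                                 # one document per (paper, author) pair
    {"$group": {"_id": "$authors", "papers": {"$sum": 1}}},  # one document per distinct author
    {"$sort": {"papers": -1}},                               # sort key is computed, so no index can serve it
]


def simulate(papers):
    """Mimic $unwind, $group, and $sort in plain Python and report the intermediate sizes."""
    unwound = [author for paper in papers for author in paper["authors"]]  # $unwind
    grouped = Counter(unwound)                                              # $group with $sum: 1
    ranked = sorted(grouped.items(), key=lambda kv: kv[1], reverse=True)    # $sort descending
    return len(papers), len(unwound), len(grouped), ranked


if __name__ == "__main__":
    toy = [{"authors": ["a", "b"]}, {"authors": ["a", "c", "d"]}, {"authors": ["b"]}]
    print(simulate(toy)[:3])  # papers, unwound items, groups
```

On real data the same arithmetic applies at scale: every paper with several authors multiplies into several unwound items, and because the `$sort` key only exists after `$group`, the server has to sort all groups without help from an index.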

Describe the solution you'd like. Optimize all poorly performing queries and fix any workarounds.

Describe alternatives you've considered. If the queries without filters still take too long, we could add default values for the filters.
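One way to read the default-filter alternative is to always place a `$match` stage ahead of the `$unwind`/`$group`/`$sort` stages, so fewer papers enter the expensive part of the pipeline; a `$match` at the start of a pipeline can also use an index. The sketch below is hypothetical: the `yearPublished` field, the default year range, and the `build_author_pipeline` helper are assumptions, not the project's code.

```python
# Hypothetical default filter; the field name and year bounds are assumptions.
DEFAULT_FILTER = {"yearPublished": {"$gte": 2015, "$lte": 2020}}


def build_author_pipeline(filters=None):
    """Build an author aggregation that always starts with a $match.

    If the caller passes no filters (None or an empty dict), fall back to
    DEFAULT_FILTER so the $unwind stage never sees the full collection.
    """
    match = filters if filters else DEFAULT_FILTER
    return [
        {"$match": match},                                       # shrink the input first (index-eligible)
        {"$unwind": "$authors"},                                 # one document per (paper, author) pair
        {"$group": {"_id": "$authors", "papers": {"$sum": 1}}},  # count papers per author
        {"$sort": {"papers": -1}},                               # most prolific authors first
    ]
```

The trade-off is that unfiltered requests silently return a subset of the data, so the API would need to make the applied default visible to clients.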

jpwahle commented 2 years ago

It will be fixed as part of the database schema remodeling: https://github.com/gipplab/cs-insights-backend/issues/90