Optimize queries

Is your feature request related to a problem? Please describe.
Once https://github.com/ag-gipp/NLP-Land-backend/issues/24 is solved, some queries might not perform well anymore. The /info, /quartiles, and topk endpoints take very long to respond for authors (around 1 minute). The cause is that MongoDB has to:

$unwind 5 million papers into 10+ million items
$group those into 2.7 million groups
$sort those 2.7 million authors (without an index)
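
For illustration, a minimal sketch of a pipeline with the shape described above. The field name `authors`, the output field `papers`, and both function names are assumptions made for this example and are not taken from the backend code. The second function emulates the same steps in plain Python on a tiny in-memory list, so the effect of each stage is visible without a database.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def build_topk_authors_pipeline(k: int) -> List[Dict[str, Any]]:
    """Build a pipeline of the slow shape: $unwind, $group, $sort.

    Assumes each paper document has an `authors` array (hypothetical).
    """
    return [
        # One output document per (paper, author) pair.
        {"$unwind": "$authors"},
        # Collapse the pairs into one group per author.
        {"$group": {"_id": "$authors", "papers": {"$sum": 1}}},
        # Sorts a computed field, so no index can be used here.
        {"$sort": {"papers": -1}},
        {"$limit": k},
    ]


def emulate_topk_authors(
    papers: List[Dict[str, Any]], k: int
) -> List[Tuple[str, int]]:
    """Plain-Python emulation of the pipeline above, for illustration."""
    # $unwind: flatten every paper into its individual authors.
    unwound = [author for paper in papers for author in paper["authors"]]
    # $group + $sort + $limit: count per author, keep the k largest.
    return Counter(unwound).most_common(k)
```

Every stage has to touch the full intermediate result before the next one can finish, which is why the cost grows with the number of unwound items rather than with the size of the returned page.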

Describe the solution you'd like
Optimize all queries that do not perform well and fix any workarounds.

[x] Fix the paged endpoint: It has issues with the $lookup/$sort/$project stages in the pipeline, so we changed the order of the stages as a workaround. The workaround returns incorrect results when we sort by venue or authors. Originally, the $project stage came before the $sort stage. (Done, but there is a new issue.)
[x] Change the schema so that all information is duplicated into each author. This makes sure all filters can be applied directly to each author, without $unwind/$group or $lookup.
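
A rough sketch of what the schema change could look like, under assumptions: the embedded `papers` array, the `year` and `venue` fields, the `matching` output field, and both function names are hypothetical and only illustrate the idea of duplicating paper information into each author. The pipeline filters and counts inside each author document with `$match`, `$filter`, and `$size`, so it needs no $unwind/$group or $lookup.

```python
from typing import Any, Dict, List


def denormalize(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy the filterable paper fields into one document per author."""
    authors: Dict[str, Dict[str, Any]] = {}
    for paper in papers:
        for name in paper["authors"]:
            doc = authors.setdefault(name, {"_id": name, "papers": []})
            doc["papers"].append({"year": paper["year"], "venue": paper["venue"]})
    return list(authors.values())


def build_filtered_topk_pipeline(min_year: int, k: int) -> List[Dict[str, Any]]:
    """Top-k authors by papers since `min_year`, on the denormalized shape."""
    return [
        # Keep only authors with at least one paper passing the filter.
        {"$match": {"papers.year": {"$gte": min_year}}},
        # Count the passing papers inside each author document.
        {"$addFields": {"matching": {"$size": {"$filter": {
            "input": "$papers",
            "as": "p",
            "cond": {"$gte": ["$$p.year", min_year]},
        }}}}},
        # Still a sort on a computed field, but over authors only.
        {"$sort": {"matching": -1}},
        {"$limit": k},
    ]
```

The trade-off is storage and write cost: every paper is stored once per author, and any change to a paper has to be propagated to all of its authors.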

Describe alternatives you've considered
Should the queries without filters still take too long, we could add some default values for filters.
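
One way this fallback could work, as a small sketch: the filter name `yearStart` and its default value are placeholders invented for the example, not values proposed in this issue.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; the real filter names and values would need to be decided.
DEFAULT_FILTERS: Dict[str, Any] = {"yearStart": 2010}


def with_default_filters(filters: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Fill in defaults for filters the client did not set."""
    merged = dict(DEFAULT_FILTERS)
    # Explicit client values win; None counts as "not set".
    merged.update({k: v for k, v in (filters or {}).items() if v is not None})
    return merged
```

With this, an unfiltered request would be restricted by the defaults, while any filter the client sets explicitly is kept as sent.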

jpwahle / cs-insights-backend

Optimize queries #25