API limit restrict output to user, but doesn't restrict amount of data from mongodb

kernelci / kernelci-api

KernelCI API - Database - Pub/Sub

GNU Lesser General Public License v2.1


API limit restrict output to user, but doesn't restrict amount of data from mongodb #549

Open nuclearcat opened 3 hours ago

nuclearcat commented 3 hours ago

If we do two queries:

curl "https://staging.kernelci.org:9000/latest/nodes?kind=test&limit=1"
real 0m5.994s
user 0m0.012s
sys 0m0.000s

curl "https://staging.kernelci.org:9000/latest/nodes?kind=checkout&limit=1"
real 0m0.023s
user 0m0.012s
sys 0m0.000s

There is a significant difference in response time (about 6 s versus 0.02 s), although both queries should be fast, since kind is indexed.

From a preliminary investigation, the reason appears to be that the search query is submitted to MongoDB without the limit parameter, which produces a very large result set for some queries. We need to resolve this, as it also causes excessive load on the database and high RAM consumption.
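As a rough illustration of the direction of the fix, the sketch below builds an aggregation pipeline in which the limit is part of what MongoDB receives, so the server can stop after the requested number of documents instead of returning everything. The helper name `build_limited_pipeline` and its parameters are hypothetical and not taken from kernelci-api; only the `$match`, `$skip` and `$limit` stage names are standard MongoDB.

```python
from typing import Any, Dict, List


def build_limited_pipeline(
    filters: Dict[str, Any], limit: int, offset: int = 0
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build a pipeline that limits on the server side."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")

    # Filter first so an index on the matched field (e.g. kind) can be used.
    pipeline: List[Dict[str, Any]] = [{"$match": filters}]
    # Skip only when an offset is requested.
    if offset:
        pipeline.append({"$skip": offset})
    # The limit is sent to MongoDB rather than applied to the API response.
    pipeline.append({"$limit": limit})
    return pipeline


if __name__ == "__main__":
    print(build_limited_pipeline({"kind": "test"}, limit=1))
```

With pymongo, a list like this could be passed to a collection's `aggregate()` method; whether that fits the way the API currently builds its queries is an assumption that needs checking against the code.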

JenySadadia commented 2 hours ago

Based on an initial investigation, the API uses fastapi-pagination to query the database, and it does pass the limit and offset parameters when querying the DB. Please see https://github.com/kernelci/kernelci-api/blob/main/api/db.py#L149. I think the reason could be that kind=checkout was introduced later in development, whereas kind=test nodes have been in maestro from the very start. That would mean a huge difference in the amount of data stored for these two kinds in the DB, and hence the difference in response time between the queries.
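One way to check whether the limit actually reaches the database would be to ask MongoDB to explain the equivalent find. The sketch below only builds the command document; the helper name `build_find_explain` and the collection name `node` are assumptions made for illustration, not names confirmed from kernelci-api.

```python
from typing import Any, Dict


def build_find_explain(
    collection: str, filters: Dict[str, Any], limit: int, offset: int = 0
) -> Dict[str, Any]:
    """Hypothetical helper: build a MongoDB explain command for a limited find."""
    # The find command that the paginated API request is expected to produce.
    find_cmd = {
        "find": collection,
        "filter": filters,
        "skip": offset,
        "limit": limit,
    }
    # "explain" must be the first key, since the first key names the command.
    return {"explain": find_cmd, "verbosity": "executionStats"}


if __name__ == "__main__":
    # "node" is an assumed collection name.
    print(build_find_explain("node", {"kind": "test"}, limit=1))
```

With pymongo, a document like this could be passed to `db.command()`. In the result, the `executionStats` section reports `nReturned` and `totalDocsExamined`; if both stay small for kind=test with limit=1, the find itself is limited and the slow part is probably elsewhere.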