miLibris / flask-rest-jsonapi

Flask extension to build REST APIs around JSONAPI 1.0 specification.
http://flask-rest-jsonapi.readthedocs.io
MIT License
597 stars 153 forks source link

Preventing useless queries when listing entities #187

Closed mikael-s closed 4 years ago

mikael-s commented 4 years ago

Hello,

On an application with about 100k entries, listing them takes minutes. This surprised me because listing entries should be fairly quick, even if there are many of them. It appears that for each entry it produces a query to each relationship. This makes a huge number of queries. To understand if I did something wrong, I started from your own example in the documentation. I created 100 computers and 100 persons, related to a computer.

Then I listed all the computers (with /computers?page[size]=0) and I asked SQLAlchemy to log every query. This confirmed that I had one SELECT on the computer table and as many SELECT on the person table as there are owner of a computer. For instance, one of them:

INFO:sqlalchemy.engine.base.Engine:SELECT person.id AS person_id, person.name AS person_name, person.email AS person_email, person.birth_date AS person_birth_date, person.password AS person_password 
FROM person 
WHERE person.id = ?
INFO:sqlalchemy.engine.base.Engine:(19,)

First: why is this query necessary? I mean the listing doesn't provide the detail of the person, so why retrieving this data? How could we prevent Flask-REST-JSONAPI from retrieving it?

Second: if this query is necessary, why don't you have a join?

Third: can I prevent this from happening to prevent huge efficiency losses?

Thanks a lot!

multimeric commented 4 years ago

Have you tried using the include parameter? This should enable the eagerloading behaviour. Something like /computers?include=owner.

mikael-s commented 4 years ago

Thanks for your help! However I guess I didn't make my point very clear because I'm not looking for a more complete output. This will let me the opportunity to clarify.

If I want to list all the computers, the API should perform a single SQL query. Something like SELECT * FROM computers. But the API is actually doing much more work (for nothing apparently). It's doing that first query and then SELECT * FROM persons WHERE id = X (with one query for each possible value of X!).

However the /computers route doesn't give any info on the persons, so why bothering doing all those queries? Even if we wanted that information, a single SQL query would still be enough. Something like SELECT * FROM persons, computers WHERE computers.owner = persons.id.

Why is the API doing so many queries and how can I prevent it to do so? Currently it's a huge issue as it takes minutes to retrieve data from the API while it could be done in a few seconds (at most).

iamareebjamal commented 4 years ago

You are describing this issue - https://github.com/marshmallow-code/marshmallow-jsonapi/issues/277 which is fixed upstream - https://github.com/marshmallow-code/marshmallow-jsonapi/pull/278

mikael-s commented 4 years ago

Awesome! Thanks a lot.