mrevutskyi / flask-restless-ng

A Flask extension for creating simple ReSTful JSON APIs from SQLAlchemy models.
https://flask-restless-ng.readthedocs.io
Other
64 stars 11 forks source link

[Question] Optimize GET request with relationship? #18

Closed sharky98 closed 3 years ago

sharky98 commented 3 years ago

Hello,

I don't know if it's internal limitation of either Flask-SQLAlchemy, SQLAlchemy itself or Flask-Restless-NG or the database backend I am using (SQLite)... or just some possible improvment.

I am querying a lot of GET endpoint with a lot of includes. When doing debugging, it seems to be doing one SELECT statement for each possible includes. I am still new to Python and those librairies, but I remember that on both .NET Entity Framework and PHP Doctrine, when a multi-relationship query was made (even for many-to-many), it was limited to a very few queries (like 1 for the model itself and all the many-to-one relationships and one per many-to-many; something like that).

Is this something that could be possible with this API?

Contrary to my previous ticket, this one is beyond my limits with this environment.

Thanks!

sharky98 commented 3 years ago

I think you code do something like that (in pseudo code, foreach I think are recursive function)

includedModels = [modelA, modelB, ...] // The list of model to be included in the GET request
models = SELECT * FROM model
resultingIncludedModel = []
FOREACH model IN models
  FOREACH includedModel IN includedModels
    resultingIncludedModel += SELECT * FROM includedModel WHERE relation_id = model.id

So basically, looping over the base model to push a new included result if one is found.

If the downstream library allows it, maybe you could reduce the number of queries by using the where "IN" SQL statement, which is basically returning all included that match a list of ids.

includedModels = [modelA, modelB, ...] // The list of model to be included in the GET request
models = SELECT * FROM model
foundModelIds = []
resultingIncludedModel = []

FOREACH model IN models
  foundModelIds += model.id // retrieve a simple array of the model ID

FOREACH includedModel IN includedModels
  resultingIncludedModel += SELECT * FROM includedModel WHERE relation_id IN foundModelIds

PS: In pseudo code, read += on an array as pushing new entry into array

mrevutskyi commented 3 years ago

Yes, currently it depends on SQLAlchemy's Relationship definition: if it is lazy='dynamic' then it will emit a new SELECT statement per relationship. In the past, if it is lazy='joined' - it will emit a joined query (though even is that relationship was not requested)

I remember having an experimental build (before I created a fork from the original project) where I was adding .join() based on included fields in the request. I'll take a look at and try to make some improvements in the next couple of weeks

mrevutskyi commented 3 years ago

Hi, I've did some refactoring for code that is responsible for fetching collections. It is currently available as a beta version, you can install it as pip install Flask-Restless-NG==2.0.0b0 Please try it out and let me know if you find any issues/improvements

mrevutskyi commented 3 years ago

version 2.0.1 has been released. Please let me know if you still having issues with it

sharky98 commented 3 years ago

Thanks! I was busy with my real tasks at job, didn't update my tool for a while, I'll get back to you as soon as I can put some time into it!