yahoo / elide

Elide is a Java library that lets you stand up a GraphQL/JSON-API web service with minimal effort.
https://elide.io
Other
1k stars 227 forks source link

Large page sizes cause high memory consumption with inefficient filtering #385

Closed micheljung closed 7 years ago

micheljung commented 7 years ago

playerAchievements contain 335'000 entries. player contains 155000 entries.

This query is fast and memory efficient (45 results): /data/playerAchievement?filter[playerAchievement.player.id]=1337&page[size]=10000

This query is slow and consumes a lot of memory: /data/playerAchievement?filter[player.id]=1337&page[size]=10000

I don't know how Elide's filtering exactly works and there's obviously a difference in how the above queries are processed, but I wished that either the slow query is invalid or that it behaves the same as the fast one. Otherwise it's pretty easy for users to overload the API with such inefficient queries.

DennisMcWherter commented 7 years ago

Hi @micheljung, thanks for reporting this.

I noticed in your other issue you mentioned that you're using hibernate. Do you have any Eager collections being loaded or any other potentially expensive relationships in your PlayerAchievement entity? For hibernate stores, filtering is actually pushed directly down to hibernate so the query can be performed in the datastore rather than in-memory via Elide.

Also, enabling the org.hibernate.SQL logger as DEBUG in your log config will help investigate the precise queries being sent down to your datastore.

If you wouldn't mind could you post some of that output?

DennisMcWherter commented 7 years ago

After looking a little more closely at your query, this is really an ugly subtlety with how this filtering scheme behaves. If you don't provide the root-level type, there are some global properties about how this filtered is applied (i.e. check if anything has a player object and if so, filter on the provided condition).

The first query is more efficient because it's specifically saying "only look at the playerAchievement types."

We are only supporting this type of filtering for backwards compatibility at this point. Generally speaking, we're more in favor of using RSQL as it's a more well thought-out spec and provides a clearer and more powerful filtering syntax.

As far as abuse is concerned, Elide generally punts on this issue. We recommend that you implement your own rate-limiting solution as necessary. The reason being is that it's not possible for Elide to know what are acceptable latency bounds for any particular query for all use cases. In fear of feature creep, I suspect we don't want to add rate-limiting as part of Elide. Depending on various factors (most naturally, data filtered by the user executing the request), an identical Elide query could produce vastly different runtimes.

For instance, imagine a simple system where a "mere mortal" user is accessing their data by hitting /playerAchievements. With a security filter, you can cut this down to only display the achievements belonging to the player who is logged in. On the other hand, consider the exact same query run by a "super user" who can view all things in the system. In this case, /playerAchievements (the same Elide query) will issue a slightly different query to the database and will very likely be much slower.

Do you have thoughts on what you would prefer to see here? I see your other ticket and will start looking at that now, but as I mentioned, our recommendation moving forward will generally be to use RSQL rather than the legacy filtering syntax. Moreover, as we make our move to releasing Elide 3.0 (one final feature currently in development before we officially cut our release), we will also be revamping our docs to be clearer on these issues.

Thanks!

micheljung commented 7 years ago

Hi @DeathByTape and thank you for your extensive response! Was using RSQL until today because, well, it's not "basic" even though I've yet to figure out how it is "more powerful" :-) But because of #384 I switched to basic (hopefully only temporarily :-)).

I don't feel competent enough to have an opinion or recommendations here, I just stumbled upon it and felt it's a bit non-intuitive and "dangerous" because it happens so easily. The security filter isn't really an option since you're allowed to see achievements of other players as well. I understand that it's not Elide's responsibility to prevent abuse, but I thought this kind of query could be made more efficient, maybe? I don't understand its details yet (I'm not even trying to at this point x)), so maybe that's just its nature.

I'm not sure whether using RSQL will solve this? I'll go back to RSQL as soon as #384 is fixed though :-)

Feel free to close this ticket if there's nothing to do about it, just wanted to at least let you know :)

Cheers

PS: looking forward to Elide 3.0!

DennisMcWherter commented 7 years ago

Sure, I'll keep this open until we get the fix ready for #384 :) Generally speaking, this is really only a back-compat issue with the "basic" scheme :( so you shouldn't see this in RSQL. If we were able to reproduce this in RSQL, then there would be deeper issues to explore there!