loopbackio / loopback-connector

Building blocks for LoopBack connectors
Other
35 stars 99 forks source link

Forced order by ID in sql.js causing 40+ second queries for single row queries. #169

Open SippieCup opened 4 years ago

SippieCup commented 4 years ago

On tables with 100 million+ rows, when querying for a single row using loopback it'll insert an "order by id" forcing the entire table read.

Steps to reproduce

I am using a postgres server, but it is for pretty much all sql queries.

Example: We have a model "foo" attached to a SQL table with the columns id and hash, where id is unique, and hash is not unique. This table has 100,000,000 Rows. and some hashes have up to 100,000 rows. (note this is simplified, in my database there are far more columns and every row is unique)

We use this curl command to query the loopback API (I know that I can use findOne instead, but it'll still go toprototype.all so it really doesn't make a difference besides returning a single object rather than an array):

curl --location --request GET 'localhost:3000/api/foos' --header 'Content-Type: application/x-www-form-urlencoded' --form 'filter={"where": { "hash": "d41d8cd98f00b204e9800998ecf8427e" }, "limit": 1 }'

Current Behavior

Loopback generates the following SQL statement

SELECT * FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' ORDER BY "id" LIMIT 1

Because hash is non-unique, and can have thousands of rows, (in this case, 115,000) this takes 40.632s on my server to complete.

Expected Behavior

Loopback should instead generate the following SQL statement:

SELECT hash FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' LIMIT 1

resulting in a query time of on my server of 12ms.

Link to reproduction sandbox

If you really need me to setup a reproduction sandbox I can, but the problem should be pretty apparent.

Additional information

link to offending code: https://github.com/strongloop/loopback-connector/blob/master/lib/sql.js#L1388

You could say that this is "working as intended" and that its the database structure is bad/wrong/whatever, but there is no reason why the connector should be dictating an order on all queries.

This functionality should be defined in the default scope of the datasource and model, not built into the base loopback connector.

Acceptance Criteria

agnes512 commented 4 years ago

@SippieCup thanks for bringing up the issue. I agree that order is not needed for many use cases.

We currently have a community PR ongoing: https://github.com/strongloop/loopback-connector-postgresql/pull/417. I think it's trying to fix the issue you mentioned. It's proposing to have flag defaultIdSort to disable sorting on id. Feel free to join the discussion!

SippieCup commented 4 years ago

Thanks, didn't notice that PR before.

While its good for the postgresql connector, and acceptable for my use case, I still think it might be worth backporting it to the parent connector for other sql-based datasources.

agnes512 commented 4 years ago

@SippieCup Agreed. I will edit the issue description/acceptance criteria a bit and let the team to estimate it. Thanks!

jannyHou commented 4 years ago

Problem: If order field is not specified, pk will be the default order field in the generated query Solution:

SippieCup commented 4 years ago

order by 1 (or whichever column you would like), would only give an improvement if you are selecting the column used in the where filter, at which point, it is just redundant versus removing the order by clause entirely. Furthermore, If the user is not selecting fields to return, they would have in know the order of the columns as they appear in the database and select the right column (while also handling hidden properties / columns potentially not defined in the loopback model) to figure out the correct column number to select.

It would would make code unreadable, breaking on model/database changes, and would be adding another magnitude of complexity. But I guess it does "work." If thats the solution strongloop chooses, I'll just continue using my forked connector where I just removed the offending code entirely, but I'd feel sorry for other users.

order by '' I'm fairly certain will give you an error as its would be a non-integer constant.

I see two functional solutions: Adding a flag as you have stated and in the PR, or removing the order by id default, and have users explicitly declare if they want order by id within the model scope.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been closed due to continued inactivity. Thank you for your understanding. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository.

bajtos commented 4 years ago

I see two functional solutions: Adding a flag as you have stated and in the PR, or removing the order by id default, and have users explicitly declare if they want order by id within the model scope.

I like the second solution more, it seems more elegant and easier to implement to me. However, I am concerned about backwards compatibility - changing the default sort order is likely to affect all existing LoopBack applications out there.

I think we need to implement a feature flag to allow users to keep the old behavior. This feature flag should be initially enabled by default. After some time, we can deprecate it (start printing deprecating warnings for users still using it) and after even more time we can eventually remove it.

I think defaultIdSort is exactly the kind of a feature flag we want.

@SippieCup would you like to take a look at how we can port https://github.com/strongloop/loopback-connector-postgresql/pull/417 to loopback-connector and contribute the changes to enable this feature for all SQL connectors?

SippieCup commented 4 years ago

Sorry for the late response, Didn't notice this until now due to the extreme workload I have due to covid. I can take a look at it next week and see about upstreaming the changes in a more usable way for connectors. We ended up just removing the code from the base connector.

Now that You guys have decided how to implement it, I can see about building this solution into it and create a pull request.

bajtos commented 4 years ago

@SippieCup awesome, I am looking forward to review your pull request(s). I have assigned the issue to you, so that others know somebody is already working on it.