google / lovefield

Lovefield is a relational database for web apps. Written in JavaScript, works cross-browser. Provides SQL-like APIs that are fast, safe, and easy to use.
https://google.github.io/lovefield/
Apache License 2.0
6.82k stars 366 forks source link

Distinct and OrderBy Unindexed field #269

Open falahati opened 4 years ago

falahati commented 4 years ago

A simple query as follow works great:

await db
    .select(lf.fn.distinct(table.id))
    .from(table)
    .orderBy(table.id, lf.Order.ASC)
    .limit(10)
    .exec();

or even by an string field:

await db
    .select(lf.fn.distinct(table.id))
    .from(table)
    .orderBy(table.title, lf.Order.ASC)
    .limit(10)
    .exec();

But ordering by date doesn't work in my case:

await db
    .select(lf.fn.distinct(table.id))
    .from(table)
    .orderBy(table.created, lf.Order.ASC)
    .limit(10)
    .exec();

By not working I mean it gives me a result that doesn't change if I modify the order to DESC or ASC. It just returns a static result as if the date created is zero or the same number for all rows. Checked manually and each row does in fact have a different value. Removing the distinct function fixes the problem.

If you need any additional information, please ask.

falahati commented 4 years ago

Additionally, when in a join, distinct also breaks the orderBy when ordered by a field in a table that is not used in distinct argument:

await db
   .select(lf.fn.distinct(table.id))
   .from(table, secondTable)
   .where(secondTable.parentId.eq(table.id))
   .orderBy(secondTable.id, lf.Order.ASC)
   .limit(10)
   .exec();

This will yield the same result as when ordering by created in the main table.

freshp86 commented 4 years ago

By not working I mean it gives me a result that doesn't change if I modify the order to DESC or ASC. It just returns a static result as if the date created is zero or the same number for all rows.

Could you post the code that registers the db schema? Can you verify that

Removing the distinct function fixes the problem.

That sounds pretty interesting. Do you need to use distinct() though, if table.id is already unique (I assume?). Either way, does an equivalent query in SQLlite returns the results you expect?

falahati commented 4 years ago

Well, I just dug a little deeper into this, and here is what I have found:

https://codepen.io/s-flhti/pen/mdEebgP

I am wondering what would happen when it is sorted by an aggregated function on a groupBy table1.parentId result set in a multi-table query since there is no index for the aggregated function. Like this:

await db
   .select(table.id)
   .from(table, table2)
   .where(table2.parentId.eq(table.id))
   .groupBy(table2.parentId)
   .orderBy(lf.fn.min(table2.integerField), lf.Order.ASC)
   .limit(10)
   .exec();

But that's another discussion.

EDIT

created is registered as lf.Type.DATE_TIME?

yes

when you populate these fields you are actually adding JS Date objects and not something else?

yes, I have a Typescript wrapper on top of LoveFields and it is strongly typed to the interfaces I have, so I can't really make a mistake like this. Unless I explicitly cast the value to any. I wrote this wrapper to save me especially from these sorts of mistakes.

Do you need to use distinct() though, if table.id is already unique (I assume?).

Well, it is complicated. I don't really need to use distinct on table1.id. I just did that as part of the example posting here. In reality, I have a dynamic list of filters that might happen on table1, or table2, or both. And since each row in table1 has multiple rows in table2, I need to use distinct on table2.parentId to get a clean and unique list of items. I do also have sorting options that might happen on table1 or table2 so I need to keep that in mind too. I was hoping to be able to do all of these filters and sorting in one query along with pagination (skip, limit) but apparently, it is impossible for now. So I have decided to do it in 3 different queries, and do the sorting and pagination later on the JS side. Lovefield is fast enough that I can afford to do so with around 12k rows in both tables in less than 100ms. So no worries.

Either way, does an equivalent query in SQLlite returns the results you expect?

SELECT DISTINCT(t1.id) FROM table1 as t1 ORDER BY t1.title DESC works as expected and needs no index to return the valid resultset.

SELECT DISTINCT(t1.id) FROM table1 as t1 INNER JOIN table2 as t2 ON t2.parentid = t1.id ORDER BY t2.created DESC also works as expected.

So both these two limitations are apparently only applicable to Lovefield and SQLite has no such rules and limitations regarding distinct and indexes, or distinct and inner join.

Tested with https://sqliteonline.com/ with this data: sqlite.zip