tursodatabase / libsql

libSQL is a fork of SQLite that is both Open Source, and Open Contributions.
https://turso.tech/libsql
MIT License
11.26k stars 286 forks source link

Optimize IFNULL with indexes #1552

Open rentalhost opened 4 months ago

rentalhost commented 4 months ago

Hello,

I've been studying SQLite using the Turso platform. There, every read or write operation is counted. So, if I run a query that doesn't match an existing index, a table scan is performed, consuming as many read credits as there are records in the table. This can be quite costly.

Let me share a real example from my experience:

CREATE TABLE "example_entries" (
    "id"        INTEGER NOT NULL UNIQUE,
    "name"      TEXT,
    "updatedAt" INTEGER,
    "createdAt" INTEGER NOT NULL,

    PRIMARY KEY("id" AUTOINCREMENT)
);

Whenever a record is created, only the createdAt field is populated. For simplicity, let's assume updatedAt remains NULL during creation.

If I run a query based on either updatedAt or createdAt, it results in a table scan because neither column is indexed. To address this, we can create indexes:

CREATE INDEX "example_entries.updatedAt" ON "example_entries" ("updatedAt" ASC);
CREATE INDEX "example_entries.createdAt" ON "example_entries" ("createdAt" ASC);

Great! Now, let's say I want to return the id, name, and the update date, or the creation date if the update date is NULL. This can be done with:

SELECT `id`, `name`, IFNULL(`updatedAt`, `createdAt`) AS `updatedAt` FROM `example_entries`;

This works well, though we still have the table scan issue, but without a WHERE clause, there's not much that can be done.

Now, let's limit our query to find records updated or created after a certain timestamp:

SELECT `id`, `name`, IFNULL(`updatedAt`, `createdAt`) AS `updatedAt` FROM `example_entries` WHERE IFNULL(`updatedAt`, `createdAt`) >= @timestamp;

Although this filter works, it still results in a table scan. From what I understand, SQLite can't optimize IFNULL with two columns to use indexes efficiently. To solve this, I need to "expand" the IFNULL like this:

SELECT `id`, `name`, IFNULL(`updatedAt`, `createdAt`) AS `updatedAt` FROM `example_entries` WHERE `updatedAt` >= @timestamp OR (`updatedAt` IS NULL AND `createdAt` >= @timestamp);

This adds cognitive complexity for something that might be optimized internally by SQLite.

I have reported this issue to the SQLite team for their evaluation as well. You can find the post here: SQLite Forum Post.

I would love to hear the libsql team's perspective on this matter and if there's a possibility for such an optimization in libsql.

Thank you for your attention.