jlongster / absurd-sql

sqlite3 in ur indexeddb (hopefully a better backend soon)
MIT License
4.16k stars 101 forks source link

How does query work (without loading everything into memory)? #4

Closed zeroliu closed 3 years ago

zeroliu commented 3 years ago

Nice work for exploring a new way to improve the persistent storage on web. The performance improvement is very impressive.

I have a question around querying. In your blog, you mentioned it never loads the database into memory because it only loads whatever SQLite asks for. It confuses me how the library is able to execute SQL statements against ArrayBuffers persisted in idb without loading them into memory first. Can you please share more details on the memory usage of the library? Sharing JS heap snapshots would help a lot.

Thanks!

Tankenstein commented 3 years ago

I'm not the author, but think I can answer this.

Imagine a library - rows of shelves filled with books. You want to find the first sentence of a particular book, and to optimize your busy schedule you'd like to need to read as little books as possible. By default, you'd have to start going through all the books in the library to find the right one. However, thankfully, someone has indexed the library, and the first book in the library contains alphabetical information where every book in the library can be found. You scan the first book (perhaps with an alphabetical binary search?) to find where your particular book is, and then head straight for it. This means you only need to look at two books instead of most of them to find your answer.

This is one of the core capabilities of any file system based database - being able to manage indexes that optimize what memory pages need to be loaded and looked at. The memory pages are persisted in idb, and so are the B-tree indexes. Effectively, sqlite first takes your query, parses it and then builds an optimized "query plan" - an optimal sequence of operations to run to fulfil your query. It can do this by loading information about the indexes it has (that are in memory pages) and then using black magic to optimize your query for how your information is structured using these indexes. Once it has this query plan, it uses its own virtual machine (awesome, right) to execute it. This in turn will refer to the b-tree which will refer to only the particular memory pages it needs.

If you use indexes to structure your information well for the queries you will want to run on it, it will only have to load the memory pages that contain your b-trees and then use that to refer to only the pages this query will need. This means that if you run a really suboptimal query without indexes that needs to scan every row in your entire database, it will technically still have to go through all the memory pages. However at this point you lose the point of the database, and usually you'd structure your indexes for the kinds of queries you'd like to do. You can always add indexes later if you see you have a need for them - databases are friendly like that.

More info about how sqlite in particular manages this can be found on the excellent architecture page of the sqlite documentation.

However, if you'd like to understand these concepts more in depth I can highly recommend this great tutorial that takes you through building a simplistic sqlite clone from scratch.

jlongster commented 3 years ago

@Tankenstein provided a great answer, way more thorough than what I was going to say, thanks!

This library treats IDB just like a disk. A hard disk stores info in 4K blocks of bytes -- it can never read anything smaller than that (this is simplified). SQLite is built with all the things mentioned above to avoid having to read all the data needed from memory; it only loads the necessary blocks. We don't need to know of any of the fancy stuff that SQLite does; we just intercept read/write calls from SQLite and load in or wrote 4K blocks. But the difference is we are writing it to IDB, instead of a hard disk!

zeroliu commented 3 years ago

Great answers indeed. Thanks! @Tankenstein @jlongster