phiresky / sql.js-httpvfs

Hosting read-only SQLite databases on static file hosters like Github Pages
Apache License 2.0

[not an issue] Questions on how the plugin works and how to speed up initial worker load #44

Closed jiwidi closed 1 year ago

jiwidi commented 1 year ago

Hi!

First of all thanks for the work in this project.

I have a personal project in progress at lensdatabase.org, meant to be a live table of lenses, and I chose your plugin because of how fast it applies filters by just modifying the query. The code is currently at https://github.com/jiwidi/lens_database

My problem now is that the initial load of the worker takes a long time; after that, all queries run very fast. The worker prints the following in the console while loading (~15 s):

[screenshot: worker console output during the initial load]

What does all that loading mean? Is there any way I could speed this up, or precompute that load so it's faster on the client side? I'm fairly new to SQLite, so maybe I'm missing some key concepts, but I would love some light on this.

Best, Jaime

phiresky commented 1 year ago

That definitely doesn't look normal. Just for loading the database it should do a few queries at most. Your screenshot looks like it's doing a full table scan in reverse order. Are you sure you're not running a query with a large result set right at the beginning?

Independent of that, you should make sure your database is VACUUMed, and increase the page size from 4 kB to 32 kB.
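Both steps can be done in one go with the `sqlite3` command-line shell (a sketch; `lenses.db` stands in for your database file, and the `journal_mode = delete` pragma mirrors the database-preparation step from the project's README). Note that `PRAGMA page_size` only takes effect when the file is rewritten, which is exactly what `VACUUM` does, so the order matters:

```shell
# Rebuild the database with 32 KiB pages before uploading it to the static host.
sqlite3 lenses.db 'PRAGMA journal_mode = delete; PRAGMA page_size = 32768; VACUUM;'

# Verify the new page size took effect:
sqlite3 lenses.db 'PRAGMA page_size;'
```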

phiresky commented 1 year ago

[screenshot: console output]

jiwidi commented 1 year ago

@phiresky

I am running a big query at the beginning, basically querying everything with "select * from table". The file is around 14 MB.

I did the VACUUM and added a LIMIT 200 to the first query, as we don't expect the user to want all rows from the beginning (the reason for the * is that no filter is selected). These changes made it much faster, thanks.

I have a question now: how do you determine the page size? Why 32 kB and not 1 MB, for example? Are there any drawbacks to increasing it further?

phiresky commented 1 year ago

The page size is the minimum amount of data fetched for every bit of info needed. So even if you just want the info for a single lens, you'll need to fetch at least one page worth of data, which is why you don't want it to be too big. Also, 64 kB is the max SQLite allows ;)
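A back-of-the-envelope sketch of that trade-off (the B-tree depth of 3 is an illustrative assumption, not a measurement of this database):

```javascript
// Rough cost model: a point lookup fetches at least one full page per B-tree
// level it touches, so a larger page inflates every small read.
// 64 KiB is SQLite's hard maximum; 1 MiB is shown only for comparison.
function bytesForPointLookup(pageSizeBytes, btreeDepth = 3) {
  return pageSizeBytes * btreeDepth; // one page read per level, worst case
}

console.log(bytesForPointLookup(4096));    // 12288 bytes with 4 KiB pages
console.log(bytesForPointLookup(32768));   // 98304 bytes with 32 KiB pages
console.log(bytesForPointLookup(1048576)); // 3145728 bytes with (hypothetical) 1 MiB pages
```

Fewer, larger pages do reduce the number of HTTP requests, but past a point every single-row lookup pays for data it never uses.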

Looking more at your project, to be honest I think you probably don't want sql.js-httpvfs at all. Your lenses.db file as JSON compresses to 269 kB with gzip and to 189 kB with zstd. At that point it's probably more efficient to just load the JSON file and decompress it, filtering and sorting in JS. The overhead in both code complexity and in loading the database and indexes isn't worth it.
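A sketch of that alternative (the `/lenses.json` path is an assumption; note that the browser undoes gzip `Content-Encoding` transparently, so no explicit decompression code is needed):

```javascript
// Load the whole dataset once; afterwards every filter/sort is a pure
// in-memory operation with no further network traffic.
async function loadLenses(url = "/lenses.json") {
  const res = await fetch(url); // gzip-encoded responses are decompressed by the browser
  if (!res.ok) throw new Error(`HTTP ${res.status} while loading ${url}`);
  return res.json(); // expected: an array of lens objects
}
```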

jiwidi commented 1 year ago

> The page size is the minimum amount of data fetched for every bit of info needed. So even if you just want the info for a single lens, you'll need to fetch at least one page worth of data, which is why you don't want it to be too big. Also, 64 kB is the max SQLite allows ;)

Gotcha

> Looking more at your project, to be honest I think you probably don't want sql.js-httpvfs at all. Your lenses.db file as JSON compresses to 269 kB with gzip and to 189 kB with zstd. At that point it's probably more efficient to just load the JSON file and decompress it, filtering and sorting in JS. The overhead in both code complexity and in loading the database and indexes isn't worth it.

I tried https://tochoromero.github.io/vuejs-smart-table/ before sql.js-httpvfs and it resulted in much bigger load times; I was just using the output of a pandas save-to-JSON (Python, saving the table as JSON) with no compression of the JSON file at all. I wanted something that would help me with the filtering logic in JS, which is why I went with your plugin: with sql.js-httpvfs I could easily map the filters to SQL syntax.
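For reference, that filters-to-SQL mapping can stay quite small. A sketch of the kind of translation described above (column names are hypothetical, and values are returned as bind parameters rather than interpolated into the string):

```javascript
// Turn a filter object like { mount: "EF" } into a parameterized SELECT
// that sql.js-httpvfs (or any SQLite driver) can run.
function buildLensQuery(filters, limit = 200) {
  const clauses = [];
  const params = [];
  for (const [column, value] of Object.entries(filters)) {
    clauses.push(`${column} = ?`); // assumes column names come from a trusted whitelist
    params.push(value);
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return { sql: `SELECT * FROM lenses${where} LIMIT ${limit}`, params };
}

console.log(buildLensQuery({ mount: "EF" }).sql);
// -> SELECT * FROM lenses WHERE mount = ? LIMIT 200
```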

How would you go about it if you had it in pure JSON? Write the filtering logic all by yourself, or is there anything you would use for that? I'm not experienced at all in frontend, so any advice here is welcome.

phiresky commented 1 year ago

You have 10k entries in your lenses table. Parsing that JSON with JSON.parse() shouldn't take more than a few milliseconds, and filtering that list with JS should also take only a few milliseconds. Yes, I would write the filtering logic in JS (or use a lib like the one you did).
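A minimal sketch of what that pure-JS filtering could look like (the records and field names are made up for illustration and are not the project's actual schema):

```javascript
// In-memory equivalent of:
//   SELECT * FROM lenses WHERE mount = 'EF' ORDER BY weight LIMIT 200
const lenses = [
  { name: "50mm f/1.8", mount: "EF", weight: 160 },
  { name: "85mm f/1.4", mount: "EF", weight: 950 },
  { name: "35mm f/2",   mount: "Z",  weight: 370 },
];

function queryLenses(rows, { mount, limit = 200 } = {}) {
  return rows
    .filter((l) => mount === undefined || l.mount === mount) // no filter -> keep all
    .sort((a, b) => a.weight - b.weight)
    .slice(0, limit);
}

console.log(queryLenses(lenses, { mount: "EF" }).map((l) => l.name));
// -> [ '50mm f/1.8', '85mm f/1.4' ]
```

Even naively, this scans the whole array per keystroke, which is still far below a millisecond at 10k rows.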

If you just have a plain JSON file in your github-pages branch, the browser should automatically be served the gzip-compressed version (you can verify this in the network tab of the dev tools). Then you also don't have to think about decompressing, since the browser handles it.

So your performance being bad with the simple solution is surprising. Maybe it's due to vuejs-smart-table; I'd start by figuring out whether it's still slow if you just load the data and log one of the lenses to the console. It could be that it was just due to slow rendering of 10k elements in the DOM.

jiwidi commented 1 year ago

That makes sense. I didn't get to evaluate where the lag came from: whether it was the library I was using or just the elements being loaded into the DOM. Will definitely try it now! Thanks for the help!