New db to speed up full text queries and library updates

dougy83 commented 9 months ago

All the .json.gz and .json.stock files have been processed into three compressed tables/files, downloadable as a single 20MB tar archive; these tables are copied directly into IndexedDB without decompression. The database update now takes ~150ms + time to download a 20MB file.

Running queries on IndexedDB for full text search previously took around 15 seconds for "Select". These queries on this new db take around 2 seconds for "Select", and act on the compressed tables. Note: the query takes ~1.5 seconds, but an extra variable amount of time is lost to allow aborting the query, and yielding to the UI.

The code to process the .json.gz/.stock.json files is in javascript, and takes around 15 seconds to complete.

I'm currently using gzip for compression. I tried with lz4, which was 3x faster, until I changed any of the code to allow streaming decompression. Could be worth looking at in future, when lz4 is natively supported by browsers.

yaqwsx commented 9 months ago

First, let me say a huge thank your for the work you are putting into this and the other related PR. I really like it and appreciate it.

I have, however, one suggestion to make. I suggest removing [generateJsonlinesDatabaseFiles.js](https://github.com/yaqwsx/jlcparts/pull/114/files#diff-4afa54d033975e4da9f76722851b166de044d8bca7ebfcd6d9221d0113600563). And also removing the old JSON files. Having them is just an extra burden, inefficiency, and overall messy IMHO (why keep something we don't need?). Also, having data processing in two languages is an extra maintenance burden.

We have a SQLite DB that has all the components stored in a raw format. From that, we generate the per-category JSON files. We already have a library abstraction that hides the DB and allows you to operate in "part" and "category" terms. The whole generator is in https://github.com/yaqwsx/jlcparts/blob/1a412750247bfa9ecb4fffc9617286025f2decd7/jlcparts/datatables.py#L351-L384

I have two suggestions:

let's generate whatever format you need in Python by rewriting the datafiles generating. This is the preferred option.
let's discard the original datatables and let your script operate on top of the SQLite (not so preferred).

What do you think?

dougy83 commented 9 months ago

First, let me say a huge thank your for the work you are putting into this and the other related PR. I really like it and appreciate it.

Most welcome. I have been using your page quite often recently, and I love it, so thank you for creating it.

I have, however, one suggestion to make. I suggest removing [generateJsonlinesDatabaseFiles.js](https://github.com/yaqwsx/jlcparts/pull/114/files#diff-4afa54d033975e4da9f76722851b166de044d8bca7ebfcd6d9221d0113600563). And also removing the old JSON files. Having them is just an extra burden, inefficiency, and overall messy IMHO (why keep something we don't need?). Also, having data processing in two languages is an extra maintenance burden.

If we remove the generateJsonlinesDatabaseFiles.js file, and put the processing in the python script, then there's no need for the original subcategory JSON files to be created. If they're not used by anything else, then of course, they can go.

I wrote it in js, as it's easier for me, as I don't program in python. I can add it to the python script; it shouldn't be too complex.

We have a SQLite DB that has all the components stored in a raw format. From that, we generate the per-category JSON files. We already have a library abstraction that hides the DB and allows you to operate in "part" and "category" terms. The whole generator is in

https://github.com/yaqwsx/jlcparts/blob/1a412750247bfa9ecb4fffc9617286025f2decd7/jlcparts/datatables.py#L351-L384

I have two suggestions:

let's generate whatever format you need in Python by rewriting the datafiles generating. This is the preferred option.

let's discard the original datatables and let your script operate on top of the SQLite (not so preferred).

What do you think?

I think the first option is the best (i.e. change output format of the generator), as it's a simple change and leverages your existing code, which has proven to work, and handles a bunch of edge cases.

dougy83 commented 9 months ago

The python script has been updated to generate the single output file, without creating the .json.gz and .stock.json file, and the javascript file has been nuked.

dougy83 commented 8 months ago

@yaqwsx Is there anything further I should do to with this PR (or the others)?

yaqwsx commented 8 months ago

It's my turn; I'll review them ASAP. I just ran out of my time allocated for the OSS projet's maintenance as there was a lot of work on KiKit with the recent release of KiCAD v8.

dougy83 commented 8 months ago

Ok, no worries. I'm not trying to push you (I can use my test site), I just wasn't sure if I'd done it wrong.

I did notice the auto-merge of the different PRs makes bad merges (they touch similar parts of the same files). If you can say which PRs you want to include, I can rebase those ones on each other, sequentially, and correct the bad merges.

yaqwsx / jlcparts

New db to speed up full text queries and library updates #114