pouchdb-community / pouchdb-quick-search

Full-text search engine on top of PouchDB
Apache License 2.0

Is it possible to build index serverside and then sync to clientside? #22

Open EloB opened 9 years ago

EloB commented 9 years ago

I'm building a Magic: The Gathering search engine that will work on- and offline.

I have quite a big dataset for client-side search – about 16.9 MB of JSON data – and it would be awesome if I could somehow build the index on the server and send the built version to the client.

Do you think this would be possible?

nolanlawson commented 9 years ago

You could kinda sorta do it with a lot of hacks. But it might not save you as much time as you think.

With the recent performance improvements, the biggest bottleneck in pouchdb-quick-search is just writing the indexes to IndexedDB/WebSQL. Which has to be done no matter what.

Some things that could improve your performance:

  • using WebSQL over IndexedDB for browsers that support both (Chrome, Android)
  • using a separate database where the _ids are the keys you want to look up (i.e. using allDocs() instead of either query() or search() – see no. 7 in the pro tips)
  • making sure all your docs are generation-1 docs (i.e. the revs all start with 1-) – Pouch can do some crazy optimizations when this is the case

That being said, if you want to try the hack, you would need to:

  • build the index client-side
  • open up the Dev Tools and find the secondary database, which has a name like mydb-mrview-*
  • sync that DB to a CouchDB
  • find all the _local/ docs in the main database and sync those manually (Couch/Pouch won't sync _local/ docs); each _local/ doc corresponds to a doc in the main database, and you can find them if you open up the Dev Tools
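A rough sketch of steps 2 and 3 might look like this (the mrview hash suffix and the CouchDB URL are placeholders for whatever you actually find in the Dev Tools):

var PouchDB = require('pouchdb');

// the secondary index lives in its own local database whose name you can
// read out of the Dev Tools; 'abc123' below is a made-up hash
var localIndex = new PouchDB('mydb-mrview-abc123');

localIndex.replicate.to('http://localhost:5984/mydb-index')
  .on('complete', function () {
    console.log('index pushed to the server');
  })
  .on('error', console.error);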

nolanlawson commented 9 years ago

I'm curious to know more about the performance characteristics of your app, though. I'm always trying to make this damned thing faster. :)

EloB commented 9 years ago

Thanks for your quick answer.

So if I understand you correctly, I should build a separate database with special _ids.

Here we have an example of json data that I use. http://mtgjson.com/#exampleCard

I would like to search the name and text. Should I then use pouchdb-collate to build a special id?

var PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-quick-search'));
var collate = require('pouchdb-collate');
var cards = require('./cards.json');

var pouch = new PouchDB('cards');

// Or even better, prebuild the cards array with collate.toIndexableString
pouch.bulkDocs(cards.map(function(card) {
  card._id = collate.toIndexableString([card.name, card.text]);
  return card;
})).then(function() {
  console.time('Search for battle');
  return pouch.search({
    query: 'battle',
    fields: ['name', 'text'],
    // include_docs: true, // jshint ignore:line
    limit: 10
  });
}).then(function(result) {
  console.timeEnd('Search for battle');
  console.log(result);
  // console.log(result.rows.map(function(item) { return item.doc.name; }));
});

I've just copied some lines from my code, so treat it as a sketch rather than a polished example. :)

Here is an actual excerpt from my code: https://gist.github.com/EloB/b4563a12303be6021dad

What should I do then or how could this help me to improve performance?

Regarding https://github.com/nolanlawson/pouchdb-quick-search/issues/22#issuecomment-71378135: in some cases it's pretty fast, but when I hit a lot of items the performance drops, even if I use limit or disable include_docs.

nolanlawson commented 9 years ago

in some cases it's pretty fast but when I hit a lot of items then the performance drops.

You mean when the query returns many results, or when you have many cards in the database?

Also, are you using 1.0.2? And did you try WebSQL instead of IndexedDB? (dunno what browser you're using).

But yes, you are on the right track - you can use pouchCollate to do essentially what mapreduce or pouchdb-quick-search is doing under the hood.

However, collate will not do tokenization for you, so if you want full-text search (i.e. what pouchdb-quick-search provides), then it's going to get very hairy if you try to do it yourself. The hacky solution described above might be preferable.

nolanlawson commented 9 years ago

If you provided a live example, I could profile it and see what's causing the slowdown. :)

nolanlawson commented 9 years ago

Another option, which honestly might be preferable, is to get closer to the metal and use IndexedDB directly (with IndexedDBShim to support WebSQL). IndexedDB doesn't have full-text search, but you can use Lunr to turn full-text into ['arrays', 'like', 'this'] and then index on that.
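A minimal sketch of that idea (not code from this thread – it uses a naive tokenizer in place of Lunr's stemming pipeline, and getAll() needs a reasonably modern browser):

// turn free text into ['arrays', 'like', 'this']
function tokenize(text) {
  return (text || '').toLowerCase().match(/[a-z0-9]+/g) || [];
}

var openReq = indexedDB.open('cards-db', 1);
openReq.onupgradeneeded = function (e) {
  var db = e.target.result;
  var store = db.createObjectStore('cards', { keyPath: 'id' });
  // multiEntry indexes each element of the tokens array separately
  store.createIndex('tokens', 'tokens', { multiEntry: true });
};
openReq.onsuccess = function (e) {
  var db = e.target.result;
  var tx = db.transaction('cards', 'readwrite');
  tx.objectStore('cards').put({
    id: 1,
    name: 'Battle Hymn',
    tokens: tokenize('Battle Hymn: add R for each creature you control')
  });
  tx.oncomplete = function () {
    var idx = db.transaction('cards').objectStore('cards').index('tokens');
    idx.getAll('battle').onsuccess = function (ev) {
      console.log(ev.target.result); // every card whose tokens include 'battle'
    };
  };
};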

PouchDB is good at syncing and at providing cross-browser support, but for bare-metal performance it's hard to beat straight WebSQL or IndexedDB. Heck, WebSQL even has native full-text search.

EloB commented 9 years ago

Here you have a working example: http://jsfiddle.net/0kzcd5sh/4/

nolanlawson commented 9 years ago

I've been playing around with your example, but unfortunately I can't tell you much, except that you should probably wait for us to implement https://github.com/pouchdb/pouchdb/issues/2280. Your use case is just really, really hard given the current state of web databases.

Depending on the platforms you are targeting, you may also want to look into YDN-DB or just straight-up WebSQL or IndexedDB (or use the IndexedDBShim). You don't really need PouchDB, because you're not syncing – you're just loading the data all at once. PouchDB has a lot of overhead because it assumes you want to sync different revisions (e.g. it stores a revision tree for each document), and this overhead is contributing to your slow times when the index builds.

Another thing: in your example you are using collate.toIndexableString, but it's not really doing you any favors, because you are using it incorrectly... you would need to create a separate Pouch document for each token (i.e. word) in each of your documents. But then you would have basically reimplemented pouchdb-quick-search. As-is, you would be better off just making the _ids 0, 1, 2, 3, etc., because then they are short, and thus less data for pouchdb-quick-search to write when it builds the index (because it needs to map back to the parent doc's _id).
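To make that concrete, here is a rough sketch of the one-doc-per-token pattern you would be signing up for (names are illustrative; the endkey trick mirrors how PouchDB's map/reduce code does prefix scans, since objects collate last):

var collate = require('pouchdb-collate');

function tokenize(text) {
  return (text || '').toLowerCase().match(/[a-z0-9]+/g) || [];
}

// one lookup doc per (token, card) pair, kept in a separate database
function buildLookupDocs(cards) {
  var docs = [];
  cards.forEach(function (card, i) {
    tokenize(card.name + ' ' + card.text).forEach(function (token) {
      docs.push({
        // token first, so an allDocs range scan finds every card for a token
        _id: collate.toIndexableString([token, i]),
        cardId: i // short numeric id pointing back at the parent doc
      });
    });
  });
  return docs;
}

// querying is then just a primary-key range scan:
// lookupDb.allDocs({
//   startkey: collate.toIndexableString(['battle']),
//   endkey: collate.toIndexableString(['battle', {}])
// })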

You could also just throw up a loading spinner while the index is building. It is already taking 6 seconds in desktop Chrome for the entire MTG database to be loaded into the database, even without the indexing – you can imagine it's much worse on a mobile device. Maybe you can have your interface talk to the server while index-building is still in progress.

I apologize; we're trying the best we can with PouchDB, but it's just really hard to make things work cross-browser while also being fast. I hope you find a good solution.

nolanlawson commented 9 years ago

See, here's an example of your same code using raw WebSQL. Look at how fast it is! Look at how little code I had to write, to get full Porter-stemmer full-text search! Too bad it doesn't run in Firefox or IE. :(
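The gist of it is something like this (a from-memory sketch rather than the actual fiddle; table and column names are made up):

var db = openDatabase('mtg', '1.0', 'MTG cards', 50 * 1024 * 1024);

db.transaction(function (tx) {
  // fts3 with the built-in Porter tokenizer = full-text search written in C
  tx.executeSql(
    'CREATE VIRTUAL TABLE IF NOT EXISTS cards ' +
    'USING fts3(name, text, tokenize=porter)');
  tx.executeSql('INSERT INTO cards (name, text) VALUES (?, ?)',
    ['Battle Hymn', 'Add R to your mana pool for each creature you control.']);
}, console.error, function () {
  db.readTransaction(function (tx) {
    // MATCH runs inside SQLite's indexer, stemming included
    tx.executeSql('SELECT name FROM cards WHERE cards MATCH ?', ['battle'],
      function (tx, results) {
        for (var i = 0; i < results.rows.length; i++) {
          console.log(results.rows.item(i).name);
        }
      });
  });
});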

nolanlawson commented 9 years ago

You could probably write your code twice – once for WebSQL and once for IndexedDB. I admit, though, that you'll have to write a lot more code to accommodate IndexedDB, versus this WebSQL example that I whipped up in less than an hour.

In particular, you would need to add:

Also IndexedDB has cross-browser bugs, so tread carefully. You may want to use LocalForage.

nolanlawson commented 9 years ago

Out of curiosity, I also built the same thing in "raw" IndexedDB (using Dexie.js for convenience and Lunr.js because IndexedDB lacks FTS support). Here it is.

It takes 8 seconds to insert the data in Chrome and 13 seconds in Firefox, so compared to WebSQL's 6 seconds (Chrome/Safari), that's not bad! Maybe I should build a separate module to do this...

EloB commented 9 years ago

How is the support for web workers with IndexedDB/WebSQL?

nolanlawson commented 9 years ago

Not good currently: https://github.com/pouchdb/pouchdb/issues/2806

dilignt commented 9 years ago

I'm also interested in doing this, but I'm thinking of a different strategy. I'm using Pouch within Cordova apps via the SQLitePlugin, and I'd like to build the database file on the server, dump it using pouchdb-dump-cli, and then load it in using the createFromLocation option.

The question is: would the same strategy work for the search functionality? A brief look at the pouchdb-search code leads me to think that it doesn't expose the index database file names, so it would be hard to dump the index manually on the server without knowing the db file name. But if this were somehow possible, would it then be possible to createFromLocation with the same db filename, or am I barking up the wrong tree here?

Currently I'm loading the entire database into SQLite from JSON when the app first runs and then building a Lunr index in memory, but this isn't ideal for large collections, and I can't afford to sacrifice user experience with long index-creation times on mobile.

many thanks in advance

nolanlawson commented 9 years ago

@dilignt Unfortunately you cannot do this right now, because PouchDB will create a separate database for the index, and that second database will not pick up the location parameter. Sorry about that. We're working on consolidating the secondary index into the same database which would fix your issue.

dilignt commented 9 years ago

@nolanlawson thanks for your reply. I can work around this in the meantime by running the Lunr index in memory for smaller databases.

Do you know how to build the database file on a server and have the SQLite plugin open it from within Cordova? If you use PouchDB with the sqldown adapter on the server, will this create files that can be read using the createFromLocation flag in Cordova?

many thanks

nolanlawson commented 9 years ago

Nope, the sqldown adapter uses a different format. Unfortunately you would have to use Chrome or PhantomJS and then find the SQLite files and copy them over. I believe Chrome stores them in a directory called databases; not sure about Phantom.

egervari commented 8 years ago

This is the exact problem that I have. I have indexes that are 300 MB in Pouch, and I need to completely avoid building them on the client. It's not just the bad performance (although that is bad – these sometimes take 30-50 seconds on a laptop with an SSD, and much longer on an old iOS/Android device); I also need to avoid memory spikes and the crashes that result from them.

Does PouchDB support this now? I also cannot just look at the mrview database names and mimic that behaviour – for one user account it actually uses 333 different databases, and that's not counting any new databases from indexes.

I really hope PouchDB can automate the syncing of an index/view from Couch into Pouch.

EloB commented 8 years ago

I went for a bare-bones WebSQL solution. It works in most mainstream browsers: Chrome, Safari, iOS (Safari), Android (Chrome). It could be polyfilled by asm.js, Flash and/or a Node server.

It was a lot of hacks, but in the end it was worth it. For instance: to restore a VIRTUAL TABLE on page refresh, run INSERT INTO my_virtual_table SELECT * FROM my_virtual_table LIMIT 0 – this makes a full-text index from a previous session fully functional again. You have to be creative when doing the relevance sort, because matchinfo() is not working. Also, the WebSQL documentation for advanced features is really poor. You can batch-insert 500 rows per query, which increases performance: INSERT INTO my_table (id, name) VALUES (1, 'A'), (2, 'B'), ... (500, 'C'); or, if you already have the data in your database, use INSERT INTO my_table1 (my, field, names) SELECT * FROM some_table. Don't use the safe question-mark placeholders, e.g. tx.executeSql('INSERT INTO my_table (id, name) VALUES (?, ?)', [1, 'A']), because they lower the number of rows you can insert per statement – use a custom JavaScript escaping function instead (a sketch follows below).
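A sketch of that batching idea (db and allRows are assumed to exist; values are escaped by hand because SQLite's default limit of ~999 bound variables is hit well before 500 two-column rows):

function escapeSql(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

function insertChunk(tx, rows) {
  var values = rows.map(function (row) {
    return '(' + Number(row.id) + ', ' + escapeSql(row.name) + ')';
  }).join(', ');
  // one statement per 500 rows instead of one statement per row
  tx.executeSql('INSERT INTO my_table (id, name) VALUES ' + values);
}

db.transaction(function (tx) {
  for (var i = 0; i < allRows.length; i += 500) {
    insertChunk(tx, allRows.slice(i, i + 500));
  }
});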

Here is a real implementation: full-text search over 30k database rows, with relevance sorting and custom search features, in realtime. Try searching for "@instant @blue counter spell". https://mtg.zone/

egervari commented 8 years ago

I am not really sure that would work in my case, unless the WebSQL solution is somehow less buggy and not itself responsible for the crashing that already occurs on devices when building a 300 MB+ index (it's a memory-spike issue). And even if it does work, for how long? I really ought to just put it on the server. It is the only scalable solution, so it would be awesome for PouchDB to somehow automatically create databases and/or indexes based on Couch's view URLs.


EloB commented 8 years ago

@egervari Performance-wise, nothing compares to a bare-bones WebSQL solution. It's so much faster and better in every respect: querying, indexing, importing, features. So for me this was the only option. My card database is currently ~50 MB and indexing still runs in realtime. I started out using this library, and @nolanlawson tried to help me; he actually told me to go for the bare-bones solution, and I'm really pleased with the result. I've also tried all the other public solutions in this area, and this library was the best of them.

Table names/rows: borders: 4, legalities: 4, color_identities: 6, colors: 6, supertypes: 6, rarities: 7, layouts: 11, watermarks: 23, powers: 26, toughness: 28, formats: 35, printings: 190, sets: 190, subtypes: 332, sources: 357, release_dates: 441, card_names: 498, artists: 630, mana_costs: 634, variations: 1833, types: 1947, mci_numbers: 2268, card_supertypes: 2829, card_variations: 5381, flavors: 12544, card_colors: 15203, names: 15885, image_names: 16264, card_color_identities: 16794, rulings: 22791, card_subtypes: 24789, texts: 25659, card_printings: 28534, cards: 29965, card_types: 31031, card_legalities: 122824

Times:

Inserting data: 5933.296ms
Creating fulltext index: 2038.760ms
Total: 8779.069ms

egervari commented 8 years ago

I disagree with you. If you have a server with crazy good hardware creating the index, and you just send the computed result to the device – even if that device is a piece-of-crap 3-year-old Android – it's going to be faster and more stable than your solution.


egervari commented 8 years ago

I think the other thing you might not be understanding is that I am not dealing with hobby MTG data – I am working with large amounts of aerospace data that helps engineers solve problems with real aircraft. This has to work. It cannot crash on their device in the field.


EloB commented 8 years ago

I have tried my solution on a crap Android and it worked in realtime, and with this library it didn't – it wasn't performant enough and couldn't handle the ~400k rows. My so-called hobby project is a cutting-edge web application using a lot of bleeding-edge technologies with polyfills; for instance, I stream the content into the database using fetch (getReader, TextDecoder), and I preprocess my content so it can be inserted directly into the database.

egervari commented 8 years ago

You said 30+ MB. Just one of the many databases I deal with is 1.3 GB. The total amount is almost 5 GB.


egervari commented 8 years ago

Oh sorry, I read 30 MB on my phone. Now I see it's 30k rows.

In any event, I will not compute it on the device. We cannot control which devices are used, and the risk of a device crashing is too great. Even if the server is less performant, it won't crash – there's no way it can crash this way, and we have full control over the server.


EloB commented 8 years ago

I want offline support – that's why I went for WebSQL. A server solution will always be faster, but it will never work offline.

egervari commented 8 years ago

It does support offline, though. I am syncing with Couch, and everything – including images, attachments and other files – will be on the device. I just don't want to build the indexes on the device, because it will crash; I have tried it. My indexes are not small – they probably have a million results. I am on vacation today, so I can't check, but I have 2 massive indexes to build.

If I make them on the server and download them to the device when the user syncs, it will work.


EloB commented 8 years ago

You are syncing data, not indexes, and you are looking for a full-text index solution. Right?

CouchDB and PouchDB are two completely different things. PouchDB is only a CouchDB-compatible API on top of different adapters (WebSQL, IndexedDB, etc.) that can sync with CouchDB.

How many rows do you have? Which adapter do you use in Pouch? If you use the websql adapter, you could CREATE VIRTUAL TABLE lookup USING fts3(docid, some, fields, that, you, want, fulltext, tokenize=porter) and then INSERT INTO lookup (docid, some, fields, that, you, want, fulltext) SELECT rowid, name, text, some, other, fields FROM my_pouch_table, or join tables to get other information. This is the fastest approach to creating a full-text index. Believe me, I've been trying to make an offline page with full-text search for a long time.

As far as I know, there aren't any methods in either IndexedDB or WebSQL to import a file containing the database.

Here is an example that fills the database with one million rows, builds the full-text index over them, and then searches with that index. It also includes my hack to restore the index after page reloads. https://jsfiddle.net/tq3a5k5y/

The first time you run this script, it creates one million fake rows of data and then builds the full-text index – making the index only took 7s. You should not put anything other than text inside the full-text index; search the full-text index first, then join with your other tables.
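Sketched from my description above, the page-reload restore hack boils down to something like this (lookup is the fts3 table from the earlier comment; it is very much a hack, so test it on your target browsers):

db.transaction(function (tx) {
  // re-declaring the virtual table is a no-op when it already exists on disk
  tx.executeSql(
    'CREATE VIRTUAL TABLE IF NOT EXISTS lookup ' +
    'USING fts3(name, text, tokenize=porter)');
  // the self-insert with LIMIT 0 touches the table so the persisted
  // full-text index becomes queryable again after a page reload
  tx.executeSql('INSERT INTO lookup SELECT * FROM lookup LIMIT 0');
});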

Create fake data: 15677.871ms
Generate full text index: 6921.818ms
Searching million rows with full text index: 25.713ms
Got 100 results

After the index is done, a page refresh only takes ~90ms to restore the old index.

Restore previous index: 87.364ms
Searching million rows with full text index: 59.293ms
Got 100 results

egervari commented 8 years ago

At this point, I am tempted to create regular couch documents and use the _id as the way to index things.

I don't mean to be dismissive, but this isn't just a question of performance – it's also one of crashing. If the design choice crashes at all, it doesn't really matter how fast it is. It cannot crash, under any circumstances.

As for the text search index, I'm pretty sure it's around 1.5 million to 1.8 million documents/rows/results, at least.


EloB commented 8 years ago

Is it pouchdb that crashes? I don't get why WebSQL would make your device crash.

egervari commented 8 years ago

It could be pouchdb or websql – there's no way to know. But if it's not websql, then I don't know what pouchdb is doing when it creates its indexes, since it would use websql to make them. I don't think I have the time to build a websql solution only to find out that it doesn't work either. I'm really certain that computing the indexes on the server, and then caching/updating them as needed, will solve the problem. That way, once the user syncs with Couch, it's done and over with: they just synced 1.3 GB or 5 GB and can immediately start using the application. And with such a low load on the device, the probability that older devices crash goes way, way down.

It would be awesome if PouchDB could simply use the views computed by Couch rather than having to create the indexes manually as regular documents. That would be a nice convenience, and probably a feature that should be high on the todo list. But short of that, regular CouchDB documents seem like the right way to go.


nolanlawson commented 8 years ago

The reason @EloB's WebSQL solution is so much faster is because undoubtedly he's using the built-in FTS (full-text search) module. Basically your entire indexer is written in C, inside of SQLite, which is one of the most battle-tested databases known to man. PouchDB cannot compete with this (heck, raw IndexedDB cannot).

@egervari PouchDB cannot sync CouchDB views, unfortunately. It's just not a feature of CouchDB that's available to us.

Creating a second PouchDB with your indexes on the _id is actually exactly what PouchDB does with the existing map/reduce implementation. The reason it's slow is because of the overhead of creating this separate database.

My own personal solution to the kind of problem you describe is to decompose the data before syncing. As an example, consider Pokedex.org (more details), which contains several CouchDB/PouchDB databases that are already set up so that everything I need to query is an _id. So the only thing that becomes slow is joining the many databases together. It's not elegant, but it's fast.
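A sketch of that decomposition (database and field names are invented; the real Pokedex.org code differs):

var PouchDB = require('pouchdb');

// every database is keyed so that reads are primary-key lookups
var monstersDb = new PouchDB('monsters');          // _id: monster name
var monsterMovesDb = new PouchDB('monster-moves'); // _id: monster name
var movesDb = new PouchDB('moves');                // _id: move name

function getMonsterWithMoves(name) {
  return Promise.all([
    monstersDb.get(name),
    monsterMovesDb.get(name)
  ]).then(function (results) {
    var monster = results[0];
    // the 'join' is one bulk primary-key fetch – no map/reduce anywhere
    return movesDb.allDocs({ keys: results[1].moves, include_docs: true })
      .then(function (res) {
        monster.moves = res.rows.map(function (row) { return row.doc; });
        return monster;
      });
  });
}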

EloB commented 8 years ago

@nolanlawson Nice app there with pokedex.org! I like that approach with offline first :)

duydao commented 8 years ago

First, I'd like to thank you for the great work you guys are doing. I love PouchDB, and we have no issues with it in any of our other apps.

We're creating a Cordova app and would like to ship a DB with 400k records to a mobile device. Importing the JSON file with bulkDocs drives memory up to ~1.5 GB at 100% CPU; it crashes our desktop browser every time, and we didn't even try it on the mobile device.

Syncing from a CouchDB seems to work, which is why we would like to pre-build the DB, export it via pouchdb-server, and load the SQLite file.

We're having problems with the indexer: it takes 15 minutes to index one field on my local machine. Time is not an issue for our use case (we can sync overnight), but the crashes are a show-stopper. Memory goes up (though not as quickly as with the import), and CPU is at 100% (Chrome Task Manager). I've tried to analyze the process, and the indexer seems to use bulkDocs as well.

Is there any way to influence/optimize/"slow down" the indexer to make the process more stable?

nolanlawson commented 8 years ago

@duydao You might look into prebuilt databases with PouchDB and see if it helps to have a prebuilt SQLite file. However, secondary indexes might still take quite a bit of time to build during the prebuild process itself.
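In a Cordova app that approach looks roughly like the following (a sketch; option names such as createFromLocation depend on your SQLite plugin version, so check its README before relying on this):

// ship a prebuilt mydb.db in the app bundle, then open it through the
// WebSQL adapter backed by the SQLite plugin
var db = new PouchDB('mydb.db', {
  adapter: 'websql',
  createFromLocation: 1 // SQLitePlugin option: copy the bundled file on first open
});

db.info().then(function (info) {
  console.log('prebuilt db contains', info.doc_count, 'docs');
});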

vladimiry commented 8 years ago

I just tried indexing the same data file, on the same fields, with elasticlunr and with pouchdb-quick-search. elasticlunr does it much faster – I didn't measure it precisely, but a rough estimate is 10-15x. Initially I was going to use pouchdb-quick-search alone, but it indexes too slowly to run on the client side (it takes more than a minute on my laptop in Chrome). So for now I pre-build the index with elasticlunr, pack it with lz-string (about 9x compression on my file), and put the compressed index into PouchDB as a general document (just plain JSON with one large string field). This scenario works well enough for my case (offline search over a large JSON file, entirely in the browser).

PS: I was wrong – in PouchDB I actually keep the elasticlunr index unpacked, so it's just a JSON object.

Also, my database never gets updated, so this approach won't be an option for most cases.
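For reference, that pipeline might look like the following (a sketch based on the description above; cards and db are assumed to exist, and per the PS the compression step is optional):

var elasticlunr = require('elasticlunr');
var LZString = require('lz-string');

// server side: build the index once
var index = elasticlunr(function () {
  this.addField('name');
  this.addField('text');
  this.setRef('id');
});
cards.forEach(function (card) { index.addDoc(card); });

// compressToUTF16 keeps the payload safe to store as a string field
var packed = LZString.compressToUTF16(JSON.stringify(index));

// store it in PouchDB as an ordinary document with one large string field
db.put({ _id: 'search-index', payload: packed })
  .then(function () { return db.get('search-index'); })
  .then(function (doc) {
    // client side: rehydrate without re-indexing
    var idx = elasticlunr.Index.load(
      JSON.parse(LZString.decompressFromUTF16(doc.payload)));
    console.log(idx.search('battle'));
  });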

duydao commented 8 years ago

@vladimiry @nolanlawson Thanks a lot for the suggestions, I will try them out.

ghost commented 7 years ago

Hello, I had exactly the same problem: I was trying to implement an MTG card database, with searches and all, in PouchDB in the browser. I tried different adapters, primary indexes, secondary indexes, map/reduce, find, quick-search. The fastest I got was about 10 minutes for inserting and indexing the data, and around 10 seconds for a simple query (value = 5). Then I tried LokiJS: loading and indexing took 1s, a query 9ms.

I believe PouchDB has different use cases than LokiJS, and both are very good at what they do, but you should know that for use cases similar to an MTG database, LokiJS is the way to go.

EloB commented 7 years ago

@odrzutowiec I did exactly that with my Magic site... I had to go with native WebSQL and made a polyfill backed by a web service on a Node server. https://github.com/nolanlawson/node-websql

toyssamurai commented 7 years ago

I am interested in finding out how slow quick-search typically is AFTER an index is fully built. I have a small db (around 5 MB). Within it, I have about 2000 docs with a field (called "title") that needs to be indexed for full-text search; no "title" field is longer than 256 characters. When a search only returns a small number of matches it's usually pretty fast, but when the number of matches grows to about 400-500 it becomes quite slow, even on my workstation (i7-6700 w/ 64 GB RAM – it usually takes about 100s). I can't imagine doing that on a smartphone. If this is typical, maybe I will have to implement something like @EloB did, too! If not, I need to dig deeper and see whether I am doing something wrong.

OutsourceNow commented 2 years ago

You could kinda sorta do it with a lot of hacks. But it might not save you as much time as you think.

With the recent performance improvements, the biggest bottleneck in pouchdb-quick-search is just writing the indexes to IndexedDB/WebSQL. Which has to be done no matter what.

Some things that could improve your performance:

  • using WebSQL over IndexedDB for browsers that support both (Chrome, Android)
  • using a separate database where the _ids are the keys you want to lookup (i.e. using allDocs instead of either query() or search() – see no. 7 in the pro tips)
  • make sure all your docs are generation-1 docs (i.e. the revs all start with 1-). Pouch can do some crazy optimizations when this is the case.

That being said, if you want to try the hack, you would need to:

  • build the index client-side
  • open up the Dev Tools and find the secondary database with a name like mydb-mrview-*
  • sync that DB to a CouchDB
  • find all the _local/ docs in the main database. sync those manually (Couch/Pouch won't sync _local/ docs). Each _local/ doc corresponds to a doc in the main database; you can find them if you open up the Dev Tools.

How does one make sure that all docs are generation-1 docs when dealing with transactional data? I have certain types of documents that have to change every time there is a transaction. I have revs_limit set to 1, but that just keeps only the highest revision in the database. What would you advise in that case?