yathit / ydn-db-fulltext

Full text search module for YDN-DB

hanging browser on put #4

Closed pilosof closed 9 years ago

pilosof commented 10 years ago

when using db.put('store',[]) for multiple objects it seems the reverse indexing process is intensive and causing major browser hiccups ...

any way around it ? like relaxed delayed indexing ?

pilosof commented 10 years ago

or is there a way to do the indexing directly (similar to sqlite_ft) so I can create the lazy indexing without hanging the regular puts ?

yathit commented 10 years ago

Yes, you should throttle; full-text indexing is a very intensive process. Relaxing just means waiting until the previous put operation has finished. E.g.:

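var data = [/* ... large array to store and index ... */];

// Insert one batch at a time; the next put starts only after the previous one has finished.
var batchInsert = function(startIndex) {
    if (startIndex >= data.length) {
       return Promise.resolve(0); // nothing left; resolve so the caller can still chain
    }
    var bs = 9; // batch size, depending on data size
    // Promise.resolve() adopts the thenable returned by db.put so that chaining is uniform.
    return Promise.resolve(db.put('store', data.slice(startIndex, startIndex + bs))).then(function(keys) {
      return batchInsert(startIndex + bs).then(function(cnt) {
         return keys.length + cnt;
      });
    }, function(e) {
      window.console.error(e.stack || e);
      throw e; // keep the failure visible to the caller
    });
};

batchInsert(0).then(function(cnt) {
  console.log(cnt + ' inserted, ready to query now');
});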

Adjust bs until there are no browser hiccups.

Lazy indexing is not possible with the current implementation. For that you would use a web worker, which I think is even better.

pilosof commented 10 years ago

thanks for your reply, by binding the indexing to the put you delay the main data storage. do you plan on adding direct indexing support ? this way we can put the main data and then lazy index directly sqlite style even in a web worker

yathit commented 10 years ago

Hi Erez,

No, because it is too specific a use case. Please use a web worker.

Kyaw

On Wed, Aug 27, 2014 at 3:03 PM, Erez Pilosof notifications@github.com wrote:

thanks for your reply, by binding the indexing to the put you delay the main data storage. do you plan on adding direct indexing support ? this way we can put the main data and then lazy index directly sqlite style even in a web worker

— Reply to this email directly or view it on GitHub https://github.com/yathit/ydn-db-fulltext/issues/4#issuecomment-53533679 .

pilosof commented 10 years ago

unbinding of data storage and fulltext indexing is the common practice in most solutions, certainly not a specific use case when talking about lots of text... anyway thanks for a great library :)

yathit commented 10 years ago

You can do that in a few lines of code, can't you? Do you really need help from the library? What are the tricky points?

pilosof commented 10 years ago

didn't understand...

yathit commented 10 years ago

What is unbinding?

pilosof commented 10 years ago

currently, putting new data immediately triggers the index event (bind). the way it should work is:

  1. put new data immediately without triggering ft index but adding {_ft_pending_timestamp} property
  2. go over the index of _ft_pending_timestamp property and creating the actual fulltext index one by one possibly in a webworker

basically unbinding put event from indexing
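
A minimal sketch of the two steps above, using plain in-memory maps rather than ydn-db. Every name here (`putMessage`, `indexPending`, `search`, `tokenize`) is hypothetical and only illustrates the idea; in a real app step 2 would run in small batches on a timer or in a worker.

```javascript
// Hypothetical in-memory stand-ins; none of this is ydn-db API.
const messages = new Map();      // primary store: id -> record
const invertedIndex = new Map(); // token -> Set of record ids

// Step 1: store the record immediately and only mark it as pending.
function putMessage(id, text) {
  messages.set(id, { id, text, _ft_pending_timestamp: Date.now() });
}

// Very naive tokenizer, for illustration only.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Step 2: index at most `limit` pending records, oldest first,
// then clear the pending marker. Returns how many were indexed.
function indexPending(limit) {
  const pending = [...messages.values()]
    .filter((m) => m._ft_pending_timestamp !== undefined)
    .sort((a, b) => a._ft_pending_timestamp - b._ft_pending_timestamp)
    .slice(0, limit);
  for (const m of pending) {
    for (const token of tokenize(m.text)) {
      if (!invertedIndex.has(token)) invertedIndex.set(token, new Set());
      invertedIndex.get(token).add(m.id);
    }
    delete m._ft_pending_timestamp;
  }
  return pending.length;
}

// Look up the ids of already indexed records containing `word`.
function search(word) {
  return [...(invertedIndex.get(word.toLowerCase()) || [])];
}
```

With this split, `putMessage` stays cheap and records can be browsed right away, while full-text search only sees a record once `indexPending` has processed it.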

yathit commented 10 years ago

I see. But I don't think it is necessary.

"causing major browser hiccups"

It should not. Are you guessing, or is it really happening?

pilosof commented 10 years ago

really happening. we're developing an email app with offline mode, so we index lots of email messages. ydn db works great (without ft), but during initial sync (1000s of messages) everything hangs when adding ft.

when you do massive fulltext indexing you never do it as soon as you put the data; even with server solutions like sphinx, you always put the data first and then do the indexing gradually

pilosof commented 10 years ago

a simple solution would be for you to allow creating a full text store without the binding, like:

fullTextCatalogs=[{ name:'messages_ft', lang:'en', sources:[] }]

and then I can do something like db.put('messages_ft',{message:'bla bla bla'},[index,index]); which will do the fulltext indexing (since it's a fulltext type store)

basically unbinding the fulltext index from the regular store events; this is also the approach of sqlite and it's great

yathit commented 10 years ago

I use heavy indexing too, but did not observe browser hiccups.

If you avoid loading large data into memory, there should NOT be any hiccups. It is not related to indexing, since you will have to do indexing anyways.

Load data from the server gradually and wait until indexing finishes.
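
A rough sketch of that pattern, assuming the server can be read page by page. `syncGradually`, `fetchPage` and `putAndIndex` are hypothetical names, not ydn-db API; `putAndIndex` stands for whatever call stores a batch and resolves once indexing is done.

```javascript
// Hypothetical helper: pulls one page at a time and waits for each
// batch to be stored and indexed before requesting the next page.
async function syncGradually(fetchPage, putAndIndex) {
  let total = 0;
  for (let page = 0; ; page++) {
    const records = await fetchPage(page); // only one page is held in memory
    if (records.length === 0) break;       // an empty page means we are done
    await putAndIndex(records);            // wait until this batch has finished
    total += records.length;
  }
  return total;
}
```

The page size plays the same role as the batch size in the throttled insert: smaller pages mean shorter bursts of work between pauses.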

yathit commented 10 years ago

It is just adding complexity without solving the actual problem.

pilosof commented 10 years ago

it is an actual problem because you delay regular data browsing (not search) because of fulltext indexing. those are two separate things..

sqlite does it, sphinx does it, there is a reason behind this approach, and it's not complex !

yathit commented 10 years ago

Fulltext indexing takes time and storage, but does not take CPU on the UI thread.

Data are available to be queried after each put.

So I am not convinced by your problem description.

yathit commented 10 years ago

I am thinking about your problem. You are right that it might block if indexing is too intensive. You have to use a parallel transaction thread so that the main thread is not blocked. Read more here.

var indexing_db = db.branch('multi', false); // create parallel multi request transaction thread
indexing_db.put('store', data);

db.search('store', 'john'); // this will execute without waiting for the indexing_db transaction

pilosof commented 10 years ago

thanks, I'll try that. btw webworker is not an option since you can't access indexeddb from a webworker in firefox ...

yathit commented 10 years ago

who said indexeddb is not available in webworker?

pilosof commented 10 years ago

firefox did :) https://bugzilla.mozilla.org/show_bug.cgi?id=701634

yathit commented 9 years ago

That is a bug report, and an old one. You should be able to use indexeddb in a webworker.

pilosof commented 9 years ago

nope, doesn't work on firefox, the bug is still open, just re-tested it

importScripts('lib/ydn.db-isw-core-e-cur-text.js');
var schema = {stores: [{name: 'test', keyPath: 'id'}]};
var db = new ydn.db.Storage('workertest', schema, {mechanisms: ['indexeddb']});
db.addEventListener('ready', function(e) {
  throw "WORKERDB";
});

yathit commented 9 years ago

It works. Could you post a complete app?