Closed pilosof closed 9 years ago
or there a way to do the indexing directly (similar to sqlite_ft) so I can create the lazy indexing without hanging the regular puts ?
Yes, you should throttle: full-text indexing is a very intensive process. Relaxing just means waiting until the previous put operation has finished. E.g.:
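var data = [/* ... large array to store and index ... */];
var bs = 9; // batch size, depending on data size

var batchInsert = function(startIndex) {
  if (startIndex >= data.length) {
    // Nothing left to insert; resolve so that callers can still chain.
    return Promise.resolve(0);
  }
  var batch = data.slice(startIndex, startIndex + bs);
  // Wrap the put result in a native Promise so every branch chains the same way.
  return Promise.resolve(db.put('store', batch)).then(function(keys) {
    // Start the next batch only after the previous put has finished.
    return batchInsert(startIndex + bs).then(function(cnt) {
      return keys.length + cnt;
    });
  });
};

batchInsert(0).then(function(cnt) {
  console.log(cnt + ' inserted, ready to query now');
}, function(e) {
  window.console.error(e.stack || e);
});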
Adjust bs; there should not be browser hiccups.
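A slightly more general sketch of the same throttling idea, with an optional pause between batches so the browser gets a chance to repaint. The names `putInBatches`, `putBatch`, and `pauseMs` are hypothetical and not part of ydn-db; the only assumption about the store is that putting an array returns a promise-like object resolving to the array of keys, as in the snippet above.

```javascript
// Hypothetical helper, not a ydn-db API.
// putBatch: function(arrayOfRecords) -> promise/thenable resolving to an array of keys.
function putInBatches(putBatch, records, batchSize, pauseMs) {
  var inserted = 0;

  function next(start) {
    if (start >= records.length) {
      return Promise.resolve(inserted); // all batches written
    }
    var batch = records.slice(start, start + batchSize);
    return Promise.resolve(putBatch(batch)).then(function (keys) {
      inserted += keys.length;
      // Yield to the event loop so rendering and input handling can run.
      return new Promise(function (resolve) {
        setTimeout(resolve, pauseMs);
      });
    }).then(function () {
      return next(start + batchSize);
    });
  }

  return next(0);
}
```

With a ydn-db instance this might be called as `putInBatches(function (batch) { return db.put('store', batch); }, data, 9, 0)`. Both the batch size and the pause need tuning against real data.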
Lazy indexing is not possible with the current implementation. For that you would use a web worker, which I think is even better.
thanks for your reply, by binding the indexing to the put you delay the main data storage. do you plan on adding direct indexing support ? this way we can put the main data and then lazy index directly sqlite style even in a web worker
Hi Erez,
No, because it is too specific a use case. Please use a web worker.
Kyaw
On Wed, Aug 27, 2014 at 3:03 PM, Erez Pilosof notifications@github.com wrote:
thanks for your reply, by binding the indexing to the put you delay the main data storage. do you plan on adding direct indexing support ? this way we can put the main data and then lazy index directly sqlite style even in a web worker
— Reply to this email directly or view it on GitHub https://github.com/yathit/ydn-db-fulltext/issues/4#issuecomment-53533679 .
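For readers who want to try the web worker route, here is a rough sketch of the worker side. It assumes the page posts messages of the hypothetical shape `{batchId, records}` and expects `{batchId, count}` or `{batchId, error}` back; `makeBatchHandler` is an illustrative name, not a ydn-db function. The script name, schema, and `ydn.db.Storage` options are copied from the worker snippet quoted later in this thread and may need adjusting, and whether IndexedDB is reachable from a worker depends on the browser.

```javascript
// Builds a message handler as a plain function so it can be tested outside a worker.
// putBatch: function(records) -> promise/thenable resolving to an array of keys.
// reply: function(message), e.g. a wrapper around self.postMessage.
function makeBatchHandler(putBatch, reply) {
  return function (event) {
    var msg = event.data;
    return Promise.resolve(putBatch(msg.records)).then(function (keys) {
      reply({ batchId: msg.batchId, count: keys.length });
    }, function (err) {
      reply({ batchId: msg.batchId, error: String(err) });
    });
  };
}

// Wiring that only runs inside a real worker, where importScripts exists.
if (typeof importScripts === 'function') {
  importScripts('lib/ydn.db-isw-core-e-cur-text.js');
  var schema = { stores: [{ name: 'test', keyPath: 'id' }] };
  var workerDb = new ydn.db.Storage('workertest', schema, { mechanisms: ['indexeddb'] });
  self.onmessage = makeBatchHandler(
    function (records) { return workerDb.put('test', records); },
    function (message) { self.postMessage(message); }
  );
}
```

On the page, the worker would be created with `new Worker(...)` and fed one batch at a time, sending the next batch only after the reply for the previous one arrives.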
unbinding of data storage and fulltext indexing is the common practice in most solution certainly not specific use case when talking about lots of text... anyway thanks for a great library :)
You can do that in a few lines of code, can't you? Do you really need help from the library? What are the tricky points?
didn't understand...
What is unbinding?
currently, putting new data immediately triggers the index event (bind). the way it should work is:
basically unbinding put event from indexing
I see. But I don't think it is necessary.
causing major browser hiccups
It should not. Are you guessing, or is it really happening?
really happening, we're developing an email app with offline mode, so we index lots of email messages. ydn db works great (without ft), but during initial sync (1000's of messages) everything hangs when adding ft.
when you do massive fulltext indexes you never do it as soon as you put the data, even server solutions like sphinx, you always put the data first then do the indexing gradually
a simple solution will be for you to enable to create a full text store without the binding like:
fullTextCatalogs=[{ name:'messages_ft', lang:'en', sources:[] }]
and then I can do something like db.put('messages_ft',{message:'bla bla bla'},[index,index]); which will do fulltext (since its a fulltext type store)
basically unbinding the fulltext index from the regular store events, this is also the approach of sqlite and it's great
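The "put first, index later" pattern described here can be approximated in application code without library support. The sketch below is only an illustration: `LazyIndexQueue` and `indexOne` are hypothetical names, and what `indexOne` actually does (for example, copying a record into a second store that is a full-text source) is left as an assumption.

```javascript
// Hypothetical application-level queue; not part of ydn-db.
// indexOne: function(record) that performs the expensive indexing step.
function LazyIndexQueue(indexOne) {
  this.indexOne = indexOne;
  this.pending = [];
}

// Call right after the plain (non-indexed) put has succeeded.
LazyIndexQueue.prototype.enqueue = function (records) {
  this.pending = this.pending.concat(records);
};

// Index at most maxItems records and return how many are still waiting.
LazyIndexQueue.prototype.processSlice = function (maxItems) {
  var slice = this.pending.splice(0, maxItems);
  for (var i = 0; i < slice.length; i++) {
    this.indexOne(slice[i]);
  }
  return this.pending.length;
};

// Keep indexing small slices, yielding to the event loop between them.
LazyIndexQueue.prototype.drain = function (maxItems, done) {
  var self = this;
  (function step() {
    if (self.processSlice(maxItems) > 0) {
      setTimeout(step, 0);
    } else if (done) {
      done();
    }
  })();
};
```

The main data is browsable as soon as it is stored, while search results lag behind until the queue is drained. That lag is the trade-off this approach accepts.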
I use heavy indexing too, but have not observed browser hiccups.
If you avoid loading large data into memory, there should NOT be any hiccups. It is not related to indexing, since you will have to do the indexing anyway.
Load data from the server gradually and wait until indexing finishes.
It just adds complexity without solving the actual problem.
it is an actual problem because you delay regular data browsing (not search) because of fulltext indexing. those are two separate things..
sqlite doing it, sphinx doing it, there is a reason behind this approach, and it's not complex !
Full-text indexing takes time and storage, but it does not take CPU on the UI thread.
Data is available to query after each put.
So I am not convinced by your problem description.
I have been thinking about your problem. You are right that it might block if indexing is too intensive. You have to use a parallel transaction thread so that the main thread is not blocked. Read more here.
var indexing_db = db.branch('multi', false); // create parallel multi request transaction thread
indexing_db.put('store', data);
db.search('store', 'john'); // this will execute without waiting indexing_db transaction
thanks, I'll try that, btw webworker is not an option since you can't access indexeddb from firefox ...
who said indexeddb is not available in webworker?
firefox did :) https://bugzilla.mozilla.org/show_bug.cgi?id=701634
That is an old bug report. You should be able to use indexeddb in a web worker.
nope doesn't work on firefox, the bug is still open, just re-tested it
importScripts('lib/ydn.db-isw-core-e-cur-text.js');
var schema = {stores: [{name: 'test', keyPath: 'id'}]};
var db = new ydn.db.Storage('workertest', schema, {mechanisms: ['indexeddb']});
db.addEventListener('ready', function(e) {
  throw "WORKERDB";
});
It works. Could you post a complete app?
when using db.put('store',[]) for multiple objects it seems the reverse indexing process is intensive and causing major browser hiccups ...
any way around it ? like relaxed delayed indexing ?