mafintosh / hyperdb

Distributed scalable database
MIT License

Partial select #156

Open saurabhabh opened 5 years ago

saurabhabh commented 5 years ago

I have a use case where I plan to store users in the db under the key users/{id}, so the keys will be users/1, users/2 and so on. Each key will hold the user's details, such as name, address and phone number. I can retrieve all the users with the list API as follows: db.list('/users/', function (err, nodes) { }).

Is there a way to select/get/list only 10 users out of a list of 100? This may be based on timestamp (most recently updated users). I need this to reduce loading time, since fetching 10 users is faster than fetching 100 and then showing only 10 of them in the UI.

m-onz commented 5 years ago

If you use db.list('/users/', function (err, nodes) { }), your callback receives all matching nodes. It would be easy to loop through them and filter out what you're not interested in using a condition. This could be done with loops or using map / forEach.

You can also try using a stream...

var stream = db.createReadStream('/users')
stream.pipe(take(100)).pipe(process.stdout)

Where take(100) is a transform stream.

More info on streams: https://github.com/substack/stream-handbook

Streams are a best practice and my preferred way of dealing with sets of potentially infinite size.
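
For reference, take is not something hyperdb or Node provides out of the box. A minimal sketch of such a transform, assuming Node's built-in stream module and an object-mode source, might look like the following. The names take and collect are hypothetical helpers, and Readable.from stands in for db.createReadStream('/users') so the example runs on its own.

```javascript
const { Readable, Transform } = require('stream')

// Hypothetical helper: pass through the first `n` items, then end the output.
function take (n) {
  let seen = 0
  return new Transform({
    objectMode: true,
    transform (item, _enc, callback) {
      if (seen < n) this.push(item)
      seen++
      // Signal end-of-stream exactly once, right after the n-th item.
      if (seen === n) this.push(null)
      callback()
    }
  })
}

// Hypothetical helper: collect a stream's output into an array.
function collect (stream) {
  return new Promise((resolve, reject) => {
    const items = []
    stream.on('data', (item) => items.push(item))
    stream.on('end', () => resolve(items))
    stream.on('error', reject)
  })
}

// Stand-in source; with hyperdb this would be db.createReadStream('/users').
const source = Readable.from(['users/1', 'users/2', 'users/3', 'users/4'])
collect(source.pipe(take(2))).then((items) => console.log(items))
```

Note that this only ends the output early. The source stream is still read to the end unless it is destroyed separately.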

saurabhabh commented 5 years ago

Hi @m-onz, thanks for the reply. I am working on an Angular/Electron desktop application. Here is what I implemented. I wanted the users in reverse order (last added first), so I used the reverse option on the stream, but I don't seem to get the correct users. Can you shed some light on how reverse works? Does it work on timestamp? I am looking to fetch the last 5 users that were added to the db.

I store users with the timestamp (when the user is created) as the key: /users/{timestamp}. Below is the code snippet that I am using.

var stream = self.db.createReadStream('/users/', {reverse: true})

        let i = 0;
        let done = false;
        let array: Array<User> = []
        // Sort the users by time and call back once, whether the stream
        // ends on its own or is closed by destroy().
        const finish = () => {
            if (done) return;
            done = true;
            array = _.sortBy(array, o => o.msgtime);
            cb(array);
        };
        stream.on('data', (chunk) => {
            if (i >= 5) {
                stream.destroy();
                return;
            }
            i = i + 1;
            if (chunk && chunk.length > 0) {
                console.log(`Received ${chunk.length} node(s). ${chunk[0].value}`);
                // Assumes the stored value is the User object itself.
                array.push(chunk[0].value);
            }
        });
        stream.on('end', finish);
        stream.on('close', finish);
m-onz commented 5 years ago

You are right. createReadStream does not return the list sorted by insertion time. Try using createHistoryStream instead:

var hyperdb = require('hyperdb')

var db = hyperdb('./test.db', { valueEncoding: 'json' })

var i = 0

var interval = setInterval(function () {
        db.put('/users/'+Date.now(), { incro: i }, function () {
                i++
        })
}, 1000)

setTimeout(function () {
        clearInterval(interval);
        var s = db.createHistoryStream('/users', { reverse: true })
        s.on('data', console.log)
}, 9000)

returns..

Node(key=users/1548159578379, value={ incro: 0 }, seq=1, feed=0))
Node(key=users/1548159579383, value={ incro: 1 }, seq=2, feed=0))
Node(key=users/1548159580385, value={ incro: 2 }, seq=3, feed=0))
Node(key=users/1548159581387, value={ incro: 3 }, seq=4, feed=0))
Node(key=users/1548159582388, value={ incro: 4 }, seq=5, feed=0))
Node(key=users/1548159583389, value={ incro: 5 }, seq=6, feed=0))
Node(key=users/1548159584391, value={ incro: 6 }, seq=7, feed=0))
Node(key=users/1548159585392, value={ incro: 7 }, seq=8, feed=0))
saurabhabh commented 5 years ago

Hi @m-onz, thanks for getting back. From the documentation it seems that createHistoryStream returns all the nodes historically, and it doesn't seem to work with a prefix. I found createKeyHistoryStream, but that does not support the reverse option. My whole purpose here is that I don't want to read the entire data set in one go, and I also want it sorted by timestamp, descending. Take a chat use case where two users have exchanged more than 1000 messages and I want to load only the last 100. How can I do that efficiently?

Is there an efficient option that gives me a sorted list (descending by timestamp) containing only the last 100 messages, so that it reduces the read load on the db?

m-onz commented 5 years ago

createHistoryStream returns all the historical nodes as a readable stream, so you can reverse it and take the first 100 items without downloading the entire dataset. You're right about the prefix, unfortunately, unless I'm mistaken.

See my link to the stream handbook (specifically through/transform streams). You just need to push null onto the stream within a transform to end the stream gracefully (no need for stream.destroy()).

https://github.com/substack/stream-handbook#transform

Here is an example of what I mean using through2 https://www.npmjs.com/package/through2

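var through2 = require('through2')

var i = 0
db.createHistoryStream({ reverse: true })
  // Nodes are objects, so the transform has to run in object mode.
  .pipe(through2.obj(function (node, enc, callback) {
    // Pass the first 100 nodes through, then end the output once.
    if (i < 100) this.push(node)
    i++
    if (i === 100) this.push(null)
    callback()
  }))
  .on('data', console.log)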

Maybe someone else can chime in with some suggestions and advice too.

saurabhabh commented 5 years ago

I store /users/1 and also /message/234. createHistoryStream returns not only the users nodes but also the message nodes. Is my understanding correct? I need only the users, not the messages. That is where I need the 'prefix', I think.

m-onz commented 5 years ago

Yes. Ultimately it would be good to have the read stream from createReadStream('/users') sorted by time, so you could avoid having to do your own prefixing. I'm hoping that others have thoughts on this. I tend to take the approach of adding my own timestamps and sorting using array sort. I get individual items with /user/ or I work with the entire set and filter. Sorry I couldn't be of more help!

One thing to try, which I haven't, is to use the greater than or less than options in the stream. Those might be useful for limiting the number of entries you're dealing with.
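
The sort-and-filter approach mentioned in this comment could be sketched roughly as follows. It assumes keys of the form users/{timestamp}, as used earlier in this thread, and that each entry is a plain object with a key property. latestByKeyTimestamp is a hypothetical helper, not part of hyperdb. If results arrive as arrays of nodes per key (as the chunk[0] access in the earlier snippet suggests), they would need flattening first.

```javascript
// Hypothetical helper: return the `limit` newest entries, newest first,
// reading the timestamp from the last path segment of each key.
function latestByKeyTimestamp (nodes, limit) {
  const timestampOf = (node) => Number(node.key.split('/').pop())
  return nodes
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => timestampOf(b) - timestampOf(a))
    .slice(0, limit)
}

// Example input shaped like the nodes printed earlier in the thread.
const nodes = [
  { key: 'users/1548159578379', value: { incro: 0 } },
  { key: 'users/1548159580385', value: { incro: 2 } },
  { key: 'users/1548159579383', value: { incro: 1 } }
]

console.log(latestByKeyTimestamp(nodes, 2).map((node) => node.key))
// → [ 'users/1548159580385', 'users/1548159579383' ]
```

This keeps the code simple but still reads the whole set before trimming it, so it does not reduce the read load on the db.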

saurabhabh commented 5 years ago

Thanks for the effort, @m-onz. Hopefully someone will give some feedback. I will also try looking at how 'reverse' is implemented and what parameter it actually uses for reversing.

m-onz commented 5 years ago

@saurabhabh See this: https://github.com/mafintosh/hyperdb/issues/143 ... The latest version does allow prefixing for the history stream.. this is the ideal solution for you!

db.createHistoryStream('/winning', { reverse: true })
saurabhabh commented 5 years ago

Thanks. Will give it a try.

saurabhabh commented 5 years ago

Just to update, @m-onz: I am on the latest version of the lib, i.e. 3.5.0, and the history stream does not work with prefixing. If we give a prefix, it still returns all the nodes, historically sorted.

m-onz commented 5 years ago

I misread the issue that I linked to...

from @e-e-e

In the latest release you can now use createKeyHistoryStream(key) to get the history of a specific key. If you want to get the history of a prefix recursively, this is not too difficult to achieve by combining get keyHistory and createReadStream.

You can use createKeyHistoryStream in the latest version to get the keys you're interested in. This has been a bit of an adventure! I can make a code example for this if you need me to.

Hopefully this info is useful to others!

saurabhabh commented 5 years ago

Hello, I have implemented it for the time being as follows. I store users with timestamps as keys:

users/{timestamp1}
users/{timestamp2}
....

I store the timestamp keys in an array:

users/keys/
{
  timestamp1 ,
  timestamp2
}

I sort the keys by timestamp and then get the required users. Let me know if you have a better approach.
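
A rough sketch of this key-index approach is shown below. It runs against a small in-memory stand-in rather than a real db, so everything named here (createMemoryDb, readIndex, addUser, latestUsers, and the users/keys location) is hypothetical. The stand-in mimics the callback style of db.put(key, value, cb) and db.get(key, cb), assuming get yields an array of nodes.

```javascript
// In-memory stand-in for the db, for illustration only.
function createMemoryDb () {
  const store = new Map()
  return {
    put (key, value, cb) { store.set(key, value); cb(null) },
    get (key, cb) {
      cb(null, store.has(key) ? [{ key, value: store.get(key) }] : [])
    }
  }
}

const INDEX_KEY = 'users/keys' // assumed location of the timestamp index

// Read the stored index, or an empty list if none exists yet.
function readIndex (db, cb) {
  db.get(INDEX_KEY, (err, nodes) => {
    if (err) return cb(err)
    cb(null, nodes && nodes.length ? nodes[0].value : [])
  })
}

// Store a user under users/{timestamp} and record the timestamp in the index.
function addUser (db, timestamp, user, cb) {
  db.put('users/' + timestamp, user, (err) => {
    if (err) return cb(err)
    readIndex(db, (err, index) => {
      if (err) return cb(err)
      db.put(INDEX_KEY, index.concat(timestamp), cb)
    })
  })
}

// Fetch the `limit` most recently created users, newest first.
function latestUsers (db, limit, cb) {
  readIndex(db, (err, index) => {
    if (err) return cb(err)
    const wanted = index.slice().sort((a, b) => b - a).slice(0, limit)
    const users = []
    const next = (i) => {
      if (i === wanted.length) return cb(null, users)
      db.get('users/' + wanted[i], (err, nodes) => {
        if (err) return cb(err)
        if (nodes && nodes.length) users.push(nodes[0].value)
        next(i + 1)
      })
    }
    next(0)
  })
}

// Usage: add three users out of order, then read back the two newest.
const db = createMemoryDb()
addUser(db, 1000, { name: 'a' }, () => {
  addUser(db, 3000, { name: 'c' }, () => {
    addUser(db, 2000, { name: 'b' }, () => {
      latestUsers(db, 2, (err, users) => console.log(users))
    })
  })
})
```

Only the index and the requested users are read, which is the point of the approach. The read-modify-write on the index is not safe if several writers add users at the same time, so that case would need extra care.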