mafintosh / hyperdb

Distributed scalable database
MIT License
753 stars 75 forks source link

createReadStream() #35

Closed e-e-e closed 6 years ago

e-e-e commented 6 years ago

Hey @mafintosh @noffle,

I started playing around with building a readable stream interface for hyperdb. I started this as createDiffStream is scaling badly, and wanted to see if I could get a super lean stream working.

This is really rough, and not alt all ready for merging (still using => funcs and riddled with console logs), but I figured I should open a PR to get your guys input. I have been using @noffle's architecture description as a basis, and reading your code, but still might be doing something totally stupid.

Any thoughts would be appreciated. Its quick and respects back pressure so hopefully will scale well.

One bug that I cant seem to figure out so far is illustrated in the first test - although there are multiple 'bar/cat' keys, it's only returning the latest - while for all other keys (for examplefoo/) the duplicates are returned. UPDATE: I changed the implementation of read stream so that it only returns the latest entries.

e-e-e commented 6 years ago

also - happy new years 🍾 hope your taking a break and relaxing.

e-e-e commented 6 years ago

@mafintosh @noffle I think I have got this to a point where this is working well. Would you mind having a look and seeing if there is any thing obviously dumb or if I have really tackled this from the wrong angle completely.

There may be edge cases around conflicting nodes which I have not thought through.

I also have to write some nicer tests - the tests at the moment are just what I was playing around with to test things for myself.

Let me know your thoughts.

e-e-e commented 6 years ago

@noffle and @mafintosh I think this is ready now - if you want to take a look. I just experimented with this implementation of createReadStream on an instance of hyper-graph-db with 200k+ entries and got search times down from 30-40s to 1-3s. Its really quick.

hackergrrl commented 6 years ago

Awesome work @e-e-e! I've been following your progress somewhat, though I'm not sure when I'll have time to go over it in detail.

e-e-e commented 6 years ago

I noticed that the node queue sort was a performance bottle neck - so just implemented a conditional check meaning it only sorts when the next node is less than the last node in the queue. Strangely though - it looks like now the queue does not actually need to be sorted at all. The next algorithm always returns them in order seemingly - I am skeptical that this is always the case so have left the sort as part of the stream - but perhaps you will have a better idea @mafintosh, and maybe we dont need it at all.

mafintosh commented 6 years ago

We merged next :)