sirixdb / sirix

SirixDB is an an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.09k stars 244 forks source link

Read-only secondary indexes should get a tailored in-memory data structure #624

Open JohannesLichtenberger opened 1 year ago

JohannesLichtenberger commented 1 year ago

Currently, we're reading the red-black tree nodes from the disk on the first load and putting the nodes into a global buffer manager/cache. Still, we could, for instance, also use the adaptive radix tree or an even better data structure for read-only...

anmol797 commented 2 weeks ago

Still open ?

JohannesLichtenberger commented 2 weeks ago

Yes

JohannesLichtenberger commented 2 weeks ago

@anmol797 we should probably chat about this

anmol797 commented 2 weeks ago

Yes

can i take it up ? i am new to open source contribution

anmol797 commented 2 weeks ago

@anmol797 we should probably chat about this

Sure

JohannesLichtenberger commented 2 weeks ago

Are you familiar with basic data structures, especially functional/persistent structures? I'd like to discuss this first with someone already familiar to these topics :-)

anmol797 commented 2 weeks ago

Are you familiar with basic data structures, especially functional/persistent structures? I'd like to discuss this first with someone already familiar to these topics :-)

yes i am aware of all these i am a fresher working professional in IT sector

JohannesLichtenberger commented 2 weeks ago

Ok, so basically the main datastructure in Sirix is a "keyed", persistent trie to fetch pages (full pages) or page fragments (e.g. only changed records plus records which fall out of a sliding window). The data is stored in the leaf pages of the tries. Currently, we store JSON nodes or XML nodes in a trie. Secondary indexes based on Red/Black balanced binary trees are currently stored in other tries.

Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees (or the newer variant Height Optimized trees). That said to incorporate also the sliding snapshot algorithm used to version the current leaf pages is a lot of work...

I'm currently not sure if the rotations due to balancing a binary tree lead to a lot of copied tree nodes (which are currently stored in the trie leaf pages). Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense.

On the other hand it's currently "nice to have", but a lot of work (maybe half a year or maybe even much more as it's done in spare time).

Another pressing issue is, why a full scan of a bigger resource (3,8Gb JSON file stored in Sirix) in parallel traversed by N read-only trxs is much slower than with only one trx (it's not at all obvious for me currently when profiling what's the issue).

JohannesLichtenberger commented 2 weeks ago

https://sirix.io/docs/concepts

anmol797 commented 2 weeks ago

Ok, so basically the main datastructure in Sirix is a "keyed", persistent trie to fetch pages (full pages) or page fragments (e.g. only changed records plus records which fall out of a sliding window). The data is stored in the leaf pages of the tries. Currently, we store JSON nodes or XML nodes in a trie. Secondary indexes based on Red/Black balanced binary trees are currently stored in other tries.

Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees (or the newer variant Height Optimized trees). That said to incorporate also the sliding snapshot algorithm used to version the current leaf pages is a lot of work...

I'm currently not sure if the rotations due to balancing a binary tree lead to a lot of copied tree nodes (which are currently stored in the trie leaf pages). Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense.

On the other hand it's currently "nice to have", but a lot of work (maybe half a year or maybe even much more as it's done in spare time).

Another pressing issue is, why a full scan of a bigger resource (3,8Gb JSON file stored in Sirix) in parallel traversed by N read-only trxs is much slower than with only one trx (it's not at all obvious for me currently when profiling what's the issue).

okay okay , understood , so basically the optimization of Read need to be done ""

" Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense." can you please explain a bit more ? and this is the approach you want me to use ? "Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees"

JohannesLichtenberger commented 2 weeks ago

Point is it's currently more of a "nice to have" thing with a huge possibility that work is not going to be finished and it's more like "this could be better" ;-)

However, I think what would be valuable in any case would be to separate the PageReadOnlyTrx and the PageTrx from the current trie implementation. IMHO these should be the StorageEngineReader,StorageEngineWriter with separate KeyedTrieReader,KeyedTrieWriter classes. Maybe you'd like to start with this "subtask" to get familiar with the code base?

JohannesLichtenberger commented 2 weeks ago

TreeModifierImpl and the interface would probably be the KeyedTrieWriter.

JohannesLichtenberger commented 2 weeks ago

Oh and BTW: we have to fix this first as currently the CI build always fails due to an old docker image format used by Keycloak 7.0.1, which is of course super old and should be updated: https://github.com/sirixdb/sirix/issues/711

anmol797 commented 1 week ago

Point is it's currently more of a "nice to have" thing with a huge possibility that work is not going to be finished and it's more like "this could be better" ;-)

However, I think what would be valuable in any case would be to separate the PageReadOnlyTrx and the PageTrx from the current trie implementation. IMHO these should be the StorageEngineReader,StorageEngineWriter with separate KeyedTrieReader,KeyedTrieWriter classes. Maybe you'd like to start with this "subtask" to get familiar with the code base?

yeah sure , that will be more effective and helpful for me

please let me know what to start with

anmol797 commented 1 week ago

Oh and BTW: we have to fix this first as currently the CI build always fails due to an old docker image format used by Keycloak 7.0.1, which is of course super old and should be updated: https://github.com/sirixdb/sirix/issues/711

okay i can help with that too

JohannesLichtenberger commented 1 week ago

You could start with the Keycloak issue and then with the proposed refactoring?

anmol797 commented 1 week ago

You could start with the Keycloak issue and then with the proposed refactoring?

ok , please tell me how to start with that

JohannesLichtenberger commented 1 week ago

You have to update the docker files and check that it works again, as startup scripts to import the realm are not supported anymore

anmol797 commented 1 week ago

You have to update the docker files and check that it works again, as startup scripts to import the realm are not supported anymore

how to test that change ? any reference ?