sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.12k stars, 253 forks

String compression #556

Open JohannesLichtenberger opened 1 year ago

JohannesLichtenberger commented 1 year ago

FSST: Fast Random Access String Compression

JohannesLichtenberger commented 1 year ago

Usable either as a replacement for the current dictionary encoding (used to store object field names, element names, and attribute names) or simply to compress the text of string values.
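To make the idea concrete, here is a minimal Java sketch of the core encoding that FSST relies on: a table of up to 255 symbols, each 1 to 8 bytes long, where every one-byte code stands for one symbol and code 255 escapes a single literal byte. The class name `SymbolTableSketch`, the hand-picked symbols, and the naive linear longest-match scan are all assumptions for illustration. The real FSST algorithm learns its table from a sample of the data and matches far faster, and none of this is SirixDB code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: a fixed symbol table, not the learned table FSST builds.
final class SymbolTableSketch {
    static final int ESCAPE = 255;   // code reserved for "next byte is a literal"
    private final byte[][] symbols;  // array index doubles as the one-byte code

    SymbolTableSketch(String... syms) {
        if (syms.length > ESCAPE) {
            throw new IllegalArgumentException("at most 255 symbols");
        }
        symbols = new byte[syms.length][];
        for (int i = 0; i < syms.length; i++) {
            byte[] b = syms[i].getBytes(StandardCharsets.UTF_8);
            if (b.length < 1 || b.length > 8) {
                throw new IllegalArgumentException("symbols must be 1 to 8 bytes long");
            }
            symbols[i] = b;
        }
    }

    // Greedy encoding: emit the code of the longest symbol matching at the
    // current position, or ESCAPE plus the raw byte when nothing matches.
    byte[] compress(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < in.length) {
            int best = -1;
            int bestLen = 0;
            for (int code = 0; code < symbols.length; code++) {
                byte[] s = symbols[code];
                if (s.length > bestLen && matches(in, pos, s)) {
                    best = code;
                    bestLen = s.length;
                }
            }
            if (best >= 0) {
                out.write(best);
                pos += bestLen;
            } else {
                out.write(ESCAPE);
                out.write(in[pos] & 0xFF);
                pos++;
            }
        }
        return out.toByteArray();
    }

    // Decoding is a table lookup per code; well-formed input is assumed.
    byte[] decompress(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int pos = 0; pos < in.length; pos++) {
            int code = in[pos] & 0xFF;
            if (code == ESCAPE) {
                out.write(in[++pos] & 0xFF);
            } else {
                out.write(symbols[code], 0, symbols[code].length);
            }
        }
        return out.toByteArray();
    }

    private static boolean matches(byte[] in, int pos, byte[] s) {
        if (pos + s.length > in.length) {
            return false;
        }
        for (int i = 0; i < s.length; i++) {
            if (in[pos + i] != s[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because every string is encoded on its own against one shared table, a single value can be decoded without touching its neighbours. That is the property that would make this usable both for names and for individual string values.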

AlvinKuruvilla commented 1 year ago

I feel the best way to implement this is to keep it as a separate library, so we can maintain it separately and add it as a dependency. I also found the original C++ code used in the paper.

AlvinKuruvilla commented 1 year ago

This is the current implementation I have... if we want, we can migrate this to the Sirix organization and continue development from

JohannesLichtenberger commented 1 year ago

I think you could simply develop it as a separate library and I can add it as a dependency once it's finished and published to maven central :-)

AlvinKuruvilla commented 1 year ago

Good point

JohannesLichtenberger commented 1 year ago

@AlvinKuruvilla did you make any progress?

JohannesLichtenberger commented 1 year ago

@AlvinKuruvilla ping :-)

AlvinKuruvilla commented 1 year ago

Sorry @JohannesLichtenberger, not lately. I just finished school. I hope to get some more work done now that I have some free time. I have a feeling this is going to be a longer-term issue, especially with all of the tests that need to be ported over

Aminmalek commented 1 year ago

Is this issue fixed? @JohannesLichtenberger

JohannesLichtenberger commented 1 year ago

No, it's still open, but I think it's low priority.

AlvinKuruvilla commented 1 year ago

Yeah, sorry about that @JohannesLichtenberger, I didn't mean to ghost you like that. I've been busy with school and those projects. I'm still interested in the project, I just haven't had the time lately. I have most of the basic building blocks coded up, and I had some tests written. Can we consider making this part of the org? It's nowhere near ready or stable, but I'm familiar enough with the codebase to set things up so people can at least look at open issues and consider contributing. That way, we can drive some progress when I can't work on it.

JohannesLichtenberger commented 1 year ago

@Aminmalek wanted to work on this, so you may transfer it to the organization.

Aminmalek commented 1 year ago

@AlvinKuruvilla we can work on this together.