kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

"Unexpected end of MessagePack data" error after migration to use "Shared Structures" #55

Closed salortiz closed 3 years ago

salortiz commented 3 years ago

In a database using msgpack as its encoding, I assumed it would be enough to add the sharedStructures option with its key and then, while iterating with getRange, do a simple "put" with the same key to compact all records.

The process ran smoothly, but a subsequent access to the database resulted in the aforementioned error and/or mangled key-values.

I'm using node v14.17.0, lmdb-store v1.5.2 and msgpackr v1.3.3 on Fedora 33.

A little script to reproduce the problem, with sample data, is available at https://gist.github.com/salortiz/a4122bae442d02bdeda807ce547d15c4

kriszyp commented 3 years ago

You can't really use different shared structures settings for writing and for reading with msgpackr. However, you can make your migration script work by setting a different decoder that uses the previous setting, without shared structures:

    const { open } = require('lmdb-store');
    const { Decoder } = require('msgpackr');

    conf.sharedStructuresKey = Buffer.from('structs'); // add the shared structures key to conf
    let store = open('mydata', conf);
    // Decode with a plain msgpackr Decoder (no shared structures), matching how the old records were written
    store.decoder = new Decoder();

    let iter = store.getRange({ /* ... */ });
    // ... do migration
    store.decoder = store.encoder; // if you want to restore the decoder that uses the shared structures after migration

salortiz commented 3 years ago

@kriszyp I don't need different shared structures (SS) settings for reading and writing. During the migration and after it I will use one and only one. The records created originally (without SS) were written as simple entries (without the SS-related extensions), so I expected that, during the migration, when the iterator reads (decodes) them, the machinery added by sharedStructuresKey would not be used, but would be created and used at the first (and subsequent) store.put calls (encode), overwriting the original value with the new encoding (i.e. with msgpack's SS extension type).

My tests show that the store.decoder created with SS active can read the original values without problems, and all writes use the same store.encoder, so I don't understand where the problem is.

I already worked around my problem by reading all data into memory (and removing it from the store) in the iterator loop, and then writing everything in one go in a separate loop. I still want to report the issue because it violated my expectations and no errors were reported during the migration. The error appears later (in a separate run) using the same sharedStructuresKey, so something gets corrupted during the migration overwrites.

Thanks for your attention.
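
For readers following along, the two-phase workaround described in this comment might look roughly like the sketch below. It is not taken from the gist: the function name `migrateInTwoPhases` is hypothetical, and the code only assumes that `store` behaves like an lmdb-store database, where `getRange()` yields `{ key, value }` entries and `put()` / `remove()` return promises.

```javascript
// Sketch of the two-phase workaround (hypothetical helper, not from the gist).
// Assumption: `store` exposes getRange(), remove() and put() in the way an
// lmdb-store database does.
async function migrateInTwoPhases(store) {
  const buffered = []; // decoded records held in memory
  const pending = [];  // promises returned by remove()

  // Phase 1: read every record into memory and remove it from the store.
  for (const { key, value } of store.getRange({})) {
    buffered.push({ key, value });
    pending.push(store.remove(key));
  }
  await Promise.all(pending);

  // Phase 2: write everything back in a separate loop, so no record is
  // re-encoded while old records are still being decoded.
  await Promise.all(buffered.map(({ key, value }) => store.put(key, value)));

  return buffered.length; // number of records rewritten
}
```

Because everything is buffered, this only suits databases that fit in memory. Swapping the decoder during migration, as suggested earlier in the thread, avoids that limit.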

kriszyp commented 3 years ago

The issue is that the records were written with shared structures disabled, and in the migration step they were read with shared structures enabled (which is a different setting). The reason this causes a problem is that with shared structures disabled, the structures are written within each entry/document (they are all assumed to be "private" structures), but when msgpackr starts reading and writing with shared structures enabled, the private structures are read and override the slots of the structures that are supposed to be shared (using the same structure ids), thereby "corrupting" them. It would be possible to add extra checks for this type of thing, but generally msgpackr is written to optimize for performance, and reading a MessagePack document that was written with a different shared structures setting wouldn't really be supported anyway.
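
The slot collision described above can be illustrated with a deliberately simplified toy model. This is not msgpackr's real implementation or wire format. The `ToyDecoder` class and the `{ defs, records }` document shape are invented here purely to show how inline "private" structure definitions can overwrite a shared slot that has the same id.

```javascript
// Toy model only: NOT msgpackr's actual data structures or encoding.
class ToyDecoder {
  constructor(sharedStructures) {
    // Slot table indexed by structure id, preloaded with the shared structures.
    this.slots = sharedStructures.map((keys) => keys.slice());
  }

  // A toy "document" is { defs: [[id, keys], ...], records: [[id, values], ...] }.
  decode(doc) {
    // Inline (private) definitions land in the same slot table,
    // replacing whatever was stored under that id.
    for (const [id, keys] of doc.defs) this.slots[id] = keys;
    return doc.records.map(([id, values]) => {
      const keys = this.slots[id];
      return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
    });
  }
}

const decoder = new ToyDecoder([['name', 'age']]); // shared slot 0

// Written with shared structures enabled: refers to slot 0, no inline definition.
const sharedDoc = { defs: [], records: [[0, ['Ann', 30]]] };
// Written with shared structures disabled: carries its own definition for id 0.
const privateDoc = { defs: [[0, ['city', 'zip']]], records: [[0, ['Oslo', '0150']]] };

const before = decoder.decode(sharedDoc);  // [{ name: 'Ann', age: 30 }]
const old = decoder.decode(privateDoc);    // [{ city: 'Oslo', zip: '0150' }] (reads fine)
const after = decoder.decode(sharedDoc);   // [{ city: 'Ann', zip: 30 }] (mangled keys)
```

In this model the old record still decodes correctly, which is consistent with the observation that no error shows up during the migration itself. The damage only appears when a record that relies on the shared slot is read afterwards.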

salortiz commented 3 years ago

I wrongly assumed that the shared structures would use a different set of tags. Now I have it clear. Thank you very much for the explanation and this great piece of code.