priitj / whitedb

WhiteDB memory database
http://whitedb.org/
GNU General Public License v3.0

wg consistency error: string not found in hash during deletion, offset .. #29

Closed: dyatlov closed this issue 7 years ago

dyatlov commented 7 years ago

It happened while I was inserting/updating lots of data from 4 threads simultaneously. Every update/insert was wrapped in lock/unlock, so this message worries me a lot: it suggests there's a bug in whitedb. Any guidance on where to look?

priitj commented 7 years ago

Hi,

the scenario you describe sounds like the error might not be easily reproducible, which makes it rather difficult to fix the bug, wherever it is.

Perhaps you can post the relevant part of the code. I'm mostly interested only in the parts that deal with thread control, reading/writing and locking. It might offer some clue about what is going on.

Also, assuming there is something wrong with locking, can you please give your hardware platform and the options whitedb was configured with, if available.

dyatlov commented 7 years ago

I've made my server single-threaded and nothing changed. So I guess it's not about locks but about the data. It's hard to send you a snippet since it's all part of a web application with lots of connections. It fails when I do this:

    void* rec = location_find(uid);
    location_fill_from_json(rec, json_obj, 1); // several wg_set_.._field
    json_object* uj = location_to_json(rec); // several wg_get_field
    const char* s = json_object_to_json_string(uj);
    location_rec_set_json(rec, (char*)s); // wg_set_str_field(location_db, rec, LOCATION_JSON, v);
    json_object_put(uj); // free uj; s is owned by uj and must not be used after this

If I comment out the last 4 lines, everything works fine. Could you tell me what the error means? As I understand it, it means that the memory holding the previous data (before the update) is corrupt, right?

dyatlov commented 7 years ago

I could send you the full code privately if that's OK with you.

dyatlov commented 7 years ago

Also, if it helps: the same text was written to the same field several times, and that led to the failure:

2017-08-18 19:44:19 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }
2017-08-18 19:44:29 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }
2017-08-18 19:44:39 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }

Those are not the only requests; there are lots of others, but it always fails in the same place, on this record.

priitj commented 7 years ago

Thanks, I'll check if I can reproduce the error based on this information.

Edit: yes, the error suggests that the memory is corrupt. Basically it goes to free the data that is being deleted but the data is not there anymore.
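
To make the explanation above a bit more concrete, here is a minimal sketch of the kind of check that can produce such a message. It is a toy model, not WhiteDB's code: `toy_table`, `toy_intern`, `toy_release` and the fixed bucket count are all hypothetical names invented for illustration. The sketch assumes that strings are stored once, shared through a reference count, and chained in a hash table. Under that assumption, releasing a string whose entry has already been removed cannot be completed, so it is reported as a consistency error.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 16  /* hypothetical size, chosen only for the example */

/* One stored string: kept once and shared through a reference count. */
typedef struct toy_entry {
    char *str;
    int refcount;
    struct toy_entry *next;  /* next entry in the same hash chain */
} toy_entry;

typedef struct {
    toy_entry *buckets[TOY_BUCKETS];
} toy_table;

static unsigned toy_hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % TOY_BUCKETS;
}

/* Return the shared entry for s, creating it if needed.
 * Returns NULL only if memory allocation fails. */
toy_entry *toy_intern(toy_table *t, const char *s) {
    unsigned b = toy_hash(s);
    for (toy_entry *e = t->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->str, s) == 0) {
            e->refcount++;  /* same text again: share the existing copy */
            return e;
        }
    }
    toy_entry *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->str = malloc(strlen(s) + 1);
    if (e->str == NULL) { free(e); return NULL; }
    strcpy(e->str, s);
    e->refcount = 1;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return e;
}

/* Drop one reference to s and free the entry when the count reaches zero.
 * Returns 0 on success, -1 if s is no longer in the table. */
int toy_release(toy_table *t, const char *s) {
    unsigned b = toy_hash(s);
    for (toy_entry **pp = &t->buckets[b]; *pp != NULL; pp = &(*pp)->next) {
        toy_entry *e = *pp;
        if (strcmp(e->str, s) == 0) {
            if (--e->refcount == 0) {
                *pp = e->next;  /* unlink from the chain before freeing */
                free(e->str);
                free(e);
            }
            return 0;
        }
    }
    /* The caller believed it held a reference, but the entry is gone. */
    fprintf(stderr, "toy consistency error: string not found in hash during deletion\n");
    return -1;
}
```

In this model, interning the same text twice and releasing it three times makes the third `toy_release` call return -1: the bookkeeping says a reference exists, but the data is not there anymore.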

priitj commented 7 years ago

Unfortunately I could not reproduce the error.

Please send me the full code at priit 'at' whitedb.org and I'll take a quick look.

dyatlov commented 7 years ago

@priitj Managed to make a small PoC. Always reproducible. Have a look pls.

whitedb-test.zip

priitj commented 7 years ago

Well, this was very helpful, thank you.

I can now confirm that this was a whitedb bug. I've committed a fix.

Since this has been present for a long time and affects core functionality, I'll investigate further when I have more time.

dyatlov commented 7 years ago

@priitj cool, just tested and it works now :) I have another issue with the db: indexes slow it down 1000 times. But that's a topic for another ticket.

dyatlov commented 7 years ago

@priitj btw, is the fix a workaround or a real fix? It looks like a fix for one specific condition.

priitj commented 7 years ago

I think it's fixed permanently now but I haven't done enough testing to be completely satisfied yet. I hope that answers your question :)