dyatlov closed this issue 7 years ago.
Hi,
The scenario you describe suggests that the error might not be easily reproducible, which makes it rather difficult to fix the bug, wherever it is.
Perhaps you can post the relevant part of the code. I'm mainly interested in the parts that deal with thread control, reading/writing and locking; they might offer some clue about what is going on.
Also, assuming something is wrong with locking, could you please give your hardware platform and the options whitedb was configured with, if available?
I've made my server single-threaded: no change. I don't think it's about locks; I guess it's more about the data. It's hard to send you a snippet, since it's all part of a web application with lots of connections. It fails when I do this:
```c
void* rec = location_find(uid);
location_fill_from_json(rec, json_obj, 1);   // several wg_set_.._field
json_object* uj = location_to_json(rec);     // several wg_get_field
const char* s = json_object_to_json_string(uj);
location_rec_set_json(rec, (char*)s);        // wg_set_str_field(location_db, rec, LOCATION_JSON, v);
json_object_put(uj);                         // free
```
If I comment out the last four lines, everything works fine. Could you tell me what the error means? As I understand it, it means that the memory holding the previous data (before the update) is corrupt, right?
I could send you the full code privately if that's OK with you.
Also, in case it helps: the same text was written to the same field several times, and that led to the failure:
```
2017-08-18 19:44:19 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }
2017-08-18 19:44:29 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }
2017-08-18 19:44:39 1 [29042:0x7fc20afad700]: { "id": 740, "place": "Здание", "country": "Швейцария", "city": "Мосинск", "distance": 23 }
```
These are not the only requests; there are lots of others, but it always fails in the same place, on this record.
Thanks, I'll check if I can reproduce the error based on this information.
Edit: yes, the error suggests that the memory is corrupt. Basically, it tries to free the data that is being deleted, but that data is not there anymore.
Unfortunately, I could not reproduce the error.
Please send me the full code at priit 'at' whitedb.org and I'll take a quick look.
@priitj I managed to make a small PoC that reproduces the error every time. Please have a look.
Well, this was very helpful, thank you.
I can now confirm that this was a whitedb bug. I've committed a fix.
Since this has been present for a long time and affects core functionality, I'll investigate further when I have more time.
@priitj Cool, I just tested it and it works now :) I have another issue with the db: indexes slow it down 1000 times, but that's a topic for another ticket.
@priitj By the way, is the fix a workaround or a real fix? It looks like it handles one specific condition.
I think it's fixed permanently now, but I haven't done enough testing to be completely satisfied yet. I hope that answers your question :)
It happened when I was inserting/updating lots of data from 4 threads simultaneously. Every update/insert was wrapped in lock/unlock, so this message worries me a lot: it suggests there's a bug in whitedb. Any guidance on where to look?