Status: Closed (task3r closed this issue 2 years ago)
Thanks for reporting the problem. This is the type of issue that's hard to check for with pmreorder and similar tools. It'd be too time-consuming ;)
At first glance, it seems the problem is related to large transactions that need to dynamically allocate extra undo log space - which isn't correctly initialized after a crash for some reason.
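To make the hypothesis above concrete, here is a toy model of an undo log with a small fixed buffer that spills into dynamically allocated segments once a transaction grows large. It is only an illustration and is not PMDK's implementation: every name in it (`undo_log`, `undo_segment`, `undo_log_snapshot`, `undo_log_rollback`, and the capacities) is hypothetical, and it works on volatile memory with plain `int` values. The point it tries to show is that each extra segment has to be fully initialized before it is used, since recovery walks those segments too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only, NOT PMDK code. All names and sizes are hypothetical. */
#define BUILTIN_ENTRIES 4 /* capacity of the fixed, preallocated log */
#define EXT_ENTRIES 4     /* capacity of each dynamically added segment */

struct undo_entry {
    int *addr;     /* location that was modified */
    int old_value; /* value to restore on rollback */
};

struct undo_segment {
    struct undo_entry entries[EXT_ENTRIES];
    size_t used;
    struct undo_segment *next; /* older segment, or NULL */
};

struct undo_log {
    struct undo_entry builtin[BUILTIN_ENTRIES];
    size_t builtin_used;
    struct undo_segment *ext; /* newest extension segment first */
    size_t ext_count;
};

void undo_log_init(struct undo_log *log)
{
    memset(log, 0, sizeof(*log));
}

/* Record the current value at addr; returns 0 on success, -1 if out of memory. */
int undo_log_snapshot(struct undo_log *log, int *addr)
{
    struct undo_entry e = { addr, *addr };

    if (log->builtin_used < BUILTIN_ENTRIES) {
        log->builtin[log->builtin_used++] = e;
        return 0;
    }
    if (log->ext == NULL || log->ext->used == EXT_ENTRIES) {
        /* calloc zeroes the segment, so used == 0 and entries are clean. */
        struct undo_segment *seg = calloc(1, sizeof(*seg));
        if (seg == NULL)
            return -1;
        seg->next = log->ext;
        log->ext = seg;
        log->ext_count++;
    }
    log->ext->entries[log->ext->used++] = e;
    return 0;
}

/* Restore snapshots newest first, then free the extension segments. */
void undo_log_rollback(struct undo_log *log)
{
    for (struct undo_segment *s = log->ext; s != NULL; s = s->next)
        for (size_t i = s->used; i > 0; i--)
            *s->entries[i - 1].addr = s->entries[i - 1].old_value;
    for (size_t i = log->builtin_used; i > 0; i--)
        *log->builtin[i - 1].addr = log->builtin[i - 1].old_value;

    while (log->ext != NULL) {
        struct undo_segment *s = log->ext;
        log->ext = s->next;
        free(s);
    }
    log->builtin_used = 0;
    log->ext_count = 0;
}
```

In this sketch, a transaction that snapshots more than `BUILTIN_ENTRIES` locations starts allocating segments, which is the code path a small transaction never exercises. If a segment were linked into the list before its `used` counter was valid, a rollback would replay garbage entries.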
@task3r we've fixed the issue - can you confirm and close the issue?
I'd also be curious to know how you've discovered the problem - manually or are you working on some new tool? :-)
@pbalcer unfortunately it appears that the bug is still present. I tested branch stable-1.12 using the instructions shown above and the results are the same: the well-timed crash still leads to an assertion failure in the subsequent execution. Please let me know if I can help with something.
As for the second question, we detected the bug using a new tool we are developing. We cannot share it at the moment as it is under submission but we'll do it as soon as possible.
I just ran it on both the old and new versions of stable-1.12, and I cannot reproduce the problem after the patch. Can you double check you are testing against the latest commit?
My bad, you are totally right. I ran the experiments using containers and it used the build cache instead of fetching the latest updates to the branch... I can confirm that the issue is resolved and I'll close the issue. Sorry for the misunderstanding.
No worries. Please reach out once you can share your research with us ;)
Thanks.
Sorry for the delayed response, @pbalcer; I forgot about your request. The research has been published at EuroSys '23: "Mumak: Efficient and Black-Box Bug Detection for Persistent Memory". Thank you again for your help and interest.
ISSUE: Crash-consistency bug within pmemobj_tx_commit
Environment Information
Please provide a reproduction of the bug:
Minimal working example main.c (based on data_store, assumes PM is mounted on /mnt/pmem0):
To compile it (assuming PMDK_ROOT points to the root of the repo):
To launch gdb:
Inside gdb:
Output:
How often bug is revealed: rare
I only managed to expose this bug using the btree backend and with n>2052. However, in these conditions, it happens every time. This does not seem to be caused by btree, since the bug did not manifest itself in PMDK 1.8 and there have not been significant changes to btree since then.
Actual behavior:
The application crashes in a subsequent run by failing an assertion.
Expected behavior:
The application does not crash in a subsequent run.
Additional information about Priority and Help Requested:
Are you willing to submit a pull request with a proposed change? No
Requested priority: Medium