nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

occasional failure of fallocate test #1571

Closed: wjhun closed this issue 3 years ago

wjhun commented 3 years ago

The fallocate runtime test will, on occasion (say, one out of every dozen runs), exit with the following assertion failure:

Error: buf[i] == 0 -- failed at /home/wjhun/src/nanos/master4/test/runtime/fallocate.c:32

This appears to happen in test_tmpfile, during the loop that verifies that all data read back after the first (written) byte is zero. These bytes are never written explicitly; they are expected to read as zero after the call to fallocate(). I have only seen this failure when running in qemu under Linux with KVM and under macOS with hvf, and have not been able to reproduce it on either platform using TCG.
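For reference, the check described above can be approximated on an ordinary POSIX host. This is a minimal sketch, not the actual fallocate.c: the buffer size, the written byte value, the /tmp path and the helper name check_tmpfile_zeros() are hypothetical, and posix_fallocate() stands in for the test's fallocate() call. The sequence is reserve the range, write only the first byte, read the whole range back, and require everything after byte 0 to be zero.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_LEN 8192 /* hypothetical: two 4 KiB pages */
#define FIRST_BYTE 0xff /* hypothetical value written at offset 0 */

/* Returns -1 if byte 0 is FIRST_BYTE and every later byte reads back as
 * zero, the index of the first unexpected byte otherwise (0 if the written
 * byte itself is wrong), or -2 on a setup or I/O error. */
static long check_tmpfile_zeros(void)
{
    char path[] = "/tmp/fallocate-sketch-XXXXXX";
    unsigned char buf[SKETCH_LEN];
    long result = -2;

    int fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path); /* keep the file only through the open descriptor */

    /* Reserve the whole range; space that is never written must read as zero. */
    if (posix_fallocate(fd, 0, SKETCH_LEN) != 0)
        goto out;

    /* Fill the buffer with the pattern, but write only its first byte. */
    memset(buf, FIRST_BYTE, sizeof(buf));
    if (pwrite(fd, buf, 1, 0) != 1)
        goto out;

    /* Read the full range back over the same buffer. */
    if (pread(fd, buf, SKETCH_LEN, 0) != (ssize_t)SKETCH_LEN)
        goto out;

    if (buf[0] != FIRST_BYTE) {
        result = 0;
        goto out;
    }
    result = -1;
    for (long i = 1; i < SKETCH_LEN; i++) {
        if (buf[i] != 0) { /* the condition the nanos assertion trips on */
            result = i;
            break;
        }
    }
out:
    close(fd);
    return result;
}
```

On a regular Linux host this is expected to return -1; the intermittent failure was only observed under nanos with KVM or hvf acceleration.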

On further inspection, the byte that trips the assertion contains 0xff, which is the pattern first used to fill the buffer. However, if I change the fill pattern to something else (and adjust the first assertion on the written byte accordingly), the failing bytes are still 0xff. A check added to sg_zero_fill(), which is called by tfs.c:zero_hole() and tfs.c:read_extent(), confirms that the pagecache buffer is filled with zeros and that its address and length are correct. Note also that the region containing 0xff always seems to begin and end on sector boundaries, though its extent varies from one failure to the next.
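The sector-boundary observation can be checked mechanically when a failure is caught. The helpers below are hypothetical (they are not part of nanos); they assume a 512-byte sector and a buffer that starts at a sector-aligned file offset such as 0. They report the span from the first to the last non-zero byte (which may contain zero bytes in between) and whether both ends fall on sector boundaries.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512 /* assumed sector size */

/* Span covering the unexpected bytes found in a buffer. */
struct bad_range {
    size_t start; /* offset of the first non-zero byte */
    size_t end;   /* one past the last non-zero byte */
};

/* Scan buf[from..len) for bytes that should be zero but are not.
 * Returns false if the region is clean; otherwise fills *r with the span
 * from the first to the last non-zero byte. */
static bool find_bad_range(const unsigned char *buf, size_t from, size_t len,
                           struct bad_range *r)
{
    bool found = false;
    for (size_t i = from; i < len; i++) {
        if (buf[i] != 0) {
            if (!found) {
                r->start = i;
                found = true;
            }
            r->end = i + 1;
        }
    }
    return found;
}

/* True if the span begins and ends on sector boundaries (buffer offsets
 * are taken to match file offsets). */
static bool range_is_sector_aligned(const struct bad_range *r)
{
    return r->start % SECTOR_SIZE == 0 && r->end % SECTOR_SIZE == 0;
}
```

Called with from = 1 (skipping the written byte), a span such as [1024, 2560) would be reported as aligned, while a span that ended mid-sector would not.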

wjhun commented 3 years ago

I can confirm that the invalid data is already present in the pagecache buffers before being copied to userspace. My earlier statement about sg_zero_fill() does not apply to the second page being read, which is where the invalid data is found. The problem seems related to writing the zero sectors (which happens when write_extent() calls zero_blocks() while converting an uninited extent to an initialized one) and then immediately filling the second page from those freshly zeroed sectors. Any change that introduces a delay between the two steps (e.g. debug output) seems to mask the problem. I would have expected that writing zeros to the sectors and immediately reading them back would, while inefficient, pose no risk to data integrity, but I may be missing something here. I also tried running the test with num_queues for the virtio-scsi driver set to 1 (which appears to be the default on my setup anyway), but I still see the failure.
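The expectation that "write zeros, then read them back" is safe holds only if the read is serviced after the zero write has reached the medium. The toy model below is hypothetical and is not nanos code; it does not establish that this is what happens here. It simply contrasts a read serviced while the write is still pending with one ordered behind the write's completion, using an in-memory disk whose stale contents are 0xff and an assumed 512-byte sector.

```c
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE 512 /* assumed sector size */
#define NSECTORS    8   /* size of the toy disk */

/* Toy in-memory disk whose sectors start out holding stale 0xff bytes. */
struct toy_disk {
    unsigned char data[NSECTORS][SECTOR_SIZE];
};

/* A zero-fill write the toy device has accepted but may not have applied. */
struct toy_write {
    unsigned sector;
    bool completed;
};

static void toy_disk_init(struct toy_disk *d)
{
    memset(d->data, 0xff, sizeof(d->data));
}

/* Queue a zero write; nothing reaches the medium yet. */
static void toy_submit_zero_write(struct toy_write *w, unsigned sector)
{
    w->sector = sector;
    w->completed = false;
}

/* The device applies the queued write at a time of its choosing. */
static void toy_complete_write(struct toy_disk *d, struct toy_write *w)
{
    memset(d->data[w->sector], 0, SECTOR_SIZE);
    w->completed = true;
}

/* A read returns whatever is on the medium when it is serviced. */
static void toy_read(const struct toy_disk *d, unsigned sector,
                     unsigned char *out)
{
    memcpy(out, d->data[sector], SECTOR_SIZE);
}

/* Ordered read-back: the read is issued only once the zero write is done. */
static void toy_read_after_write(struct toy_disk *d, struct toy_write *w,
                                 unsigned char *out)
{
    if (!w->completed)
        toy_complete_write(d, w); /* stands in for blocking on completion */
    toy_read(d, w->sector, out);
}
```

In this model, calling toy_read() right after toy_submit_zero_write() returns the stale 0xff bytes, while toy_read_after_write() returns zeros; a timing-dependent outcome like this would also be consistent with a delay masking the problem.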

I would be content with a fix that avoids re-reading the zeroed sectors, if I could first understand the precise cause of this problem...
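The workaround mentioned above (not re-reading sectors that are known to be zero) could look roughly like this. It is a sketch under stated assumptions: struct sketch_range, its known_zero flag, SKETCH_PAGE_SIZE and prefill_page() are all hypothetical and are not the tfs.c structures or API. The idea is that when the filesystem already knows a range holds only zeros (uninited, or just zeroed during extent conversion and not written since), the page is produced in memory with memset() instead of being read back from the device.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* hypothetical page size */

/* Hypothetical per-range state; not the actual tfs.c extent structure. */
struct sketch_range {
    bool known_zero; /* uninited, or zeroed during conversion and not
                        written since */
};

/* Prepare a pagecache page for a read of the given range.
 * Returns true if the caller still has to read the page from the device,
 * false if the page was satisfied in memory with zeros. */
static bool prefill_page(const struct sketch_range *r, unsigned char *page)
{
    if (r->known_zero) {
        memset(page, 0, SKETCH_PAGE_SIZE);
        return false; /* the freshly zeroed sectors are never re-read */
    }
    return true; /* page contents must come from the device */
}
```

The cost of this approach is bookkeeping: the filesystem has to track which pages are known to be zero and clear that state on the first write. It avoids the redundant read, but it would not by itself explain the original failure.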