quantum / esos

An open source, high performance, block-level storage platform.
http://www.esos-project.com/

vdisk_fileio can silently discard writes #258

Closed Brain2000 closed 4 years ago

Brain2000 commented 4 years ago

I created a volume with write journaling instead of the bitmap, and it works great until I make the journal drive fail. At that point all writes stop reaching the array, even though the drive unit still appears to accept them, as if it were a perf device. Everything written from that point forward just goes into a black hole.

I have the drive configured as an NV_Cache device in the /etc/scst.conf.

I'm kind of surprised mdadm doesn't switch the device over from journaling to resync automatically. But since it doesn't, I'm trying to figure out if the data black hole issue is with mdadm or with scst.

Any ideas or direction I could look here?

Brain2000 commented 4 years ago

I found out the issue is with the scst vdisk_fileio handler.

If I choose the vdisk_blockio, the disk immediately stops allowing writes until the consistency_policy is changed to resync from journaling. But the vdisk_fileio just discards the data without any indicator. Wow!?
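For anyone trying to reproduce this, a minimal sketch of how the array's current policy could be checked from a script. It assumes the kernel exposes the md sysfs attribute `md/consistency_policy` under `/sys/block/<array>/`, where a journaled array reports `journal`. The array name `md0` and both function names are hypothetical.

```python
from pathlib import Path


def md_consistency_policy(md_name, sysfs_root="/sys/block"):
    """Return the md array's consistency policy string, or None if unreadable.

    md_name is a placeholder such as "md0"; sysfs_root is a parameter so the
    function can be pointed at a test directory.
    """
    path = Path(sysfs_root) / md_name / "md" / "consistency_policy"
    try:
        return path.read_text().strip()
    except OSError:
        # Not an md array, or the kernel does not expose this attribute.
        return None


def journal_is_active(md_name, sysfs_root="/sys/block"):
    """True if the array currently reports the 'journal' policy."""
    return md_consistency_policy(md_name, sysfs_root) == "journal"
```

Calling `journal_is_active("md0")` before and after failing the journal drive would show whether the policy is still `journal` while the fileio device keeps accepting writes.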

Brain2000 commented 4 years ago

I discovered that this also happens if the underlying volume is readonly, such as:

mdadm --readonly /dev/mdxxx

This works correctly in the vdisk_blockio, but not with the vdisk_fileio. It happily accepts any and all bytes being written as if the volume is online, as they are silently discarded.
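To confirm the read-only state from the storage host itself, something like the following sketch could be used. It assumes the standard block-layer sysfs attribute `/sys/block/<dev>/ro`, which reads `1` for a read-only device. The device name `md0` and the function name are placeholders, not anything from ESOS or SCST.

```python
from pathlib import Path


def block_device_is_readonly(dev_name, sysfs_root="/sys/block"):
    """Return True if the block device's sysfs 'ro' attribute reads 1.

    dev_name is a placeholder such as "md0"; sysfs_root is configurable so
    the check can be exercised against a fake directory tree.
    """
    ro_file = Path(sysfs_root) / dev_name / "ro"
    return ro_file.read_text().strip() == "1"
```

If this returns True while the initiator is still writing without errors, the writes cannot be reaching the array.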

I'm going to try the "bleeding edge" version to see if the newer SCST providers have fixed this issue. The current "stable" version uses SCST 3.3, which seems anything but stable.

msmith626 commented 4 years ago

Hi,

With 'vdisk_fileio', everything goes in/out of the Linux page cache. I suspect what you're seeing is data going into the page cache, then the underlying block device fails (or goes read-only, or whatever), and your data is left sitting in the page cache. You'll probably see messages about failed buffered writes in the kernel logs.
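The general buffered-I/O behavior can be sketched in a few lines. This is only an illustration of page-cache semantics, not SCST's code: `write()` normally returns as soon as the data is in the page cache, and a failure of the underlying device tends to surface only when the data is flushed, for example at `fsync()`. The function name is hypothetical.

```python
import os


def write_and_flush(path, data):
    """Write data, then force it to stable storage.

    Returns (bytes_written, error). error is None if fsync() succeeded,
    otherwise the OSError raised during the flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # A buffered write can succeed even if the device later rejects it.
        written = os.write(fd, data)
        try:
            # Writeback errors, if any, are reported here.
            os.fsync(fd)
        except OSError as exc:
            return written, exc
        return written, None
    finally:
        os.close(fd)
```

A caller that only checks the byte count and never looks at the flush result would see exactly the "black hole" described above.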

I typically use 'vdisk_blockio' so I don't have a lot of guidance on the page cache and the specifics of SCST with this device handler.

Does it make any difference if you don't use "nv_cache=1" on your SCST devices?

I'd encourage you to reach out to the community on the scst-devel mailing list. I believe the 'vdisk_fileio' device handler is still quite popular, so I don't think this is some catastrophic bug, rather something specific about your configuration that is causing it to eat data.

--Marc

Brain2000 commented 4 years ago

I initially thought that was a possibility, so I tried copying a 48GB file while the system only has 16GB of RAM installed. The ESOS version I am using boots from a 4GB USB stick and is completely memory resident, so I do not believe it is using the USB stick as swap space.
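One way to watch how much data is actually pending in the page cache during such a copy is to read the `Dirty` and `Writeback` fields of `/proc/meminfo`. A rough sketch, with hypothetical function names:

```python
def dirty_page_cache_kib(meminfo_text):
    """Extract the Dirty and Writeback values (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        # Match the exact field names only (e.g. not "WritebackTmp").
        if sep and key in ("Dirty", "Writeback"):
            fields[key] = int(rest.split()[0])
    return fields


def read_dirty_page_cache_kib(path="/proc/meminfo"):
    """Read the live values from the running kernel."""
    with open(path) as fh:
        return dirty_page_cache_kib(fh.read())
```

If the dirty total stays far below the amount written while the copy keeps succeeding, the data is being dropped rather than held in memory.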

I did try both nv_cache=1 and nv_cache=0, and it did not make any difference.

The vdisk_blockio device immediately paused if the underlying volume became readonly or if the write_journal volume was manually failed, so that is working properly.

I am also testing using a combination of the vdisk_blockio and lvcache with a ramdisk, and that seems to be working properly.

I will check out the scst-devel mailing list.