abbradar opened this issue 3 months ago
It seems similar to a previous issue; it could be caused by a concurrent chunk commit that leads to file-length changes. Does the file read back normally again after a while?
No, the file that was corrupted in this way stays the same. I just checked the one that I tested this on. The flow has been:
- the reading side (hexdump) detects a zero region and stops;
- the writing side (pv) crashes with EIO (but I think I saw instances of this bug happening without the writing side crashing).
@abbradar This is not a bug. JuiceFS provides close-to-open consistency: once a file is written and closed, subsequent opens and reads from any client are guaranteed to see the written data.
Additionally, if the client write failed due to object storage issues, the behavior at the point of the write failure is undefined and the file may contain a zero region.
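A minimal sketch of the close-to-open contract as stated above (the path is hypothetical):

# Writer process: write and close.
with open("/mnt/jfs-partition/app.log", "wb") as f:
    f.write(b"record-1\n")
# close() has returned here, so the guarantee applies to opens issued from now on.

# Reader process (possibly on another host): an open() issued after the
# writer's close() is guaranteed to see b"record-1\n". A reader that opened
# the file earlier and keeps reading while the writer is still active gets
# no such guarantee.
with open("/mnt/jfs-partition/app.log", "rb") as f:
    data = f.read()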
Do you mean that if the object storage itself exhibits this behavior and may leave corrupted zero-filled objects, then this propagates to the FS level? I would have expected the region to be random garbage, or a read error to happen in that case, because we use JuiceFS encryption on top; please correct me if I'm wrong.
BTW, does JuiceFS use the Content-MD5 header during uploads for S3-compatible storage?
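For context only (not a claim about what JuiceFS does internally), this is how a client can attach Content-MD5 to an S3-compatible upload so the store rejects a body corrupted in transit; the bucket and key names are hypothetical:

import base64
import hashlib

import boto3  # assumes boto3 is available

data = b"example object body"
md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()

s3 = boto3.client("s3")
# With Content-MD5 set, the server verifies the received body against the
# digest and rejects the upload (BadDigest) on mismatch.
s3.put_object(Bucket="my-bucket", Key="example-object", Body=data, ContentMD5=md5_b64)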
EDIT:
JuiceFS provides close-to-open consistency: once a file is written and closed, subsequent opens and reads from any client are guaranteed to see the written data.
Sorry I didn't ask this at first, but just to be clear: do these inconsistencies only involve synchronization (say, if we update different parts of the file, the order may not be preserved, or the file might not be updated at all until closed), or can they also lead to the client reading completely invalid data (zeroes or other garbage)?
@jiefenghuang I just managed to reproduce the other behavior you mentioned. The setup was the same (one writing and another reading host), but I wrote a small script for the reading side to detect zero regions:
import time

# Tail the file and report every 2048-byte block that consists entirely of zeroes.
pat = b"\0"
f = open("/mnt/jfs-partition/test-file", "rb")
pos = 0
while True:
    chunk = f.read(2048)
    if len(chunk) == 0:
        # Reached the current end of the file; wait for the writer to append more.
        time.sleep(1)
    elif chunk.count(pat) == len(chunk):
        print(f"Pos {pos}: Zeroes chunk detected! Size: {len(chunk)}", flush=True)
    pos += len(chunk)
On the writing side, I ran pv -L 3k /dev/urandom > /mnt/jfs-partition/test-file again. I also ran juicefs info /mnt/jfs-partition/test-file in a loop on the reading side. Eventually I got:
Pos 38912: Zeroes chunk detected! Size: 2048
Pos 40960: Zeroes chunk detected! Size: 2048
Pos 43008: Zeroes chunk detected! Size: 2048
Pos 45056: Zeroes chunk detected! Size: 2048
Pos 47104: Zeroes chunk detected! Size: 2048
Pos 49152: Zeroes chunk detected! Size: 2048
Pos 51200: Zeroes chunk detected! Size: 1331
And:
/mnt/jfs-partition/test-file :
inode: 2439006
files: 1
dirs: 0
length: 51.30 KiB (52531 Bytes)
size: 52.00 KiB (53248 Bytes)
path: /test-file
objects:
+------------+--------------------------------------------+-------+--------+--------+
| chunkIndex | objectName | size | offset | length |
+------------+--------------------------------------------+-------+--------+--------+
| 0 | jfs-partition/chunks/20/20038/20038936_0_21196 | 21196 | 0 | 21196 |
| 0 | jfs-partition/chunks/20/20834/20834060_0_15668 | 15668 | 0 | 15668 |
| 0 | jfs-partition/chunks/20/20834/20834062_0_15667 | 15667 | 0 | 15667 |
+------------+--------------------------------------------+-------+--------+--------+
Notice that the zeroes line up with the last object: the first two objects cover 21196 + 15668 = 36864 bytes, the first fully-zero 2048-byte block was reported at offset 38912 (one read block past that boundary), and the zeroes stretch to the end of the file.
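For reference, a quick check of these numbers (object sizes taken from the juicefs info output above):

# Offsets at which each object listed by `juicefs info` begins.
sizes = [21196, 15668, 15667]
starts = [sum(sizes[:i]) for i in range(len(sizes))]
print(starts)      # [0, 21196, 36864]
print(sum(sizes))  # 52531, matching the reported file length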
But when I reopened the file later, no zero regions were there!
Can you point me to the known issue?
It was a question asked by a user in a WeCom user group; it has not been posted as a GitHub issue.
@abbradar This is not a bug. JuiceFS provides close-to-open consistency: once a file is written and closed, subsequent opens and reads from any client are guaranteed to see the written data.
- If one client is writing while another client is reading simultaneously, inconsistencies may occur. You should expect to obtain valid data through "read-after-flush", not "read-after-write" (see the sketch after this list).
- If the client write failed due to object storage (or other) issues, that part of the file is meaningless. You should not expect to read valid data from it.
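A minimal sketch of the "read-after-flush" pattern from the first point above, assuming (per that wording) that a successful flush/fsync is what makes the appended data safe for other clients to read; the path is hypothetical:

import os

with open("/mnt/jfs-partition/append.log", "ab") as f:
    f.write(b"record\n")
    f.flush()              # push Python's userspace buffer down to the kernel/FUSE
    os.fsync(f.fileno())   # ask the filesystem to persist the written data
# Only after the flush/fsync returns should another host expect to read this
# record back; a read issued before that point may not see valid data.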
I've made some modifications; they may resolve your question.
@jiefenghuang Thanks; I have several questions about these.
For (1), with no write failures, does this property only mean that readers may see stale or partially-updated data until the writer calls close(), or can they also read invalid data in the meantime?
We use JuiceFS for writing append-only binary logs, and read them sequentially on another host in real time. We are not concerned with the lack of a timing guarantee, but the possibility of reading invalid data breaks this pattern.
(2) is a problem for us. I would have expected JuiceFS to behave like a checksummed+journaled file system here, i.e. exclude the possibility of corrupted data, so that the worst case becomes either "reading gives you EIO" or "the write happened partially/did not happen". Can you clarify a bit more how this works internally: what sequence of metadata+object operations can result in reading zero chunks, and what happens later when they get magically fixed?
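For illustration only, here is a heavily simplified model of how a slice-based read path can surface zeroes; the types, names, and timing below are hypothetical and are not JuiceFS's actual code. The idea is that a file region not covered by any slice visible to the reader behaves like a hole in a sparse file and reads back as zeroes:

from dataclasses import dataclass

@dataclass
class Slice:
    offset: int   # where this slice starts inside the chunk
    data: bytes   # the backing object's contents (already fetched/decrypted)

def read_range(slices, start, length):
    """Overlay the visible slices (oldest first) onto a zero-filled buffer."""
    out = bytearray(length)   # bytes not covered by any slice stay zero
    for s in slices:
        lo = max(start, s.offset)
        hi = min(start + length, s.offset + len(s.data))
        if lo < hi:
            out[lo - start:hi - start] = s.data[lo - s.offset:hi - s.offset]
    return bytes(out)

# If the reading client's metadata view does not yet include the slice that
# covers, say, bytes 36864..52531, that range comes back as zeroes; once its
# view catches up, the same read returns the real data.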
We could improve this case in the next 1.3 release.
Thanks! We have lots of use cases for "one host appends, another reads", so we are motivated to help with this.
What happened: When writing a file on a host, sometimes a zero-filled region may be written instead of the actual data.
What you expected to happen: No data corruption.
How to reproduce it (as minimally and precisely as possible):
On one host:
pv -L 3k /dev/urandom > /mnt/jfs-partition/test-file
On another host:
while ! hexdump -C /mnt/jfs-partition/test-file | grep '00 00 00 00'; do sleep 1; done
After some time, the command on the second host will find a large cluster of zeroes (>1k) and stop. In the file, you see, for instance:
Anything else we need to know?
More details:
Unfortunately, we have many irrelevant logs on the writing server because this is a production host. These may be relevant (the file inode is 2439005):
What else do we plan to try:
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.2.0+2024-06-18.873c47b922ba (both hosts)
OS (e.g. cat /etc/os-release): NixOS 24.11 (Vicuna) (both hosts)
Kernel (e.g. uname -a): 6.1.90 (both hosts)