sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0
3.59k stars 1.15k forks source link

Snapshot metadata lost after node panic back. #480

Closed yuyang0423 closed 4 years ago

yuyang0423 commented 4 years ago

Describe the bug

Utils.atomicMoveFile() can't guaranteed data consistency and sync to disk.

sofa-jraft/jraft-core/src/main/java/com/alipay/sofa/jraft/storage/io/ProtoBufFile.java::save() method has issues which can case snapshot metadata lost.

which let syncMeta = true doesn't works as expect;

Risk 1: no output flush after output write. some data in buffer cache;

Risk 2: Like 1, move tmp file to final metadata file looks like a atomic operation but the target file isn't sync to disk.

Expected behavior

If enduser do concerned about data safety, current code cant guaranteed this.

Actual behavior

In our regular test, we see this symptoms: (sorry, hide some path infos)

total 16K drwxr-xr-x. 4 root root 33 Jul 3 08:29 kv -rw-r--r--. 1 root root 16K Jul 2 20:14 kv.zip _-rw-r--r--. 1 root root 0 Jul 2 20:14 __raft_snapshotmeta

_raft_snapshot_meta file size 0, which cause node init failed;

2020-07-02 20:26:15.834 [main] INFO FSMCallerImpl:201 - Starts FSMCaller successfully. 2020-07-02 20:26:15.845 [main] ERROR LocalSnapshotReader:91 - Fail to load snapshot meta $PATH_TO_SNAPSHOT/snapshot/snapshot_1065/__raft_snapshot_meta. 2020-07-02 20:26:15.846 [main] ERROR LocalSnapshotStorage:313 - Fail to init reader for path $PATH_TO_SNAPSHOT/snapshot/snapshot_1065. 2020-07-02 20:26:15.846 [main] ERROR NodeImpl:977 - Node $NODE_INFO is initialized with inconsistent log, status=Status[EIO<1014>: Missing logs in (0, 778)].

Steps to reproduce

There is a unit test can reproduce this issue, if you do some modifications.

sofa-jraft/jraft-core/src/test/java/com/alipay/sofa/jraft/storage/io/ProtobufFileTest.java::testSaveLoad()

  1. don't use /tmp menu
  2. panic server without system sync. after line 42 assertTrue(file.save(msg, true));, after panic server back, you will see this issue with that file.

but if you sync before panic, this won't happen. That is why I say, syncMeta = true doesn't works as expect;

Minimal yet complete reproducer code (or GitHub URL to code)

Environment

killme2008 commented 4 years ago

when sync=true , it will call FileDescriptor#sync in ProtoBufFile#save method:

         if (sync) {
                fOut.getFD().sync();
            }

The problem is that it forgot to call output.flush before calling fd.sync, the written data is in BufferedOutputStream but not wrote into underlying output stream(file system), so in critical situation such as you describe, it may lost the data even setting syncMeta=true.

The Utils.atomicMoveFile use Files.move API, it will keep the consistent between source and target on linux platform, so it's fine.

yuyang0423 commented 4 years ago

when sync=true , it will call FileDescriptor#sync in ProtoBufFile#save method:

         if (sync) {
                fOut.getFD().sync();
            }

The problem is that it forgot to call output.flush before calling fd.sync, the written data is in BufferedOutputStream but not wrote into underlying output stream(file system), so in critical situation such as you describe, it may lost the data even setting syncMeta=true.

The Utils.atomicMoveFile use Files.move API, it will keep the consistent between source and target on linux platform, so it's fine.

Thanks your quick reply, you can try with my steps. Utils.atomicMoveFile may doesn't works as expects.

killme2008 commented 4 years ago

@yuyang0423 We will try it. You can try my fix if you can https://github.com/sofastack/sofa-jraft/pull/481

Thank you very much. Java will take care of Files.move in theory.

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#move(java.nio.file.Path,%20java.nio.file.Path,%20java.nio.file.CopyOption...)

yuyang0423 commented 4 years ago

@yuyang0423 We will try it. You can try my fix if you can #481

Thank you very much. Java will take care of Files.move in theory.

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#move(java.nio.file.Path,%20java.nio.file.Path,%20java.nio.file.CopyOption...)

I do agree with output.flush() fix, but really need fully test the move code in this scenario.

flush & sync a temp file is useless, we need guarantee new metadata is saved & sync to disk.

That is why I report this issue immediately.

killme2008 commented 4 years ago

@yuyang0423 If you want a atomic write to a file (all or nothing), write a temp file and do a atomic move is a common pattern.

We will test it with more cases and check the Files.move source code, thank you.

yuyang0423 commented 4 years ago

@yuyang0423 If you want a atomic write to a file (all or nothing), write a temp file and do a atomic move is a common pattern.

We will test it with more cases and check the Files.move source code, thank you.

Yes, I do know what you means. Anyway, hope your news.

yuyang0423 commented 4 years ago

@killme2008 Reproduced many times even add you fix, I still don't think atomic move is a proper way to metadata file.

Files#move(Path source, Path target, CopyOption... options) "atomic move" guaranteed "atomic" operation between source file and target file. What if the final target file still in OS disk cache??

From test results, the target file obviously in OS cache which let snapshot metadata in dangerous situation.

Utils.atomicMoveFile should complete two things:

  1. atomic move;
  2. sync cache to disk if syncMeta=true is set;
killme2008 commented 4 years ago

@yuyang0423 hi, sorry , the fix is not enough, we should call fsync on the containing directory after the atomic move operation. The correct steps should be:

  1. create a new temp file (on the same file system!)
  2. write data to the temp file
  3. fsync() the temp file
  4. rename the temp file to the appropriate name
  5. fsync() the containing directory

I forgot the step 5. I updated the PR with the new fix, thank you.

yuyang0423 commented 4 years ago

@yuyang0423 hi, sorry , the fix is not enough, we should call fsync on the containing directory after the atomic move operation. The correct steps should be:

  1. create a new temp file (on the same file system!)
  2. write data to the temp file
  3. fsync() the temp file
  4. rename the temp file to the appropriate name
  5. fsync() the containing directory

I forgot the step 5. I updated the PR with the new fix, thank you.

Yes, you got me. I also do a simple fix. Can you help review this??

diff --git a/jraft-core/src/main/java/com/alipay/sofa/jraft/storage/io/ProtoBufFile.java b/jraft-core/src/main/java/com/alipay/sofa/jraft/storage/io/ProtoBufF
index 14a6fe2..0ffb2c0 100644
--- a/jraft-core/src/main/java/com/alipay/sofa/jraft/storage/io/ProtoBufFile.java
+++ b/jraft-core/src/main/java/com/alipay/sofa/jraft/storage/io/ProtoBufFile.java
@@ -100,6 +100,7 @@ public class ProtoBufFile {
     public boolean save(final Message msg, final boolean sync) throws IOException {
         // Write message into temp file
         final File file = new File(this.path + ".tmp");
+        boolean moved = false;
         try (final FileOutputStream fOut = new FileOutputStream(file);
                 final BufferedOutputStream output = new BufferedOutputStream(fOut)) {
             final byte[] lenBytes = new byte[4];
@@ -115,11 +116,13 @@ public class ProtoBufFile {
             Bits.putInt(lenBytes, 0, msgLen);
             output.write(lenBytes);
             msg.writeTo(output);
-            if (sync) {
+            output.flush();
+            moved = Utils.atomicMoveFile(file, new File(this.path));
+            if (moved && sync) {
                 fOut.getFD().sync();
             }
         }

-        return Utils.atomicMoveFile(file, new File(this.path));
+        return moved;
     }
 }
killme2008 commented 4 years ago

@yuyang0423 I think it's not a proper fix, it depends on jdk implementation in move and sync. We can't ensure that fOut.getFD().sync() will always work after moving the file on different platform or file system(ext3, ext4, btrfs etc.)

yuyang0423 commented 4 years ago

@yuyang0423 I think it's not a proper fix, it depends on jdk implementation in move and sync. We can't ensure that fOut.getFD().sync() will always work after moving the file on different platform or file system(ext3, ext4, btrfs etc.)

ok, I'll try you latest fix.

yuyang0423 commented 4 years ago

@killme2008 problem solved with your new fix 👍 . You should consider to bump a version merge this fix and let everyone avoid this. 😀

killme2008 commented 4 years ago

@yuyang0423 Yep, it's a fatal issue, we will release a new version ASAP.