Open samirvb opened 2 years ago
Does /node/data//sofajraft/stacs/snapshot/temp exist, and is it something other than a directory?
Yes, this location doesn't exist, neither as a directory nor as a file.
Can you show the "ls -lsh" result for /stacs/snapshot/?
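For anyone who wants to run the same check from inside the JVM user's context rather than from a shell, here is a minimal sketch. It walks every component of the path and reports whether it is missing, a directory, or something else. The class name `PathInspector` and its `describe` method are hypothetical helpers written for this thread, not part of SOFAJRaft.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of SOFAJRaft.
public class PathInspector {

    /** Describes every component from the filesystem root down to target. */
    public static List<String> describe(Path target) {
        List<String> report = new ArrayList<>();
        Path abs = target.toAbsolutePath().normalize();
        Path current = abs.getRoot();
        for (Path part : abs) {
            current = current.resolve(part);
            String state;
            if (!Files.exists(current)) {
                // Also reported for dangling symlinks and for anything below a regular file.
                state = "missing";
            } else if (Files.isDirectory(current)) {
                state = "directory, writable=" + Files.isWritable(current);
            } else {
                state = "NOT a directory";
            }
            report.add(current + " -> " + state);
        }
        return report;
    }

    public static void main(String[] args) {
        // Default path taken from the error message in this issue.
        String raw = args.length > 0 ? args[0] : "/node/data//sofajraft/stacs/snapshot/temp";
        describe(Paths.get(raw)).forEach(System.out::println);
    }
}
```

Running it as the same user as the Raft process shows which component, if any, would block creation of the temp directory.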
Unfortunately, I don't have the old node (we had to restore it). I had run "ls -la" on the location and found no "temp" folder or file there. Attached is a screenshot of the restored node.
Is there any way we can reproduce this issue? This is quite important: our cluster goes down when it happens, so we need to fix it.
Most likely it was a permission issue, but the logger did not print the exception message. I fixed the logging in #708.
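To illustrate why the exception message matters: `java.io.File.mkdirs()` only returns `false` on failure, whereas `java.nio.file.Files.createDirectories` throws an `IOException` whose type and message name the cause. The sketch below is not the JRaft code and not the change in #708; `CreateDirDiagnostics` and `tryCreate` are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example; not the SOFAJRaft implementation.
public class CreateDirDiagnostics {

    /**
     * Tries to create dir (and any missing parents).
     * Returns null on success, or a readable failure reason.
     */
    public static String tryCreate(Path dir) {
        try {
            Files.createDirectories(dir);
            return null;
        } catch (IOException e) {
            // The exception type carries the cause, e.g. a file in the way
            // or insufficient permissions.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("snapshot-demo");
        // A regular file named "temp" blocks directory creation at that path.
        Path blocker = Files.createFile(base.resolve("temp"));
        System.out.println("blocked path: " + tryCreate(blocker));
        System.out.println("fresh path:   " + tryCreate(base.resolve("temp2")));
    }
}
```

Logging the returned reason next to "Fail to create directory" would distinguish a permission problem from a stray file or a read-only filesystem.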
I think it's a permission problem here. Which user do you run the Java program as? In the screenshot above, the snapshot directory belongs to the root user.
Hi, all processes are run as the root user. See the screenshot below:
The process runs as the "root" user. I was able to create a directory in the same location using the mkdir command.
Can you let me know if there is any way we can reproduce the creation of a snapshot (and hopefully this issue)?
Only one directory was created and nothing else was done, so we couldn't find a good way to reproduce it.
We will release a new version with more logs, and if the issue reproduces in the future, we can find the root cause.
Your question
On one of my existing nodes, snapshot creation fails with the following exception stack trace:
2021-11-05 17:56:29.200 [ ] [JRaft-Closure-Executor-4] [init-64] ERROR c.a.s.j.s.s.l.LocalSnapshotWriter - Fail to create directory /node/data//sofajraft/stacs/snapshot/temp.
2021-11-05 17:56:29.201 [ ] [JRaft-Closure-Executor-4] [create-285] ERROR c.a.s.j.s.s.l.LocalSnapshotStorage - Fail to init snapshot writer.
2021-11-05 17:56:29.202 [ ] [JRaft-FSMCaller-Disruptor-0] [onError-72] ERROR c.a.s.j.c.StateMachineAdapter - Encountered an error=Status[EIO<1014>: Fail to create snapshot writer.] on StateMachine io.stacs.nav.consensus.sofajraft.config.SofajraftStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
com.alipay.sofa.jraft.error.RaftException: ERROR_TYPE_SNAPSHOT
    at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.reportError(SnapshotExecutorImpl.java:691)
    at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.doSnapshot(SnapshotExecutorImpl.java:346)
    at com.alipay.sofa.jraft.core.NodeImpl.doSnapshot(NodeImpl.java:3098)
    at com.alipay.sofa.jraft.core.NodeImpl.lambda$handleSnapshotTimeout$0(NodeImpl.java:607)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Note that the location already contains snapshot folders, so it's not an issue with permissions. There is also no issue with disk space. Any idea what might be happening? This error occurs on different nodes that have been running fine on sofajraft 1.3.5.
Environment
java -version: openjdk version "11.0.10" 2021-01-19
uname -a: Linux native-e-64d59994b-dgtk5 5.4.149-73.259.amzn2.x86_64 #1 SMP Mon Sep 27 12:48:12 UTC 2021 x86_64 Linux