Closed vncoelho closed 2 days ago
Need more information.
description updated
It seems the data is corrupted. Is it a fresh installation?
Fresh, with master.
It seems the data is corrupted. Is it a fresh installation?
Probably due to the unhandled exception management feature, but I have not investigated further yet. It is easy to reproduce: just run a node.
Is it because the 4 nodes are using the same directory for leveldb?
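One quick way to rule this out is to compare the storage paths configured for each node. The sketch below is only an illustration in Python: the node names and paths are hypothetical, and it assumes you have already collected each node's storage path from its own config by hand.

```python
import os
from collections import defaultdict


def find_shared_storage_dirs(node_paths):
    """Return {resolved_path: [node names]} for paths used by more than one node.

    node_paths maps a node name to its configured storage directory
    (hypothetical input, gathered from each node's config).
    """
    by_path = defaultdict(list)
    for node, path in node_paths.items():
        # realpath collapses "..", relative segments and symlinks, so two
        # different spellings of the same directory compare equal.
        by_path[os.path.realpath(path)].append(node)
    return {p: nodes for p, nodes in by_path.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical 4-node layout where node3 and node4 collide.
    nodes = {
        "node1": "/data/node1/db",
        "node2": "/data/node2/db",
        "node3": "/data/node3/db",
        "node4": "/data/node4/../node3/db",
    }
    for path, names in find_shared_storage_dirs(nodes).items():
        print(f"{path} is shared by {', '.join(sorted(names))}")
```

An empty result means every node has its own directory, so a shared directory can be excluded as the cause.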
Based on the source code referenced by your error, it looks like your database is corrupt. Try deleting it to see if the problem goes away.
It has to do with Seeking with KeyComparator.
The source code says:
// User key has become shorter physically, but larger logically.
// Tack on the earliest possible number to the shortened user key.
No, @cschuchardt88, it is a recently introduced problem.
It's because you run too many nodes on the same machine that all use leveldb. Not a core problem. This happens every time you run multiple nodes on the same machine.
It's because you run too many nodes on the same machine that all use leveldb. Not a core problem. This happens every time you run multiple nodes on the same machine.
No. This is not true in my setup.
Too many complaints and no real investigation of a simple scenario. The cause is that we now crash the clients on unhandled exceptions.
Without minimal tests, neo-cli will be unusable until we implement the exception handling and find the BASIC problems.
Too many complaints and no real investigation of a simple scenario.
You can say this when you locate the real problem.
We have been working like this for many years, and all of a sudden it's all wrong and we have all become complainers? And our work is the product of a lack of investigation? We definitely tested it and checked it everywhere, and for this one, I ran the node. I also asked NGD for help to test it.
But the code was there, the PR was there, and you were able to test, review, and comment. We followed your suggestion to leave it open for a while for review. Actually, that PR was open for a week before I collected sufficient review approvals.
Before we release any new version, we can still correct any problem, so chill. A team means that even if someone causes a problem, someone else can correct it, doesn't it?
The cause is that we now crash the clients on unhandled exceptions.
The funny part is that we should have crashed on an unhandled exception, unless we have set plugins to ignore unhandled exceptions. I would say that PR found an issue, if any, rather than introduced one.
BTW, I admit that even when I run the test on my machine, I run a single node at most. I don't have a 4-node private net test environment. I will create one.
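To make the two behaviours under discussion concrete (stop the node on an unhandled exception versus ignore it), here is a minimal sketch. It is written in Python purely for illustration and is not the node's actual code: `Policy`, `NodeStopped`, and `run_handler` are hypothetical names, and the simulated storage error is made up.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical policies for an unhandled exception in a handler."""
    STOP_NODE = "stop_node"
    IGNORE = "ignore"


class NodeStopped(Exception):
    """Raised in this sketch to model the whole node shutting down."""


def run_handler(handler, policy, log):
    """Run handler(); on failure, log it and apply the chosen policy.

    Returns True on success and False if the error was ignored.
    Raises NodeStopped when the policy is STOP_NODE.
    """
    try:
        handler()
        return True
    except Exception as exc:
        log.append(repr(exc))  # the error is recorded under both policies
        if policy is Policy.IGNORE:
            return False
        raise NodeStopped("unhandled exception in handler") from exc


if __name__ == "__main__":
    def failing_handler():
        raise IOError("simulated storage error")  # made-up failure

    log = []
    print(run_handler(failing_handler, Policy.IGNORE, log))  # False
    try:
        run_handler(failing_handler, Policy.STOP_NODE, log)
    except NodeStopped as stop:
        print("node stopped:", stop.__cause__)
```

Under the ignore policy the same underlying error still occurs and is only logged, which is consistent with the view that a stricter policy surfaces an existing problem rather than creating one.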
It's because you run too many nodes on the same machine that all use leveldb.
It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.
I don't have a 4-node private net test environment.
I'd suggest using NeoBench, but it's not yet updated to use the fresh monorepo; we have https://github.com/nspcc-dev/neo-bench/issues/175 for that.
Are you using leveldb? Maybe it was rocksdb instead.
Were your experiments with the master branch? Mine only runs now after reverting the exception-handling crash.
@vncoelho Are you sure you didn't run out of storage (disk space)? Why not give #3355 a try?
Try doing ./neo-cli /repair
or neo-cli.exe /repair
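For the disk space question above, a quick check can be scripted. This is a minimal Python sketch using the standard library's `shutil.disk_usage`; the path and the 1 GiB threshold are arbitrary assumptions, so point it at the volume that holds the node's data directory.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem containing path has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    data_dir = "."        # assumption: replace with the node's data directory
    threshold = 1 << 30   # assumption: 1 GiB, an arbitrary threshold
    if has_free_space(data_dir, threshold):
        print("enough free space")
    else:
        print("low on disk space")
```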
This is not the case, @cschuchardt88.
The testing environment is the same with and without the PR reverted. The problem is that leveldb probably regenerates from the crash, but the PR that handles exceptions detects it and then crashes the client.
The behavior may not be wrong. But this should have been tested before merging that PR, because the problem is easy to see. Can you verify that, @superboyiii?
Try with this version of LevelDbStore
#3274
It's because you run too many nodes on the same machine that all use leveldb.
It was not a problem for me either. I used NeoBench to run 4-node and 7-node privnets with Dockerized C# nodes on my single machine, and it was OK.
I don't have a 4-node private net test environment.
I would love to argue, but I am not an expert on leveldb. All I can say is that it happened now, and it is apparently a leveldb exception, not related to the core.
Possible reasons could be: platform, OS, version, dependencies. I would suggest trying rocksdb and memorydb as well.
So, was this error, without the exception handling, good and safe for running a node? Now, after the PR, the node is broken, right? Is it not a core problem?
It's a corruption problem.
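1. Are you using a `container`?
2. What `version` of `leveldb` do you have?
3. Which `CI` build are you using?
4. Which `filesystem`?
5. Which `Operating System`?
6. Which `CPU` arch?
7. Did you try `leveldb` `repair`?
8. How many `threads` does your OS limit?
9. Did you try a `filesystem` repair tool?
1. Are you using a `container`?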
Yes
2. What `version` of `leveldb` do you have?
Master compiled plugin and `libleveldb-dev` from apt-get
`mcr.microsoft.com/dotnet/aspnet:8.0.3-jammy`
It is all in a Dockerfile, in a container with the number of threads necessary for it to run safely. It usually runs a mainnet node with the resources it has available. It runs perfectly without the commit that I said should be reverted until fixed.
The problem could be due to some limitation of leveldb, of course. But that should have been handled before the PR was merged. Furthermore, in my last tests rocksdb was also broken.
The only way to run a node nowadays is MemoryStore.
Still crashing. I thought it was solved but my config was with "MemoryStore" instead.
The problem persists even after updating all dotnet libraries during build and run.
RocksDb is also corrupted, but perhaps for a different reason.
I will set up a multi-node environment on my machine and check it.
not entirely related, but see https://github.com/neo-project/neo-express/issues/455
fixed
Describe the bug
Run a setup with 4 nodes running a private net.
To Reproduce
Steps to reproduce the behavior: start the nodes; they will crash almost instantaneously.
Error