neo-project / neo-express

Neo Private Net optimized for development scenarios
MIT License
neoxp hangs forever when executing the batch command #455

Closed gsmachado closed 2 months ago

gsmachado commented 2 months ago

Describe the bug

When I execute neoxp batch -i default -r setup.batch, the process either hangs forever or crashes with an ugly error (Segmentation fault (core dumped)).

Example of a segmentation fault: (screenshot not included)

The most frequent case is that the neoxp batch command just hangs forever and never returns. (screenshot not included)

Sometimes, even when neoxp batch completes successfully, running neoxp run -i default afterwards fails with the following error: RocksDbSharp.RocksDbException: Corruption: bad WriteBatch Put.

To Reproduce

I can't reproduce the problem 100% of the time, since it might be a timing issue, race condition, or maybe rocksdb handlers are not properly closed. I'm not sure, but it certainly needs some investigation.

HOWEVER, I can provide you with setup.batch and default.neo-express files that will most likely reproduce the problem.

Please use the following files:

Place them in a directory and then, within the same dir, run:

$ neoxp batch -i default -r setup.batch

Try to execute this command a couple of times in sequence.
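Since the hang is intermittent, a small helper can make "a couple of times in sequence" repeatable. The following is only a sketch, not part of neo-express: the function name run_repeatedly and its arguments are made up for illustration, and it assumes the GNU coreutils timeout command is available, which exits with status 124 when the time limit is reached.

```shell
# run_repeatedly RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND RUNS times, each under a time limit,
# and prints how many runs succeeded, timed out (hung), or failed.
run_repeatedly() {
  runs=$1
  limit=$2
  shift 2
  ok=0 hung=0 failed=0 i=1
  while [ "$i" -le "$runs" ]; do
    status=0
    # Command output is discarded here; redirect to a file instead to keep logs.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
      0)   ok=$((ok + 1)) ;;
      124) hung=$((hung + 1)) ;;   # GNU timeout reports an expired limit as 124
      *)   failed=$((failed + 1))
           echo "run $i exited with status $status" >&2 ;;
    esac
    i=$((i + 1))
  done
  echo "ok=$ok hung=$hung failed=$failed"
}
```

For example, run_repeatedly 10 120 neoxp batch -i default -r setup.batch would try the batch ten times and count any run exceeding 120 seconds as hung. The 120-second limit is an arbitrary assumption and should be set well above the normal duration of a successful run.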

Expected behavior

The neoxp batch command should successfully complete, 100% of the time.

Info about platform

Additional context

gsmachado commented 2 months ago

We highly appreciate the good work put into neo-express so far. However, we rely on this software and need this addressed soon. 🙏

Jim8y commented 2 months ago

Will check right away

cschuchardt88 commented 2 months ago

I wasn't able to reproduce it.

Can you please try the standalone build and uninstall librocksdb-dev?

Also, this is an unofficial image I created that you can build on (ref: https://github.com/neo-project/neo/pull/3355):

docker pull ghcr.io/cschuchardt88/neo-cli:3.8.0

Also, there is a versioning issue with neo-core and neo.dll in this release of neo-express. Use the standalone builds to ensure that you are using the right neo-core version. The reason is that some PRs ended up getting missed when 3.7.5 came out for fork fixes.

gsmachado commented 2 months ago

So... let me give you a bit more context:

I install neo-express with the following command:

dotnet tool install Neo.Express -g --version 3.7.3

Actually, you can check how we do things in this repo, more specifically in this line.

gsmachado commented 2 months ago

Ok, @chenzhitong @cschuchardt88, here are the steps to reproduce:

1) Run a plain docker container, mounting only the default.neo-express and setup.batch files I provided here.

docker run -it \
    -v $(pwd)/default.neo-express:/neoxp/default.neo-express \
    -v $(pwd)/setup.batch:/neoxp/setup.batch \
    -w /neoxp \
    ubuntu:noble \
    bash

2) Once you get the bash prompt in the docker container, update the packages and install wget:

apt-get update -y && apt-get install wget

3) Fetch the Neo Express version 3.7.3. If you run your docker using arm64, please update the URL accordingly.

wget https://github.com/neo-project/neo-express/releases/download/3.7.3/Neo.Express-linux-x64-3.7.3.tar.gz

4) Uncompress the files:

tar -xvzf Neo.Express-linux-x64-3.7.3.tar.gz

5) You need to set this env variable to be able to run the neoxp binary, so let's set it for this bash session only (since it's just a test):

export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1

6) Then, try to run this command a couple of times:

./neoxp batch -i default -r setup.batch

👉 Please, let me know if you could make the batch process hang forever. 😸

gsmachado commented 2 months ago

Also, I tried the following:

I just took ghcr.io/cschuchardt88/neo-cli:3.8.0 and used it in Step (1) of this comment instead of ubuntu:noble, and I achieved the same result: batch was hanging sometimes (not always, though!).

vncoelho commented 2 months ago

@gsmachado, I have been struggling with this for almost 1 month! Let's simply, at least, revert the PR that caused this and then work on it. This is destroying my workflow COMPLETELY.

mialbu commented 2 months ago

This bug is blocking neow3j from releasing a Neo-3.7.x-compatible version. Additionally, by blocking neow3j from releasing a Neo-3.7.x-compatible version, it's also blocking a desired feature (keccak256) to be used in the development of the native bridge to Neo X. Therefore, we really need this fixed as soon as possible.

cschuchardt88 commented 2 months ago

@mialbu @gsmachado All looks good, unless you have more use cases I can test. I will see if I can get a release out sometime next week.

gsmachado commented 2 months ago

I can confirm that the fix works! Not hanging anymore.

Great work, team! @cschuchardt88 @chenzhitong @Jim8y

Jim8y commented 2 months ago

Credit goes to @cschuchardt88, he is such a productive and senior developer.

mialbu commented 2 months ago

"I will see if i can get a release sometime next week."

Hey @cschuchardt88, can you give me an ETA for a release with this fix? Do you think it's possible to publish a new release early next week?

Jim8y commented 2 months ago

@mialbu A release of express can be done at the community's convenience. We will have a new version ASAP, expect it within the next 2 days.