okx / exchain

⛓️ EVM & Wasm & IBC-compatible, OKTC is an L1 blockchain network built on top of Cosmos SDK that aims for optimal interoperability and performance ✨
https://www.okx.com/oktc

Error with docker image #1557

Open breezytm opened 2 years ago

breezytm commented 2 years ago

Error when deploying full node with docker image

1. Describe

The following error occurs:

Connecting to raw.githubusercontent.com (185.199.111.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11: 10 Illegal instruction (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb

Docker parameters:

docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest


suyog-bhat commented 2 years ago

same here, not able to run docker image

cwbhhjl commented 2 years ago
docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind:/root/.exchaind -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

make sure you have config and data directories in your exchaind directory
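
A minimal sketch of preparing that host directory before mounting it, assuming the default ~/.exchaind location. The function name prepare_exchaind_home is hypothetical and not part of exchain; it only creates the two directories and does not populate genesis.json or any other file.

```shell
# Hypothetical helper: create the config and data directories under a host
# directory that will be mounted at /root/.exchaind in the container.
prepare_exchaind_home() {
  home_dir="${1:-$HOME/.exchaind}"   # default location is an assumption
  mkdir -p "$home_dir/config" "$home_dir/data"
  printf '%s\n' "$home_dir"          # print the prepared path
}
```

Calling prepare_exchaind_home with no arguments creates ~/.exchaind/config and ~/.exchaind/data, matching the volume mounted by the docker run command above.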

neoromantique commented 2 years ago

I have tried initialising both directories, populating genesis, and priv_validator_state.json, but it still crashes with Illegal instruction (core dumped)

cwbhhjl commented 2 years ago

@neoromantique

Can you redo the deployment following this thread? If there is an error in the deployment please tell me which step went wrong and what is the specific error

https://forum.okt.club/d/299-how-to-start-a-mainnet-node

neoromantique commented 2 years ago

I cannot even execute exchaind init from within docker. And building it for my host defeats the point of docker image in the first place (And I think wouldn't help anyway).

cwbhhjl commented 2 years ago

@neoromantique try this

  1. mkdir ~/okc
  2. cd ~/okc
  3. curl -O https://okg-pub-hk.oss-cn-hongkong.aliyuncs.com/cdn/oec/snapshot/mainnet-s0-20221018-14723313-rocksdb.tar.gz
  4. tar zxvf mainnet-s0-20221018-14723313-rocksdb.tar.gz
  5. docker run -d --name exchain-mainnet-fullnode -v ~/okc/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest
neoromantique commented 2 years ago

Same exact output.

root@hostname ~/okc # ls -la
total 31175916
drwxr-xr-x 3 root root          81 Oct 19 20:20 .
drwx------ 8 root root         269 Oct 19 16:02 ..
drwx------ 7 root root         161 Oct 17 20:33 data
-rw-r--r-- 1 root root 31924135195 Oct 19 17:19 mainnet-s0-20221018-14723313-rocksdb.tar.gz
root@hostname ~/okc # docker logs --tail 100 -f 25d
/root/start.sh: line 6:     7 Illegal instruction     (core dumped) exchaind init fullnode --chain-id exchain-66
Connecting to raw.githubusercontent.com (185.199.109.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11:    10 Illegal instruction     (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb
root@hostname ~/okc # 
cwbhhjl commented 2 years ago

@neoromantique https://stackoverflow.com/questions/54698812/illegal-instruction-core-dumped-when-trying-to-execute-elf-file

It means the compiled binary contains an instruction (possibly more than one) that is not valid on the architecture where you're running it.

Based on this post and other related posts on Stack Overflow, I'm guessing it might be a hardware issue.

You can run your binary under gdb to find the specific instruction:

gdb ./precompiled
(gdb) run
(gdb) bt
(gdb) disassemble

Type run, and when it fails, run bt (backtrace) to see where it fails. Use disassemble to see the specific instruction that's causing the failure.

Can you try this or try running okc on another machine?
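
The same gdb session can also be run non-interactively, which is convenient inside a container. This is only a sketch: the function name find_faulting_instruction and the GDB_BIN override are hypothetical, and ./precompiled is the placeholder binary name from the comment above. It relies on gdb's -batch, -ex and --args options, and uses x/4i $pc to print the instructions at the point where the program stopped.

```shell
# Hypothetical helper: run a program under gdb in batch mode, then print a
# backtrace and the instructions at the program counter where it stopped.
# GDB_BIN is an assumed override for the gdb executable (defaults to gdb).
find_faulting_instruction() {
  "${GDB_BIN:-gdb}" -batch -ex run -ex bt -ex 'x/4i $pc' --args "$@"
}
```

For example, find_faulting_instruction ./precompiled runs the binary until it stops and prints the backtrace plus the instructions at the stop address.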

neoromantique commented 2 years ago

I'm running it on AMD Ryzen 9 5950X, it's fairly standard and modern hardware.

https://gist.github.com/neoromantique/ab52f80e31a4a4df70bd0b744f870275

cwbhhjl commented 2 years ago

@neoromantique

Program received signal SIGILL, Illegal instruction.
0x0000000001dedb56 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> const*> (this=0x453b760 <rocksdb::(anonymous namespace)::sc_wrapper_type_info>, __f=0x7fffffffe460, __l=0x7fffffffe4f8, __bkt_count_hint=0, __h1=..., __h2=..., 
    __h=..., __eq=..., __exk=..., __a=...) at /usr/include/c++/10.3.1/bits/stl_iterator_base_funcs.h:138
   0x0000000001dedb49 <+121>:   movq   $0x0,0x10(%rdi)
   0x0000000001dedb51 <+129>:   vmovq  %rax,%xmm0
=> 0x0000000001dedb56 <+134>:   vpmaxuq %xmm1,%xmm0,%xmm0
   0x0000000001dedb5c <+140>:   vmovq  %xmm0,%rsi

We can see that the instruction causing the error is vpmaxuq.

https://www.officedaytime.com/simd512e/
https://en.wikipedia.org/wiki/AVX-512

It looks like vpmaxuq is an AVX-512 instruction, and the Ryzen 9 5950X doesn't support AVX-512.

https://www.quora.com/Does-Ryzen-support-AVX
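
On Linux, whether the CPU advertises AVX-512 can be checked from the flags line in /proc/cpuinfo. The sketch below is an illustration, not part of exchain: the function name has_avx512 is hypothetical, and it assumes that the xmm form of vpmaxuq needs both the avx512f and avx512vl flags.

```shell
# Hypothetical helper: succeed only if the CPU flags include both avx512f
# and avx512vl. The cpuinfo path can be overridden for testing.
has_avx512() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # Take the first "flags" line; fail if the file has none.
  flags=$(grep -m1 '^flags' "$cpuinfo") || return 1
  printf '%s\n' "$flags" | grep -qw 'avx512f' &&
    printf '%s\n' "$flags" | grep -qw 'avx512vl'
}
```

If has_avx512 returns a non-zero status on the host, a binary compiled with AVX-512 instructions can be expected to fail there with SIGILL.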

The error comes from rocksdb, so I think we can try recompiling rocksdb on your machine.

  1. cd ~
  2. git clone -b v1.6.3 https://github.com/okex/exchain.git
  3. cd exchain
  4. make rocksdb
  5. make mainnet
  6. exchaind init okc-mainnet-node --chain-id exchain-66 --home ~/.exchaind

If an error occurs during the make rocksdb step, please compile rocksdb version 6.27.3 by following the official documentation: https://github.com/facebook/rocksdb

cwbhhjl commented 2 years ago

@neoromantique Has your problem been resolved?

neoromantique commented 2 years ago

Well, kinda. I used my own Ubuntu-based Dockerfile to build rocksdb and exchain; after that it works fine, even with rocksdb.

lukasz-layerzerolabs commented 3 months ago

I also had this issue. In my case it was rocksdb, linked against libstdc++-dev, which was missing in my Docker image.

docker run --rm -ti --platform="linux/x86_64" --privileged okexchain/fullnode-mainnet sh

okexchain:/go/bin# apk add gdb
OK: 208 MiB in 110 packages

okexchain:/go/bin# mkdir -p /root/.config/gdb/

okexchain:/go/bin# echo "set auto-load safe-path /" > /root/.config/gdb/gdbinit

okexchain:/go/bin# gdb exchaincli 
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from exchaincli...
Loading Go Runtime support.

(gdb) run
Starting program: /go/bin/exchaincli 

Program received signal SIGILL, Illegal instruction.
0x0000000001817bc6 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> const*> (this=0x38a6b40 <rocksdb::(anonymous namespace)::sc_wrapper_type_info>, __f=0x7fffffffea30, __l=0x7fffffffeac8, __bkt_count_hint=0, __h1=..., __h2=..., __h=..., __eq=..., __exk=..., __a=...) at /usr/include/c++/10.3.1/bits/hashtable.h:1058
1058    /usr/include/c++/10.3.1/bits/hashtable.h: No such file or directory.

(gdb) exit