prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Panicked when handling p2p message! #6345

Closed ldouze closed 4 years ago

ldouze commented 4 years ago

🐞 Bug Report

Description

Beacon-chain exits with message "Panicked when handling p2p message!"

Has this worked before in a previous version?

Not sure

🔬 Minimal Reproduction

docker run -it -v $HOME/.eth2:/data -p 4000:4000 -p 8080:8080 -p 13000:13000 -p 12000:12000/udp --name beacon-node gcr.io/prysmaticlabs/prysm/beacon-chain:latest --datadir=/data --rpc-host 0.0.0.0

🔥 Error





[prysm8.log](https://github.com/prysmaticlabs/prysm/files/4814761/prysm8.log)

🌍 Your Environment

Operating System:

  
Contabo VPS
CentOS Linux release 8.1.1911 (Core)
  

What version of Prysm are you running? (Which release)

  
CentOS Linux release 8.1.1911 (Core)
  

Anything else relevant (validator index / public key)? beacon-chain-v1.0.0-alpha.12-linux-amd64

shayzluf commented 4 years ago

Hi @ldouze, could you please add the log file to the issue so that it is easier for us to debug?

ldouze commented 4 years ago

prysm8.log

shayzluf commented 4 years ago

@farazdagi please reassign to me if assignment seems inappropriate

DigiDr commented 4 years ago

beacon-node.txt

I'm also seeing this with the latest version of the beacon node on the Onyx testnet.

stefa2k commented 4 years ago

I got this too: 37 times in the last 30 minutes across all 3 nodes I'm running. Running docker tag :HEAD-6c7131.

nisdas commented 4 years ago

@KuDeTa that looks like a different panic. @prestonvanloon I remember we had that error before; wasn't it resolved?

nisdas commented 4 years ago

@ldouze looking at your error, it looks like a memory corruption error in your system:

 unexpected fault address 0xc668050432
 fatal error: fault
 [signal SIGSEGV: segmentation violation code=0x1 addr=0xc668050432 pc=0x15f8cf6]

 goroutine 1725 [running]:
 runtime.throw(0x5d8695, 0x5)
    GOROOT/src/runtime/panic.go:1116 +0x72 fp=0xc00791c9f8 sp=0xc00791c9c8 pc=0x15cf042
 runtime.sigpanic()
    GOROOT/src/runtime/signal_unix.go:702 +0x3cc fp=0xc00791ca28 sp=0xc00791c9f8 pc=0x15e606c
 runtime.(*_type).string(0xc66805040a, 0x2924cb, 0x1e)
    GOROOT/src/runtime/type.go:51 +0x26 fp=0xc00791ca50 sp=0xc00791ca28 pc=0x15f8cf6
 runtime.(*TypeAssertionError).Error(0xc02ba57dd0, 0x3c3a40, 0xc02ba57dd0)
    GOROOT/src/runtime/error.go:39 +0xa1 fp=0xc00791cb68 sp=0xc00791ca50 pc=0x15a1a61
 fmt.(*pp).handleMethods(0xc0190176c0, 0xc000000076, 0x1)
    GOROOT/src/fmt/print.go:624 +0x1db fp=0xc00791cdd8 sp=0xc00791cb68 pc=0x168727b
 fmt.(*pp).printArg(0xc0190176c0, 0x3c3a40, 0xc02ba57dd0, 0x76)
    GOROOT/src/fmt/print.go:713 +0x1e4 fp=0xc00791ce70 sp=0xc00791cdd8 pc=0x1687944
 fmt.(*pp).doPrintf(0xc0190176c0, 0x5fd3b8, 0x1d, 0xc00791d090, 0x3, 0x3)
    GOROOT/src/fmt/print.go:1030 +0x15a fp=0xc00791cf58 sp=0xc00791ce70 pc=0x168b1ea

The above is what finally crashed your node. Taken together with all the panics before it, it points to a memory corruption issue external to Prysm. This panic shouldn't be possible on a healthy node, as it indicates an access to a nonexistent memory address.
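To illustrate the frames in the trace above: when a recovered panic value is formatted with fmt, a failed type assertion is rendered through (*runtime.TypeAssertionError).Error, which reads type names from the runtime's own type metadata. The sketch below uses a hypothetical handler (handle and isTypeAssertionPanic are illustrative names, not Prysm's code) to show that path working normally. A fault inside that formatting step, as in the trace, is a fatal runtime error and not a recoverable panic.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle simulates a message handler that does an unchecked type assertion.
// The recover wrapper is a hypothetical stand-in, not Prysm's actual handler.
func handle(msg interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Formatting r with %v calls (*runtime.TypeAssertionError).Error,
			// which looks up type names in runtime type metadata.
			err = fmt.Errorf("panicked when handling message: %v", r)
		}
	}()
	_ = msg.(string) // panics if msg does not hold a string
	return nil
}

// isTypeAssertionPanic reports whether asserting msg to string panics
// with a *runtime.TypeAssertionError.
func isTypeAssertionPanic(msg interface{}) (ok bool) {
	defer func() {
		r := recover()
		_, ok = r.(*runtime.TypeAssertionError)
	}()
	_ = msg.(string)
	return false
}

func main() {
	fmt.Println(handle(42))   // a recovered, printable error
	fmt.Println(handle("ok")) // no panic, so no error
	fmt.Println(isTypeAssertionPanic(42))
}
```

On a healthy machine the recovered value always prints cleanly, which is why a segmentation fault while building the error string suggests damaged memory and not a logic bug.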

prestonvanloon commented 4 years ago

@nisdas I don't recall a panic on fmt.print. It does seem very odd and more likely system related... Maybe the process has exhausted all memory or maybe docker is up to something...

This seems very hard to reproduce. Does it happen often @ldouze?

ldouze commented 4 years ago

No, it does not happen often. I think only one more time in three weeks. My SIGSEGV problem, however, still happens once every few hours and may also be system related.


nisdas commented 4 years ago

I did get this today:

unexpected fault address 0x7f5dbc9d705c
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f5dbc9d705c pc=0x15c1637]

goroutine 241 [running]:
runtime.throw(0x5c9b20, 0x5)
        GOROOT/src/runtime/panic.go:1116 +0x72 fp=0xc00161dd20 sp=0xc00161dcf0 pc=0x158d042
runtime.sigpanic()
        GOROOT/src/runtime/signal_unix.go:702 +0x3cc fp=0xc00161dd50 sp=0xc00161dd20 pc=0x15a406c
runtime.memmove(0xc013b78780, 0x7f5dbc9d705c, 0x20)
        GOROOT/src/runtime/memmove_amd64.s:181 +0x147 fp=0xc00161dd58 sp=0xc00161dd50 pc=0x15c1637
github.com/prysmaticlabs/prysm/shared/bytesutil.SafeCopyBytes(...)
        shared/bytesutil/bytes.go:196
github.com/prysmaticlabs/prysm/beacon-chain/state.CopyCheckpoint(...)
        beacon-chain/state/cloners.go:68
github.com/prysmaticlabs/prysm/beacon-chain/blockchain.(*Service).Start(0xc0000ba800)
        beacon-chain/blockchain/service.go:180 +0x107a fp=0xc00161dfd8 sp=0xc00161dd58 pc=0x235aeda
runtime.goexit()
        GOROOT/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00161dfe0 sp=0xc00161dfd8 pc=0x15c0331
created by github.com/prysmaticlabs/prysm/shared.(*ServiceRegistry).StartAll
        shared/service_registry.go:46 +0x23e

@ldouze, is this what you see too, or something similar?
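For context, the faulting frame in this trace is a 32-byte copy (memmove with length 0x20) whose source pointer, 0x7f5dbc9d705c, lies outside the 0xc0... range where the other pointers in the trace sit. A minimal sketch of a nil-safe copy helper follows, assuming only the behavior the name SafeCopyBytes suggests. The function safeCopyBytes here is a hypothetical stand-in and is not copied from Prysm's source.

```go
package main

import "fmt"

// safeCopyBytes returns an independent copy of src, or nil if src is nil.
// Hypothetical stand-in for a helper like bytesutil.SafeCopyBytes.
func safeCopyBytes(src []byte) []byte {
	if src == nil {
		return nil
	}
	dst := make([]byte, len(src))
	copy(dst, src) // compiles down to a memmove for byte slices
	return dst
}

func main() {
	root := make([]byte, 32) // same length as the copy in the trace (0x20)
	root[0] = 0xab

	cp := safeCopyBytes(root)
	cp[0] = 0xff // mutating the copy must not affect the original

	fmt.Println(root[0] == 0xab, len(cp))
}
```

A copy like this can only fault if the source slice points at memory that is no longer valid or readable, so the crash depends on where the bytes came from and not on the copy itself.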

ldouze commented 4 years ago

My knowledge doesn't extend that far, but as an example, see some logs attached.

rauljordan commented 4 years ago

Any update on this, @nisdas?

nisdas commented 4 years ago

This does not look like a Prysm error; it appears to be an issue with the specific machine. Closing this for now.