phoreproject / graphene

Phore Synapse working repository

vm.latestEpochInformation.slots -- index out of range #109

Open wqking opened 4 years ago

wqking commented 4 years ago

Branch: master

Code (in validatormanager.go):

// NewSlot is run when a new slot starts.
func (vm *Manager) NewSlot(slotNumber uint64) error {
    earliestSlot := vm.latestEpochInformation.earliestSlot
    logrus.WithField("slot", slotNumber).Debug("heard new slot")

    // The next line is where the index out of range panic occurs.
    proposerSlotCommittees := vm.latestEpochInformation.slots[int64(slotNumber-1)-earliestSlot]

Stack trace:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/phoreproject/synapse/validator.(*Manager).NewSlot(0xc0000c82c0, 0x21749, 0xc000083ad8, 0x1)
    E:/projects/wqking/go/src/github.com/phoreproject/synapse/validator/validatormanager.go:267 +0xef8
github.com/phoreproject/synapse/validator.(*Manager).ListenForBlockAndCycle(0xc0000c82c0, 0xc000062e40, 0xada960)
    E:/projects/wqking/go/src/github.com/phoreproject/synapse/validator/validatormanager.go:422 +0x69a
github.com/phoreproject/synapse/validator.(*Manager).Start(...)
    E:/projects/wqking/go/src/github.com/phoreproject/synapse/validator/validatormanager.go:437
github.com/phoreproject/synapse/validator/module.(*ValidatorApp).Run(0xc00009ab40, 0x18, 0xc0000640e0)
    E:/projects/wqking/go/src/github.com/phoreproject/synapse/validator/module/app.go:105 +0x661
main.main()
    E:/projects/wqking/go/src/github.com/phoreproject/synapse/cmd/validator/synapsevalidator.go:28 +0x1d3

meyer9 commented 4 years ago

This occurs when the network IDs don't match. slots should have 2 * epochLength elements, and earliestSlot is defined as int64(state.Slot) - int64(state.Slot%config.EpochLength) - int64(config.EpochLength).
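
To make the formula above concrete, here is a minimal sketch of the slot window it implies. The function names and the sample values (state slot 19, epoch length 8) are hypothetical and chosen only for illustration; the index expression is the one NewSlot uses.

```go
package main

import "fmt"

// epochWindow applies the earliestSlot formula quoted above and returns the
// earliest slot covered by the slots slice plus the expected slice length
// (2 * epochLength). Hypothetical helper, not part of the repository.
func epochWindow(stateSlot, epochLength uint64) (earliestSlot int64, numSlots int) {
	earliestSlot = int64(stateSlot) - int64(stateSlot%epochLength) - int64(epochLength)
	return earliestSlot, int(2 * epochLength)
}

// slotIndex is the index NewSlot computes for a given slot number.
func slotIndex(slotNumber uint64, earliestSlot int64) int64 {
	return int64(slotNumber-1) - earliestSlot
}

func main() {
	// Illustrative values only: state slot 19 with an epoch length of 8.
	earliest, n := epochWindow(19, 8)
	fmt.Println("earliestSlot:", earliest, "len(slots):", n)

	// Report whether each slot number maps to a valid index.
	for _, s := range []uint64{9, 24, 25} {
		idx := slotIndex(s, earliest)
		fmt.Println("slot", s, "index", idx, "in range:", idx >= 0 && idx < int64(n))
	}
}
```

With these sample values the window starts at slot 8 and holds 16 entries, so slot numbers 9 through 24 map to valid indices and slot 25 does not.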

We should add an RPC method to get the hash of the config so that the validator module can compare it with its own config.

Add a GetConfigHash RPC method to beaconrpc.proto and have it return a message containing the config hash bytes, computed as config.Hash() = ssz.HashTreeRoot(config) or something like that. Then have the validator check that hash against its own config on startup.
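
A rough sketch of the startup check described above, under stated assumptions: the Config struct and its fields are hypothetical, the RPC call is left out entirely, and SHA-256 over a JSON encoding stands in for ssz.HashTreeRoot, whose exact API is not shown in this thread.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for the real chain config.
type Config struct {
	NetworkID   string
	EpochLength uint64
}

// configHash is a placeholder for ssz.HashTreeRoot(config): it hashes a JSON
// encoding of the config with SHA-256. The real project may hash differently.
func configHash(c Config) ([32]byte, error) {
	encoded, err := json.Marshal(c)
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(encoded), nil
}

// checkConfig compares the hash bytes reported by the beacon node (as a
// GetConfigHash response might carry them) with the hash of the local config.
func checkConfig(local Config, remoteHash []byte) error {
	localHash, err := configHash(local)
	if err != nil {
		return err
	}
	if !bytes.Equal(localHash[:], remoteHash) {
		return fmt.Errorf("config hash mismatch: local %x, beacon %x", localHash, remoteHash)
	}
	return nil
}

func main() {
	beacon := Config{NetworkID: "testnet", EpochLength: 8}
	validator := Config{NetworkID: "testnet", EpochLength: 16}

	beaconHash, err := configHash(beacon)
	if err != nil {
		panic(err)
	}

	fmt.Println("same config:", checkConfig(beacon, beaconHash[:]))
	fmt.Println("different config:", checkConfig(validator, beaconHash[:]))
}
```

Failing fast on a mismatch at startup turns a later, hard-to-read panic into an explicit configuration error.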

wqking commented 4 years ago

I added GetConfigHash and the validator now compares it against its own config. See commit 9148c3549fd9c080e843fe7868755267118dd1ba

The validator uses the same network ID as the beacon node, and the newly added check succeeds.
So there must be another cause for the out of range error. I will investigate further.

wqking commented 4 years ago

I found the cause: the epoch age check I recently added to GetEpochInformation.
Code:

func (s *server) GetEpochInformation(ctx context.Context, in *pb.EpochInformationRequest) (*pb.EpochInformationResponse, error) {
    state := s.chain.GetState()
    config := s.chain.GetConfig()

    if in.EpochIndex > s.chain.GetCurrentSlot()/s.chain.GetConfig().EpochLength-3 {
        return &pb.EpochInformationResponse{
            HasEpochInformation: false,
            Information:         nil,
        }, nil
    }

    ...

When the condition is satisfied, HasEpochInformation in the returned pb.EpochInformationResponse is false.
Then UpdateEpochInformation in validatormanager.go returns early without updating the epoch information. Code:

// UpdateEpochInformation updates epoch information from the beacon chain
func (vm *Manager) UpdateEpochInformation(slotNumber uint64) error {
    epochInformation, err := vm.blockchainRPC.GetEpochInformation(context.Background(), &pb.EpochInformationRequest{EpochIndex: slotNumber / vm.config.EpochLength})
    if err != nil {
        logrus.WithField("function", "UpdateEpochInformation").Errorf("GetEpochInformation error: %v", err)
        return err
    }

    if !epochInformation.HasEpochInformation {
        logrus.WithField("function", "UpdateEpochInformation").Errorf("epochInformation.HasEpochInformation is false")
        return nil
    }

vm.latestEpochInformation is then left untouched with its default values: slots has zero length and earliestSlot is 0. The index computed in NewSlot is therefore out of range.

It seems we need to handle the case where NewSlot is called while there is no latestEpochInformation.

meyer9 commented 4 years ago

Yep, we should do that. Just return early if we don't have epoch info.
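
One way the early return could look, as a sketch only: the epochInformation type below is a simplified, hypothetical version of the validator's struct (the element type of slots is a placeholder), and committeesForSlot is an invented helper name. It bounds-checks the index instead of indexing directly.

```go
package main

import (
	"errors"
	"fmt"
)

// epochInformation is a simplified, hypothetical version of the struct behind
// vm.latestEpochInformation. The element type of slots is a placeholder.
type epochInformation struct {
	slots        [][]uint32
	earliestSlot int64
}

// errNoEpochInfo reports that the requested slot is outside the known window.
var errNoEpochInfo = errors.New("no epoch information for slot")

// committeesForSlot computes the same index as NewSlot but returns an error
// when the index falls outside slots. This also covers the zero value, where
// slots is empty and earliestSlot is 0.
func (e *epochInformation) committeesForSlot(slotNumber uint64) ([]uint32, error) {
	idx := int64(slotNumber-1) - e.earliestSlot
	if idx < 0 || idx >= int64(len(e.slots)) {
		return nil, fmt.Errorf("%w: slot %d, earliestSlot %d, len(slots) %d",
			errNoEpochInfo, slotNumber, e.earliestSlot, len(e.slots))
	}
	return e.slots[idx], nil
}

func main() {
	// Zero value: no epoch information has been received yet.
	var empty epochInformation
	_, err := empty.committeesForSlot(1)
	fmt.Println("empty:", err)

	// A populated window of 16 slots starting at slot 8.
	info := epochInformation{slots: make([][]uint32, 16), earliestSlot: 8}
	_, err = info.committeesForSlot(24)
	fmt.Println("slot 24:", err)
	_, err = info.committeesForSlot(25)
	fmt.Println("slot 25:", err)
}
```

Checking the computed index rather than only checking for missing epoch info also catches a stale window, not just an empty one.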

wqking commented 4 years ago

I added an early return for when there is no epoch info. After I start a new blockchain (with the genesis time one minute in the future) and it runs for several seconds, it still hits an index out of range error, and this one is not caused by missing epoch info.

Message

time="2019-11-21T13:06:09+08:00" level=error msg="epochInformation.HasEpochInformation is false" function=UpdateEpochInformation
time="2019-11-21T13:06:10+08:00" level=error msg="slotNumber is out of range for vm.latestEpochInformation.slots" earliestSlot=8 function=NewSlot len(vm.latestEpochInformation.slots)=16 slotNumber=25
panic: runtime error: index out of range

It seems an epoch update was skipped because the epoch was considered too far from the current one, and the lookup then goes out of range.
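
The numbers in the last log line can be replayed by hand. The sketch below does that and also evaluates the age check quoted earlier for the same slot. It rests on assumptions not confirmed in this thread: an epoch length of 8 (inferred from len(slots)=16 being 2 * epochLength), uint64 operands in the age check, and a beacon chain whose current slot equals the validator's slot. Treat it as one possible reading, not a diagnosis.

```go
package main

import "fmt"

// rejectedByAgeCheck mirrors the condition quoted from GetEpochInformation.
// With uint64 operands, currentSlot/epochLength-3 wraps around to a very
// large value whenever the current epoch is below 3.
func rejectedByAgeCheck(requestedEpoch, currentSlot, epochLength uint64) bool {
	return requestedEpoch > currentSlot/epochLength-3
}

func main() {
	// Values taken from the log line.
	earliestSlot := int64(8)
	numSlots := int64(16)
	slotNumber := uint64(25)

	// Assumption: epoch length 8, since len(slots) should be 2 * epochLength.
	epochLength := uint64(8)

	// Index computed by NewSlot: (25-1) - 8 = 16, one past the last valid index 15.
	idx := int64(slotNumber-1) - earliestSlot
	fmt.Println("index:", idx, "in range:", idx >= 0 && idx < numSlots)

	// Slot 25 belongs to epoch 3 under the assumed epoch length.
	epoch := slotNumber / epochLength
	fmt.Println("epoch:", epoch)

	// Age check for the request at slot 25, and for an earlier request at slot 16.
	fmt.Println("epoch 3 rejected:", rejectedByAgeCheck(epoch, slotNumber, epochLength))
	fmt.Println("epoch 2 rejected:", rejectedByAgeCheck(2, 16, epochLength))
}
```

Under these assumptions the request for epoch 3 is rejected while the window still starts at slot 8, which would match a skipped update followed by an index of 16 into a slice of length 16.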