Daemon crashes when `hyperdrive export` stays open for a while

brechtcs commented 4 years ago

Describe the bug: On my server I always keep a few GNU Screen sessions open to keep `hyperdrive export` running for a couple of drives. Every once in a while this causes the following error, which crashes the entire daemon:

Error
    at Http2CallStream.<anonymous> (/home/brecht/.local/lib/node_modules/hyperdrive-daemon/node_modules/@grpc/grpc-j)
    at Http2CallStream.emit (events.js:333:22)
    at Http2CallStream.EventEmitter.emit (domain.js:485:12)
    at /home/brecht/.local/lib/node_modules/hyperdrive-daemon/node_modules/@grpc/grpc-js/build/src/call-stream.js:712
    at processTicksAndRejections (internal/process/task_queues.js:79:11)
Emitted 'error' event on ClientDuplexStreamImpl instance at:
    at Http2CallStream.<anonymous> (/home/brecht/.local/lib/node_modules/hyperdrive-daemon/node_modules/@grpc/grpc-j)
    at Http2CallStream.emit (events.js:333:22)
    [... lines matching original stack trace ...]
    at processTicksAndRejections (internal/process/task_queues.js:79:11) {
  code: 13,
  details: '',
  metadata: Metadata { options: undefined, internalRepr: Map(0) {} }
}

I don't think it's related to the specific drives being exported: I've seen it happen independently with two different drives. In case it's relevant, these are their public keys:

aceaaf66960fed56ce1b6e87181cbfbf8cc111bc8c28b7010417f1a4548b59d6/
9bda2b8b224a3a4fef10d6302d07de36e118e5065ea9bc16359091ba968bf13b/

OS: NixOS 20.03, Linux kernel 5.4.41

Node version: 13.8.0

Was the daemon installed from NPM or bundled with Beaker? NPM

andrewosh commented 4 years ago

Thanks for the report @brechtcs! Can you give me the output of `hyperdrive status` so I know which daemon/client version you're using?

brechtcs commented 4 years ago

Right, I knew I'd forgotten something:

The Hyperdrive daemon is running:

  API Version:             0
  Daemon Version:          1.13.14
  Client Version:          1.15.2
  Schema Version:          1.10.0
  Hyperdrive Version:      10.11.2
  Fuse Native Version:
  Hyperdrive Fuse Version:

  Holepunchable:           true
  Remote Address:          x.x.x.x:49737

  Fuse Available:          false
  Fuse Configured:         false
  Uptime:                  0 Days 2 Hours 25 Minutes 0 Seconds

brechtcs commented 4 years ago

Today I unwittingly stopped the daemon while one of those `hyperdrive export` processes was still running. When I went back to check on that screen session, the export command had crashed with the same error as above. So the export command itself might not be causing the crashes; instead, something else may be crashing the daemon, and the export command reports this error when the daemon stops.

andrewosh commented 4 years ago

Ah yeah, gRPC error code 13 is the very informative `INTERNAL` error, and it looks like this is how the error propagates to the exporting client when the daemon gets shut down (or crashes).

I was able to repro it by doing just what you described, so thanks for updating the issue! I'll add some better error handling to the export/import commands to take this into account.
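For context on why the export process dies rather than printing a message: in Node, an `'error'` event emitted on an emitter with no `'error'` listener is thrown, which is what the "Emitted 'error' event on ClientDuplexStreamImpl instance" line in the trace above reports. The sketch below simulates both cases with a plain `EventEmitter` standing in for the gRPC client stream (an assumption so it runs without the daemon or `@grpc/grpc-js`). `makeFakeStream`, `simulateDaemonShutdown`, and the message text are hypothetical and not the actual fix that landed in hyperdrive-daemon.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the gRPC client stream that `hyperdrive export`
// keeps open. A plain EventEmitter is used so this runs without dependencies.
function makeFakeStream() {
  return new EventEmitter();
}

// Simulate the daemon going away: the client stream emits an 'error'
// carrying a numeric gRPC `code` (13 in the report above).
function simulateDaemonShutdown(stream) {
  const err = new Error('13 INTERNAL');
  err.code = 13;
  stream.emit('error', err);
}

// No 'error' listener: emit() throws the error itself. In a real process
// nothing catches it, so the process exits with a trace like the one above.
function runWithoutHandler() {
  const stream = makeFakeStream();
  try {
    simulateDaemonShutdown(stream);
    return 'no crash';
  } catch (err) {
    return `crashed with code ${err.code}`;
  }
}

// With an 'error' listener, the error is delivered to the handler instead,
// which can report it readably (hypothetical wording) and exit cleanly.
function runWithHandler() {
  const stream = makeFakeStream();
  let message = null;
  stream.on('error', (err) => {
    message = `Export stopped: lost connection to the daemon (gRPC code ${err.code})`;
  });
  simulateDaemonShutdown(stream);
  return message;
}

console.log(runWithoutHandler()); // crashed with code 13
console.log(runWithHandler());
```

In a real client the handler would also need to stop the export loop and set a non-zero exit code; those details depend on how the CLI is structured.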

This does mean that the first time this happened to you, something else must have killed the daemon. We only keep logs for the last two daemon restarts, so you probably don't have the relevant error anymore, but do you remember seeing anything in `~/.hyperdrive/log.json` when you first opened the issue?

brechtcs commented 4 years ago

I did look at `log.json`, but it only contained info-level logging about memory and CPU usage, so I didn't include it in the report.

But now that you mention it, I may have already restarted the daemon by then, so the relevant logs could have been moved to `logs.old.json`, which I didn't check. I'll make sure to look at both next time something happens.

brechtcs commented 4 years ago

I have good news and bad news. The good news (kind of) is it crashed again, so I have the logs now: https://pastebin.com/gQRDkNWr. The bad news is that it seems to be full of info logs again, so I'm not sure you'll be able to get much useful information out of it.
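Since both logs posted here are mostly info-level entries, a small filter can help surface anything more severe before sharing them. This is a sketch under loud assumptions: it assumes the daemon writes newline-delimited JSON with pino-style numeric `level` fields (30 = info, 40 = warn, 50 = error, 60 = fatal), which you should verify against your own `~/.hyperdrive/log.json`. The helpers `interestingEntries` and `scanLogFile` are hypothetical, and the file names are taken as spelled in this thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed pino-style numeric levels: 30 info, 40 warn, 50 error, 60 fatal.
const WARN = 40;

// Hypothetical helper: keep parsed entries at or above `minLevel`, skipping
// blank or non-JSON lines (e.g. a line cut off when the daemon died).
function interestingEntries(text, minLevel = WARN) {
  const results = [];
  for (const line of text.split('\n')) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      continue;
    }
    if (entry && typeof entry.level === 'number' && entry.level >= minLevel) {
      results.push(entry);
    }
  }
  return results;
}

// Hypothetical helper: scan one log file, returning [] if it doesn't exist.
function scanLogFile(file, minLevel = WARN) {
  if (!fs.existsSync(file)) return [];
  return interestingEntries(fs.readFileSync(file, 'utf8'), minLevel);
}

// File names as they appear in this thread; adjust if yours differ.
const logDir = path.join(os.homedir(), '.hyperdrive');
for (const name of ['log.json', 'logs.old.json']) {
  for (const entry of scanLogFile(path.join(logDir, name))) {
    console.log(name, JSON.stringify(entry));
  }
}
```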

I also noticed that this time the gRPC error code was 1, not 13. If it follows the Unix convention, I guess that's also a generic error code that doesn't give much information.
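gRPC status codes are their own enumeration rather than Unix exit codes: per the gRPC status code definitions, 1 is `CANCELLED` and 13 is `INTERNAL`. A minimal lookup, assuming you only have the numeric `code` from an error like the one above, could look like this. `@grpc/grpc-js` exports the same values as `status` (e.g. `status.CANCELLED === 1`); the table is inlined here so the sketch runs without dependencies, and `grpcStatusName` is a hypothetical helper.

```javascript
// Canonical gRPC status codes, indexed by their numeric value (0-16).
const GRPC_STATUS_NAMES = [
  'OK', 'CANCELLED', 'UNKNOWN', 'INVALID_ARGUMENT', 'DEADLINE_EXCEEDED',
  'NOT_FOUND', 'ALREADY_EXISTS', 'PERMISSION_DENIED', 'RESOURCE_EXHAUSTED',
  'FAILED_PRECONDITION', 'ABORTED', 'OUT_OF_RANGE', 'UNIMPLEMENTED',
  'INTERNAL', 'UNAVAILABLE', 'DATA_LOSS', 'UNAUTHENTICATED'
];

// Map a numeric code (e.g. err.code from a gRPC client error) to its name.
function grpcStatusName(code) {
  return Number.isInteger(code) && code >= 0 && code < GRPC_STATUS_NAMES.length
    ? GRPC_STATUS_NAMES[code]
    : `UNRECOGNIZED(${code})`;
}

console.log(grpcStatusName(1));  // CANCELLED
console.log(grpcStatusName(13)); // INTERNAL
```

So the two codes seen in this thread are distinct signals: `CANCELLED` means the call was cancelled, while `INTERNAL` is the generic internal error discussed above.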

brechtcs commented 4 years ago

I caught another one. This time I had two `hyperdrive export` processes running. One terminated with gRPC error code 1, the other with 13. So I guess the former caused the error, and the latter is the result of the daemon crashing. Not sure it will add much, but here are the logs anyway: https://pastebin.com/QAXUADUQ

hypercore-protocol / hyperdrive-daemon, issue #69