Open iameli opened 4 years ago
Seems very likely this is C rather than Go. I'm going to build everything on one of the machines that's not working and see if it enables different CPU flags and whatnot than a regular build.
Hmm, I had thought this was unrelated to GPU code, but it looks like it only shows up with -nvidia enabled. CPU transcoding works fine.
I barely know what I'm doing with debugging C code and whatnot, but I think this means the problem is GMP trying to make use of CPU opcodes that aren't there.
(gdb) continue
Continuing.
Thread 10 "livepeer" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7f9573fff700 (LWP 115)]
0x0000000001284997 in __gmpn_sqr_basecase ()
(gdb) bt
#0 0x0000000001284997 in __gmpn_sqr_basecase ()
#1 0xfffffffffffffffc in ?? ()
#2 0x0000000000000000 in ?? ()
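One way to test the "opcodes that aren't there" theory is to compare what the CPU advertises on a crashing machine against a working one. Below is a minimal sketch that reads the `flags` line from `/proc/cpuinfo` (Linux only). The flag list (`avx`, `avx2`, `bmi2`, `adx`) is a guess for illustration; nothing in this thread confirms which extensions the bundled GMP actually uses, and the helper names are hypothetical.

```python
# Flags chosen for illustration only; which extensions the bundled GMP
# actually relies on is an assumption, not something confirmed in this thread.
SUSPECT_FLAGS = ("avx", "avx2", "bmi2", "adx")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=SUSPECT_FLAGS):
    """List the wanted flags that the CPU does not advertise, in the given order."""
    present = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    # Linux-specific path; run this on both a crashing and a working host.
    with open("/proc/cpuinfo") as f:
        print("missing:", missing_flags(f.read()))
```

If the crashing hosts report missing flags that the working hosts have, that would be consistent with the SIGILL above, though it would not by itself prove which instruction is at fault.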
I've tried compiling with CFLAGS="-mnoavx -mnoavx2" without success.
Confirmed this is a problem with the GMP we ship in the statically-linked binary. Worked around it by using system-provided gnutls, removing the static linking. Not sure how to go about getting a more-compatible gnutls build... presumably some kind of configure flag on GMP and/or gnutls and/or ffmpeg.
It's a mystery to me why this shows up only if we're doing GPU transcoding. GMP is only used as a gnutls dependency, so that implies it was being used when we were making internal HTTPS requests for segments... why wouldn't that happen when CPU transcoding also?
This, or something like it, has started to occur again. Certain orchestrators are getting scheduled onto Celeron processors in ORD and crashlooping. Gotta be... tensorflow?
@iameli can this be closed? Is it the same issue as #2023?
go-livepeer appears to be crashing on some older CPU architectures, specifically the Celeron G3930. Update from comments: this appears to be originating from the GMP we ship within our static go-livepeer binary. It also only shows up when we're doing GPU transcoding. I do not know why. Error is:
Here's a gist with the full logs: https://gist.github.com/iameli/be4ae26f06f906678556f3e91a16e5a7