Closed arnetheduck closed 2 years ago
with some regularity, when using libp2p to connect to go-libp2p, I can lose the connection due to an EAGAIN in the read stream loop - this would indicate that there's a race condition because we should only be reading when there's data:
EAGAIN
Breakpoint 3, osErrorMsg__pureZos_358 (errorCode=errorCode@entry=11) at /home/arnetheduck/status/mite/vendor/nimbus-build-system/vendor/Nim/lib/pure/includes/oserr.nim:18 18 proc osErrorMsg*(errorCode: OSErrorCode): string = (gdb) bt full #0 osErrorMsg__pureZos_358 (errorCode=errorCode@entry=11) at vendor/nimbus-build-system/vendor/Nim/lib/pure/includes/oserr.nim:18 result = <optimized out> FR_ = {prev = 0xc, procname = 0x820f08 "vendor/nimbus-build-system/vendor/Nim/lib/system/dollars.nim", line = 393216, filename = 0x7ffff7c67da0 "\003", len = -13984, calldepth = -1, frameMsgLen = 140737350357408} #1 0x00000000004fce3a in readStreamLoop__OOZvendorZnim45chronosZchronosZtransportsZstream_1109 (udata=0x7ffff7c64050) at vendor/nim-chronos/chronos/transports/common.nim:570 msgX60gensym68_ = <optimized out> T43_ = 0x0 T45_ = 0x0 treX60gensym68_ = <optimized out> T47_ = <optimized out> T44_ = 0x7ffff7c651a0 T46_ = <optimized out> TM__2I7Lp9b9cDBRsWiKyn0XigmQ_29 = {prev = 0x7fffffffc9e0, status = 8629082, context = {{__jmpbuf = {32, 8528944, 786432, 8522840, 140737350554368, 9515128, 0, 4550924}, __mask_was_saved = -13776, __saved_mask = {__val = { 8527581, 169, 8522840, 140733194108928, 8526977, 179, 140737350554368, 140737350384872, 5065521, 140737488341616, 8522648, 243, 8522840, 140733194043392, 140737350554368, 140737350554368}}}}} err = 11 res = <optimized out> T25_ = <optimized out> TM__2I7Lp9b9cDBRsWiKyn0XigmQ_24 = <optimized out> transp = 0x7ffff7c64050 fd = 5 _nimCurFrame = 0x7fffffffced0 FR_ = {prev = 0x7fffffffcf60, procname = 0x82a0df "readStreamLoop", line = 570, filename = 0x829ca8 "vendor/nim-chronos/chronos/transports/common.nim", len = 0, calldepth = 5, frameMsgLen = 8561393} LA65_ = <optimized out> LA118_ = <optimized out> #2 0x00000000004ec72a in poll__OOZvendorZnim45chronosZchronosZasyncloop_1742 () at vendor/nim-chronos/chronos/asyncloop.nim:288 callableX60gensym68_ = {function = {ClP_0 = 0x4fc851 <readStreamLoop__OOZvendorZnim45chronosZchronosZtransportsZstream_1109>, ClE_0 = 0x0}, udata = 0x7ffff7c64050} TM__xRflM5lMv4YvZq7gOUBMew_311 = <optimized out> iX60gensym68_ = 2 i_3 = 2 loop = 0x7ffff7c6d050 curTime = {value = 1} curTimeout = <optimized out> customSet = 60 lastFinishX60gensym49_ = <optimized out> count_1 = 1 curTimeX60gensym62_ = <optimized out> countX60gensym68_ = 3 FR_ = {prev = 0x7fffffffd560, procname = 0x82856b "poll", line = 288, filename = 0x828758 "vendor/nim-chronos/chronos/asyncloop.nim", len = 0, calldepth = 4, frameMsgLen = 8912384}
746832384a036dc159d3bd559c53d1594e7175c4
Run it with chronosFutureTracking and call dumpPendingFutures on error please.
chronosFutureTracking
dumpPendingFutures
found the issue - my mistake, a nested poll call :/
with some regularity, when using libp2p to connect to go-libp2p, I can lose the connection due to an
EAGAIN
in the read stream loop - this would indicate that there's a race condition because we should only be reading when there's data: