vibe-d / eventcore

High performance proactor event loop abstraction library
MIT License
60 stars 42 forks source link

Do not call `AA.remove` from finalizer #166

Open Geod24 opened 3 years ago

Geod24 commented 3 years ago

We had the unpleasant experience of tracking down a bug to this backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.5
  * frame #0: 0x0000000100971d58 agora`_D4core9exception__T11staticErrorTCQBhQBf27InvalidMemoryOperationErrorZQBzFNaNbNiZQBw at exception.d:646:5
    frame #1: 0x0000000100971d41 agora`onInvalidMemoryOperationError(pretend_sideffect=0x0000000000000000) at exception.d:539:5
    frame #2: 0x000000010098733d agora`_D2gc4impl12conservativeQw14ConservativeGC6lockNRFNbNiZv at gc.d:162:13
    frame #3: 0x0000000100988501 agora`_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl(this=0x0000000102f050c0, _param_0=0x000000010ea03610, _param_1=0x000000010ea0360c, _param_2=0x000000010ea035f8, _param_3=0x000000010ea03600) at gc.d:247:9
    frame #4: 0x0000000100988827 agora`_D2gc4impl12conservativeQw14ConservativeGC6callocMFNbmkxC8TypeInfoZPv(this=0x0000000102f050c0, size=2048, bits=16, ti=0x0000000000000000) at gc.d:438:9
    frame #5: 0x00000001009987ea agora`gc_calloc(sz=2048, ba=16, ti=0x0000000000000000) at proxy.d:184:9
    frame #6: 0x000000010097b133 agora`_D4core6memory2GC6callocFNaNbmkxC8TypeInfoZPv(sz=2048, ba=16, ti=0x0000000000000000) at memory.d:551:9
    frame #7: 0x000000010099ffba agora`_D2rt3aaA12allocBucketsFNaNbNemZASQBgQBg6Bucket(dim=128) at aaA.d:197:5
    frame #8: 0x00000001009a077b agora`_D2rt3aaA4Impl6resizeMFNaNbmZv(this=0x0000000000000200, ndim=128) at aaA.d:145:9
    frame #9: 0x00000001009a08a4 agora`_D2rt3aaA4Impl6shrinkMFMxC8TypeInfoZv(this=0x0000000000000200, keyti=0x0000000100decda0) at aaA.d:139:13
    frame #10: 0x00000001009a1ac9 agora`_aaDelX(aa=AA @ 0x000000010ea03788, keyti=0x0000000100decda0, pkey="�\f") at aaA.d:632:13
    frame #11: 0x000000010074c549 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10releaseRefMFNbNfSQCp6driver7TimerIDZb + 361
    frame #12: 0x000000010074d277 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10cancelWaitMFNbNfSQCp6driver7TimerIDZv + 183
    frame #13: 0x000000010074c4e7 agora`_D9eventcore7drivers5timer22LoopTimeoutTimerDriver10releaseRefMFNbNfSQCp6driver7TimerIDZb + 263
    frame #14: 0x00000001006b23a9 agora`_D4vibe4core8internal7release__T13releaseHandleVAyaa6_74696d657273TS9eventcore6driver7TimerIDZQCkFNbNfQBjOCQBn7drivers5posixQBu__T16PosixEventDriverHTCQDfQBsQBn9cfrunloop18CFRunLoopEventLoopZQCjZv + 105
    frame #15: 0x00000001006a8447 agora`_D4vibe4coreQf5Timer6__dtorMFNbNfZv + 71
    frame #16: 0x00000001000dc7ee agora`_D5agora6common4Task10VibedTimer11__fieldDtorMFNbNeZv + 30
    frame #17: 0x00000001009af88b agora`rt_finalize2(p=" ��", det=false, resetMemory=false) at lifetime.d:1489:21
    frame #18: 0x00000001009af9a0 agora`rt_finalizeFromGC(p=" ��", size=48, attr=1) at lifetime.d:1523:9
    frame #19: 0x0000000100990d55 agora`_D2gc4impl12conservativeQw3Gcx5sweepMFNbZm(this=0x0000000000000000) at gc.d:2474:41
    frame #20: 0x000000010098b3c5 agora`_D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm(this=0x0000000000000000, nostack=false) at gc.d:2667:13
    frame #21: 0x000000010098d3aa agora`_D2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv(this=0x0000000000000000, size=1632, alloc_size=0x000000010ea03fd8, bits=2, ti=0x0000000000000000) at gc.d:1678:17
    frame #22: 0x000000010098869d agora`_D2gc4impl12conservativeQw3Gcx5allocMFNbmKmkxC8TypeInfoZPv(this=0x0000000000000000, size=1632, alloc_size=0x000000010ea03fd8, bits=2, ti=0x0000000000000000) at gc.d:1634:9
    frame #23: 0x00000001009885dd agora`_D2gc4impl12conservativeQw14ConservativeGC12mallocNoSyncMFNbmkKmxC8TypeInfoZPv(this=0x0000000102f050c0, size=1632, bits=2, alloc_size=0x000000010ea03fd8, ti=0x0000000000000000) at gc.d:389:9
    frame #24: 0x0000000100988526 agora`_D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl(this=0x0000000102f050c0, _param_0=0x000000010ea04000, _param_1=0x000000010ea03ffc, _param_2=0x000000010ea03fd8, _param_3=0x000000010ea03ff0) at gc.d:254:13
    frame #25: 0x0000000100988782 agora`_D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_(this=0x0000000102f050c0, size=1632, bits=2, ti=0x0000000000000000) at gc.d:417:9
    frame #26: 0x000000010098beab agora`_DThn16_2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ at gc.d:407:13
    frame #27: 0x000000010099877a agora`gc_qalloc(sz=1632, ba=2, ti=0x0000000000000000) at proxy.d:179:9
    frame #28: 0x000000010097b0fa agora`_D4core6memory2GC6qallocFNaNbmkxC8TypeInfoZSQBqQBo8BlkInfo_(sz=1632, ba=2, ti=0x0000000000000000) at memory.d:524:9
    frame #29: 0x00000001007cba14 agora`_D3std5array__T8AppenderTAyaZQo13ensureAddableMFmZ9__lambda3MFNaNbNeZS4core6memory8BlkInfo_ at array.d:3386:39
    frame #30: 0x00000001007cb6e2 agora`_D3std5array__T8AppenderTAyaZQo13ensureAddableMFNaNbNfmZv(this=0x00000001226fed00, nelems=1) at array.d:3386:13
    frame #31: 0x00000001001b1956 agora`_D3std5array__T8AppenderTAyaZQo__T3putTaZQhMFNaNbNfaZv + 38
    frame #32: 0x000000010079e458 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1288
    frame #33: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #34: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #35: 0x000000010079e436 agora`_D4vibe4data4json__T15writeJsonStringTS3std5array__T8AppenderTAyaZQoVbi0ZQCbFNfKQBqISQDfQDdQDb4JsonmZv + 1254
    frame #36: 0x000000010079bd88 agora`_D4vibe4data4json4Json8toStringMxFNeZAya + 136
    frame #37: 0x0000000100048bb1 agora`_D4scpd5types11Stellar_SCP12SCPStatement10_pledges_t8toStringMxFNeZAya + 609
    frame #38: 0x00000001001e36e1 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP12SCPStatement10_pledges_tZQCyFNfKQFkKQClZv + 17
        frame #39: 0x00000001001e2919 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP12SCPStatement10_pledges_tZQCrFNfKQFdKQClZv + 9
    frame #40: 0x00000001001e263d agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP12SCPStatementZQCmFNfKQEyKQBzZv + 317
    frame #41: 0x00000001001e21d9 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP12SCPStatementZQCfFNfKQErKQBzZv + 9
    frame #42: 0x00000001001e206d agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T21serializeValueDeducedTS4scpd5types11Stellar_SCP11SCPEnvelopeZQClFNfKQExKQByZv + 125
    frame #43: 0x00000001001e1fe9 agora`_D4vibe4data13serialization__T18serializeValueImplTSQByQBw4json14JsonSerializerSQDaQCyQCw13DefaultPolicyZ__T14serializeValueTS4scpd5types11Stellar_SCP11SCPEnvelopeZQCeFNfKQEqKQByZv + 9
    frame #44: 0x00000001001e1fd9 agora`_D4vibe4data13serialization__T19serializeWithPolicyTSQBzQBx4json14JsonSerializerSQDbQCzQCx13DefaultPolicyTS4scpd5types11Stellar_SCP11SCPEnvelopeZQElFNfKQDwKQByZv + 9
    frame #45: 0x00000001001e1fc9 agora`_D4vibe4data13serialization__T9serializeTSQBoQBm4json14JsonSerializerTS4scpd5types11Stellar_SCP11SCPEnvelopeZQDbFNfKQCxKQByZv + 9
    frame #46: 0x00000001001e1fa2 agora`_D4vibe4data13serialization__T9serializeTSQBoQBm4json14JsonSerializerTS4scpd5types11Stellar_SCP11SCPEnvelopeZQDbFNfKQBuZSQEpQEnQDb4Json + 66
    frame #47: 0x00000001001e1f54 agora`_D4vibe4data4json__T15serializeToJsonTS4scpd5types11Stellar_SCP11SCPEnvelopeZQCfFNfKQBuZSQDjQDhQDf4Json + 20
    frame #48: 0x00000001001c999f agora`_D4vibe3web4rest__T19executeClientMethodTC5agora3api9Validator3APIVii1S_DQCtQCrQCq__T19RestInterfaceClientTQCoZQBa9__mixin2215receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZ8envelopeQBwZQGwMFNfMKxSQHyQHw8internalQIe6common__T13RestInterfaceTQIdZQuMDFNfCQKc4http6client17HTTPClientRequestZvMDFNfQBqMCQLw4core6stream11InputStreamZvZv + 399
    frame #49: 0x00000001001c97fb agora`_D4vibe3web4rest__T19RestInterfaceClientTC5agora3api9Validator3APIZQBw9__mixin2215receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZv + 123
    frame #50: 0x00000001000b50e5 agora`_D5agora7network13NetworkClientQp__T14attemptRequestS_DQCb3api9Validator3API15receiveEnvelopeMFNfS4scpd5types11Stellar_SCP11SCPEnvelopeZvVEQFhQFeQEzQFc5Throwi0VE5ocean4util3log7ILoggerQi5Leveli0TCQHmQFlQFkQFdTQEiZQGvMFNbQzKQEwAyakZv + 133
    frame #51: 0x00000001000b505a agora`_D5agora7network13NetworkClientQp12sendEnvelopeMFNbS4scpd5types11Stellar_SCP11SCPEnvelopeZ9__lambda2MFZv + 58
    frame #52: 0x00000001006a8734 agora`_D4vibe4coreQf8setTimerFNbSQu4time8DurationDFZvbZ9__lambda4MFNbNfZv + 20
    frame #53: 0x000000010020199c agora`_D4vibe4core4task12TaskFuncInfo__T3setTDFNfZvZQmMFKQmZ12callDelegateFKSQCrQCpQCnQClZv + 124
    frame #54: 0x00000001006eb212 agora`_D4vibe4core4task12TaskFuncInfo4callMFZv + 18
    frame #55: 0x00000001006eab3b agora`_D4vibe4core4task9TaskFiber3runMFNbZv + 523
    frame #56: 0x000000010097ea7f agora`_D4core6thread7context8Callable6opCallMFZv(this=0x0000000000000002) at context.d:46:17
    frame #57: 0x000000010097ec0e agora`_D4core6thread5fiber5Fiber3runMFZv(this=0x000000010403c200) at fiber.d:1951:9
    frame #58: 0x000000010097eb43 agora`fiber_entryPoint at fiber.d:190:13

As you can see, some random code somewhere triggers a GC allocate (frame #27, doesn't matter where the allocation comes from), which triggers a collection (frame #20), which collects a timer (frame #15) which calls LoopTimeoutTimerDriver.releaseRef (we're on POSIX). Inside this function, there's a call to AA.remove: https://github.com/vibe-d/eventcore/blob/5e060d2d0b66b8462f1a7b6e05eee17b98e57d49/source/eventcore/drivers/timer.d#L200

Well, that's a bad idea, because it can allocate...

s-ludwig commented 3 years ago

Even though it is supposed to be, eventcore is currently not fully @nogc. There are still a few uses of built-in AAs, which should be replaced by a custom hash map implementation that doesn't use the GC. I didn't do it, yet, just because I didn't want to have vibe.d's HashMap implementation lying around in a third place, but at the same time didn't feel like putting it into yet another container library or adding a dependency to one just for this purpose. But I guess ultimately some trade-off needs to be made here.

AndrejMitrovic commented 3 years ago

How do we work around this? I don't think using GC.disable() is an option for us..

Geod24 commented 3 years ago

Upstream fix: https://github.com/dlang/druntime/pull/3301 But that will only work with v2.095.0+

anon17 commented 3 years ago

How do we work around this?

Normally periodic tasks are resources, so you would store the timer somewhere, where you would be able to dispose it. Or self-dispose in the handler.