Open jimying opened 2 weeks ago
Great! I assume you've tested this and it does fix the issue :)
I think we also need to run under somekind of stress test, e.g: using ioqueue test in pjlib-test, to make sure all memory pools (of pending ops) are properly released. Note that the after an ioqueue key is unregistered, the key will be put into the closing-key-list and soon into the free-key-list to be reused by another socket. We need to make sure that all pending op has been freed before the key is freed & reused.
Next, perhaps we can apply a little bit optimization, e.g: instead of mem-pool for each pending-op, perhaps mem-pool per ioqueue-key to avoid multiple alloc+free for multiple pending-op, using same mechanism as ioqueue key (employing additional list for keeping unused pending-op instances to be reused later).
Note:
this only fix the issue when PJ_IOQUEUE_HAS_SAFE_UNREG=1;
PJ_IOQUEUE_HAS_SAFE_UNREG=0 still has memory error
Note: this only fix the issue when PJ_IOQUEUE_HAS_SAFE_UNREG=1; PJ_IOQUEUE_HAS_SAFE_UNREG=0 still has memory error
When PJ_IOQUEUE_HAS_SAFE_UNREG==1
, the key won't be reused for 500ms (configurable via PJ_IOQUEUE_KEY_FREE_DELAY
), this is a bit risky actually, e.g: on high CPU load, the cancellation may take longer?
Tried to run pjlib-test
with this PR patch on VS2005 and pool debugging enabled (so pool uses normal malloc/free()
, by setting PJ_POOL_DEBUG
to 1
in config_site.h
), I got an assertion:
_wassert(const wchar_t * expr=0x004d1e90, const wchar_t * filename=0x004d1cf0, unsigned int lineno=588) Line 212 C
pj_ioqueue_register_sock2(pj_pool_t * pool=0x023a29b8, pj_ioqueue_t * ioqueue=0x023a2a2c, long sock=388, pj_grp_lock_t * grp_lock=0x00000000, void * user_data=0x00000002, const pj_ioqueue_callback * cb=0x0019fb54, pj_ioqueue_key_t * * key=0x0019fbec) Line 588 + 0x2c bytes C
pj_ioqueue_register_sock(pj_pool_t * pool=0x023a29b8, pj_ioqueue_t * ioqueue=0x023a2a2c, long sock=388, void * user_data=0x00000002, const pj_ioqueue_callback * cb=0x0019fb54, pj_ioqueue_key_t * * key=0x0019fbec) Line 665 + 0x1f bytes C
unregister_test(const pj_ioqueue_cfg * cfg=0x0019fd34) Line 554 + 0x23 bytes C
udp_ioqueue_test_imp(const pj_ioqueue_cfg * cfg=0x0019fd34) Line 1190 + 0x9 bytes C
udp_ioqueue_test() Line 1255 + 0x9 bytes C
test_inner() Line 171 + 0x2a bytes C
test_main() Line 245 + 0x5 bytes C
Not sure if this is the same issue, but this assertion does not happen when using ioqueue select.
@nanangizz no this patch, Is there this assert?
@nanangizz no this patch, Is there this assert?
Yes, same assert without this patch.
Tried to run
pjlib-test
with this PR patch on VS2005 and pool debugging enabled (so pool uses normalmalloc/free()
, by settingPJ_POOL_DEBUG
to1
inconfig_site.h
), I got an assertion:_wassert(const wchar_t * expr=0x004d1e90, const wchar_t * filename=0x004d1cf0, unsigned int lineno=588) Line 212 C pj_ioqueue_register_sock2(pj_pool_t * pool=0x023a29b8, pj_ioqueue_t * ioqueue=0x023a2a2c, long sock=388, pj_grp_lock_t * grp_lock=0x00000000, void * user_data=0x00000002, const pj_ioqueue_callback * cb=0x0019fb54, pj_ioqueue_key_t * * key=0x0019fbec) Line 588 + 0x2c bytes C pj_ioqueue_register_sock(pj_pool_t * pool=0x023a29b8, pj_ioqueue_t * ioqueue=0x023a2a2c, long sock=388, void * user_data=0x00000002, const pj_ioqueue_callback * cb=0x0019fb54, pj_ioqueue_key_t * * key=0x0019fbec) Line 665 + 0x1f bytes C unregister_test(const pj_ioqueue_cfg * cfg=0x0019fd34) Line 554 + 0x23 bytes C udp_ioqueue_test_imp(const pj_ioqueue_cfg * cfg=0x0019fd34) Line 1190 + 0x9 bytes C udp_ioqueue_test() Line 1255 + 0x9 bytes C test_inner() Line 171 + 0x2a bytes C test_main() Line 245 + 0x5 bytes C
Not sure if this is the same issue, but this assertion does not happen when using ioqueue select.
I found the reason: key double unregister.
Fixed: when key->closing = 1, not unregister.
now test passed.
Thanks @jimying .
Honestly I haven't got a chance to reproduce the original issue and test the proposed solution. I believe you are using this ioqueue in real world, experienced the issue, and find this solution does work, is that correct?
Next, here are few notes about the proposed solution:
PJ_IOQUEUE_HAS_SAFE_UNREG
to work, so I think this mode has to be enforced somehow.pending_op
(just like pool of key), may be owned by ioqueue or by key.Also, this ioqueue has been disabled for quite sometime and some improvement in the ioqueue area may not be integrated into this ioqueue, e.g: group lock for key. So please understand that there may still be some steps required to enable this ioqueue again :)
@nanangizz i write a simple demo to reproduce the crash issue in msys2, #4172
I have tested it, in old code, it can 100% reproduce the crash.
To test new code we can git cherry-pick the demo patch to this branch.
try to fix issue #985