pjsip / pjproject

PJSIP project
http://www.pjsip.org
GNU General Public License v2.0
2.08k stars 788 forks source link

Fix audio & video conference bridge: adding port may not update port counter #4164

Closed nanangizz closed 1 week ago

nanangizz commented 1 week ago

This is to fix https://github.com/pjsip/pjproject/pull/3928#issuecomment-2484827371.

After investigation, it is caused by invalid port counter (e.g: zero or -1 while there are actually some ports added). Conference bridge uses the port counter to iterates the ports, so when it is invalid, no media flow will happen. Furthermore, such invalid port counter case may happen when a port is removed before the add port operation, including increasing the port counter, is completed. The add port operation can never be completed when no async operation is queued.

This PR fix it by:

sauwming commented 1 week ago

pjsua-test failed, and it seems genuine, already rerun it twice.

nanangizz commented 1 week ago

pjsua-test failed, and it seems genuine, already rerun it twice.

I can't reproduce it here, only tried it on Windows though, also it seems to pass Linux test.

Call stack:

3   libsystem_c.dylib                   0x0000000181304b24 _vsnprintf + 224
4   libsystem_c.dylib                   0x00000001812dd088 __snprintf_chk + 48
5   pjsua-arm-apple-darwin23.6.0        0x00000001002a7220 pj_pool_init_int + 140
6   pjsua-arm-apple-darwin23.6.0        0x00000001002a73e0 pj_pool_create_int + 392
7   pjsua-arm-apple-darwin23.6.0        0x00000001002a7d9c cpool_create_pool + 488
8   pjsua-arm-apple-darwin23.6.0        0x00000001002a6e84 pj_pool_create + 64
9   pjsua-arm-apple-darwin23.6.0        0x000000010016be60 create_conf_port + 84
10  pjsua-arm-apple-darwin23.6.0        0x000000010016bc24 pjmedia_conf_add_port + 632

Here the pj_pool_init_int() code: https://github.com/pjsip/pjproject/blob/e111ef6503f25699a5ade407e9fc92b01496e77d/pjlib/src/pj/pool.c#L180-L182 Call stack points to snprintf() and remote URI does have %:

From: <sip:I%20have%20spaces@example.net>;tag=;tag=0.13136883094899887

Maybe it is not related to the new async conf?

nanangizz commented 1 week ago

Just tried the failing Python test using this branch on MacOS (13.1/Intel), still not reproducible.

sauwming commented 1 week ago

Yes, that's strange, I also tried both tests here on my Mac and they completed successfully.

But it's most certainly because of this PR since this is the only one that failed the CI tests in https://github.com/pjsip/pjproject/actions. I have also repeated the CI tests many times over the night and the failed results are consistent.

sauwming commented 1 week ago

Your intuition is correct. The '%' causes the issue. Passing the URI as format specifier doesn't make sense: snprintf(.., ..., "sip:I%20have%20spaces@example.net", ...)

Although not sure why only in this PR it raises the error.

Perhaps we should change it to strstr(.., "%p")?

Another potential issue is that pjmedia_conf_add_port() never specifies that port_name must be NULL terminated, so we probably shouldn't call pj_pool_create(..., name->ptr)

nanangizz commented 1 week ago

Perhaps we should change it to strstr(.., "%p")?

Agree. But it may still have a problem, when the name accidentally has %p, the pool name will not be as expected. Perhaps we should also escape/replace any unintended %, but this needs to be done in the apps.

Another potential issue is that pjmedia_conf_add_port() never specifies that port_name must be NULL terminated, so we probably shouldn't call pj_pool_create(..., name->ptr)

Agree.