pjsip / pjproject

PJSIP project
http://www.pjsip.org
GNU General Public License v2.0
2.08k stars 788 forks source link

Assertion failure in `pj_ice_sess_add_cand` when clients use different component counts #4167

Closed yikuo123 closed 6 days ago

yikuo123 commented 1 week ago

Describe the bug

In debug builds, if two ICE clients are initialized with different component counts, the client with the higher component count crashes after a few minutes after the connection process. This issue occurs due to an assertion failure in the pj_ice_sess_add_cand function.

Steps to reproduce

  1. Create an ICE instance (A) using pj_ice_strans_create with a component count of 2, and then call pj_ice_strans_init_ice.
  2. Create another ICE instance (B) using pj_ice_strans_create with a component count of 1, and then call pj_ice_strans_init_ice.
  3. Exchange ICE candidates between A and B.
  4. Both A and B call pj_ice_strans_start_ice simultaneously.
  5. Wait for a few minutes.
  6. When the STUN-mapped address for component 2 of instance A changes, an assertion failure occurs, causing instance A to crash.

PJSIP version

2.14.1

Context

Android NDK r27

Log, call stack, etc

17:09:36.451 stuntp0x7f98d6f100  .RX 68 bytes STUN message from x.x.x.x:3478:
--- begin STUN message ---
STUN Binding success response
Hdr: length=48, magic=2112a442, tsx_id=58ec944a7ccd1f29000258ba
Attributes:
MAPPED-ADDRESS: length=8, IPv4 addr=x.x.x.x:29358
Attr 0x802b: length=8
Attr 0x802c: length=8
XOR-MAPPED-ADDRESS: length=8, IPv4 addr=x.x.x.x:29358
--- end of STUN message ---
17:09:36.451 stuntp0x7f98d6f100  .STUN mapped address found/changed: x.x.x.x:29358

../src/pjnath/ice_session.c:761: pj_status_t pj_ice_sess_add_cand(pj_ice_sess *, unsigned int, unsigned int, pj_ice_cand_type, pj_uint16_t, const pj_str_t *, const pj_sockaddr_t *, const pj_sockaddr_t *, const pj_sockaddr_t *, int, unsigned int *): assertion "comp_id <= ice->comp_cnt" failed
yikuo123 commented 1 week ago

This problem seems similar to the one described in https://github.com/pjsip/pjproject/issues/2982

sauwming commented 1 week ago

@nanangizz, this seems related to Trickle ICE. The assertion occurred here (stun_sock.c sess_on_request_complete()->ice_strans.c stun_on_status() ) https://github.com/pjsip/pjproject/blob/57acd3b1306b653fc185a73c3d659300ebe6c4f5/pjnath/src/pjnath/ice_strans.c#L2538-L2540

I'm not quite sure what the proper fix is though. a. Do we need to add check before adding candidate to the trickle ICE? such as:

b. Or we should just ignore such STUN message, by adding check: if (comp->comp_id > ice_st->comp_cnt) return/break;

nanangizz commented 1 week ago

Without reproducing the issue, I'd vote for approach "a". However, instead of checking nego completion or STUN op, I'd go with verifying component count (as asserted by pj_ice_sess_add_cand()), e.g:

if (pj_ice_strans_has_sess(ice_st) &&
    comp->comp_id <= pj_ice_strans_get_running_comp_cnt(ice_st) )
{
    ...
}
sauwming commented 1 week ago
 * Get the current/running component count. If ICE negotiation has not
 * been started, the number of components will be equal to the number
 * when the ICE stream transport was created. Once negotiation been
 * started, the number of components will be the lowest number of 
 * component between local and remote agents.

Since A has 2 components and B has 1, pj_ice_strans_get_running_comp_cnt() will return 1, and the block will still be executed?

nanangizz commented 1 week ago

Ups, it should be checking the component id agains the running_comp_cnt() (updated the pseudo code).

yikuo123 commented 6 days ago

comp->comp_id < pj_ice_strans_get_running_comp_cnt(ice_st)

I think it should be comp->comp_id <= pj_ice_strans_get_running_comp_cnt(ice_st), because the comp_id is start from 1.

And there is also an assertion failure in idecemo.c. It should be icedemo.opt.comp_cnt > PJ_ICE_MAX_COMP

https://github.com/pjsip/pjproject/blob/e6eb88eefe1ec4210c341cea4a7e94ba0da21100/pjsip-apps/src/samples/icedemo.c#L1220C1-L1223C14