processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/
Other
6k stars 1.5k forks source link

Request of Bob data from sender has a long wait to receive the replied Bob data #4230

Closed cmeng-git closed 3 weeks ago

cmeng-git commented 3 weeks ago

Before creating a ticket, please consider if this should fit the discussion forum better.

Environment

ejabberd version: 23.01 Erlang version: Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-thread OS: Linux (Debian) Ubuntu 20.04.6 LTS Installed from: source loglevel: 5 / 0

Errors from error.log/crash.log

see below.

Bug description

aTalk release 4.1.0 implements XEP-0264: Jingle Content Thumbnails support for File Transfer protocol in both: a. XEP-0234: Jingle File Transfer b. XEP-0096: SI File Transfer

Upon received of a File Transfer Request with thumbnail element, a Bob data request is sent by the recipient. However it is found that the requester needs to wait ~15s before the bob data is received, if the thumbnail size is set to 128 x 96. This long wait is not practical, as the recipient will likely to click the Accept button before even the thumbnail is received and displayed.

In the earlier investigation of the problem, it was thought that the problem lies within smack library, see: Smack 4.4.8: Request of Bob data from sender has a long wait to receive the replied Bob data

However on further investigation, it seems the problem is with the ejabberd server. Following are the logcat captured while carried out test with swordfish sending a file transfer request to swan.

File Transfer Test Setup: a. Swordfish (file sender) on android Note10+ device b. Swan (file recipient) on android AVD API-34 running on Ubuntu 22.05 c. ejabberd server on host with Ubuntu 20.04; debug log level = 0 public IP: 202.166.54.243:5222 Local IP: 192.168.1.8 d. WireShark app is running on ejabberd host e. Image/jpeg 128x96; thumbnail byteData size: 16787 (jpeg);

// ========= Test #1: thumbnail sent as jpeg =======

// =========== swordfish - file sender ==========
07:38:09.086  D  RECV (0): 
                 <iq xml:lang='en-US' to='swordfish@atalk.sytes.net/atalk-hippun' from='swan@atalk.sytes.net/atalk-1a8lbk8' type='get' id='1IBVF-32'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+da3c59a3576be4300ab35ef53411a09359d8cb15@bob.xmpp.org'/>
                 </iq>
07:38:09.175  D  SENT (0): 
                 <iq to='swan@atalk.sytes.net/atalk-1a8lbk8' id='1IBVF-32' type='result'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+da3c59a3576be4300ab35ef53411a09359d8cb15@bob.xmpp.org' max_age='86400' type='image/jpeg'>
                     /9j/4AAQSkZJRgABAQAAAQABAAD
                   </data>
                 </iq>

// =========== swan - file recipient (~15s delay) ==========
07:38:07.478  D  SENT (0): 
                 <iq to='swordfish@atalk.sytes.net/atalk-hippun' id='1IBVF-32' type='get'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+da3c59a3576be4300ab35ef53411a09359d8cb15@bob.xmpp.org'/>
                 </iq>
07:38:22.862  D  RECV (0): 
                 <iq xml:lang='en' to='swan@atalk.sytes.net/atalk-1a8lbk8' from='swordfish@atalk.sytes.net/atalk-hippun' type='result' id='1IBVF-32'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+da3c59a3576be4300ab35ef53411a09359d8cb15@bob.xmpp.org' max_age='86400' type='image/jpeg'>
                     /9j/4AAQSkZJRgABAQAAAQABAAD...
                   </data>
                 </iq>

Based on the ejabberd pcap info: Note: ejabeerd is debug log level = 0 a. ejabberd received the thumbnail in ~80ms sent by swordfish b. ejabberd turn-around time to relay the thumbnail to swan tooks ~15.2s; however this time may vary at different file send instance, once captured time is only ~7s.

// ======== ejabberd wireshark pcap file ejabberd_debug_0_jpg.zip

// ejabberd receiving thumbnail send from swan
40  07:38:07.544333204  202.166.54.243  192.168.1.8 TCP 2962    59510 → 5222 [ACK] Seq=1165 Ack=413 Win=531 Len=2896 TSval=8909648 TSecr=3998509099 [TCP segment of a reassembled PDU]
59  07:38:07.629177373  192.168.1.8 202.166.54.243  TCP 66  5222 → 59510 [ACK] Seq=413 Ack=23859 Win=389 Len=0 TSval=3998509195 TSecr=8909732

// ejabberd relay thumbnail to swordfish
98  07:38:22.842290319  192.168.1.8 202.166.54.243  TCP 1514    5222 → 33460 [ACK] Seq=1437 Ack=359 Win=501 Len=1448 TSval=3998524408 TSecr=2929021194 [TCP segment of a reassembled PDU]
117 07:38:22.859100132  202.166.54.243  192.168.1.8 TCP 78  33460 → 5222 [ACK] Seq=359 Ack=24183 Win=427 Len=0 TSval=2929036521 TSecr=3998524410 SLE=23157 SRE=24183

// ========= Test #2: thumbnail sent as png ======= When the same image is sent as png file, the thumbnail byteData size: 31786. Although the byte transfer takes ~50ms, ejabberd turn-around time to relay increases to 40s.

ejabberd_debug_0_png.zip

// ====== aTalk logcat at swan ==========
09:03:30.413  D  SENT (0): 
                 <iq to='swordfish@atalk.sytes.net/atalk-1b4qsih' id='MYJBC-32' type='get'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org'/>
                 </iq>
09:04:10.859  D  RECV (0): 
                 <iq xml:lang='en' to='swan@atalk.sytes.net/atalk-22tgrua' from='swordfish@atalk.sytes.net/atalk-1b4qsih' type='result' id='MYJBC-32'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org' max_age='86400' type='image/png'>
                     iVBORw0KGgoAAAANSUhEUgAAAKsAAABgCAIAAADtpRU5AAAAAXNSR0IArs4c6QAAAANzQklUCAgI2... wfwG5VODS1WumrgAAAABJRU5ErkJggg==
                   </data>
                 </iq>

// ========= Test #3: thumbnail sent as png, with ejabberd log level=5 ======= Thumbnail byteData size: 31786

// ====== aTalk logcat at swan ==========
09:40:03.466  D  SENT (2): 
                 <iq to='swordfish@atalk.sytes.net/atalk-2j3011s' id='72RVJ-31' type='get'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org'/>
                 </iq>
09:40:43.492  D  RECV (2): 
                 <iq xml:lang='en' to='swan@atalk.sytes.net/atalk-2pq8p4m' from='swordfish@atalk.sytes.net/atalk-2j3011s' type='result' id='72RVJ-31'>
                   <data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org' max_age='86400' type='image/png'>
09:40:43.492  D      iVBORw0KGgoAAAANSUhEUgAAAKsAAABgCAIAAADtpRU5AAAAAXNSR0IArs4c6QAAAANzQklUCAgI2                 
/wfwG5VODS1WumrgAAAABJRU5ErkJggg==
                   </data>
                 </iq>

// ===== ejabberd log with debug level=5 ======= The ejabberd log shows the received image bytes is segmented into 6 blocks each with 8Kbyte at ~7s interval; this give rise to 40s delay as observed.

ejabberd_debug_5_png.zip

// === ejabberd receive thumbnail from swordfish.
2024-06-07 09:40:03.610353+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"<iq to='swan@atalk.sytes.net/atalk-2pq8p4m' id='72RVJ-31' type='result'><data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org' max_age='86400' type='image/png'>iVBORw0KGgoAAAANSUhEUgAAAKsAAABgCAIAAADtpRU5AAAAAXNSR0IArs4c6QAAAANzQklUCAg...9vzBJ9enY4">>

2024-06-07 09:40:10.703002+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"nN3mWdbvdVVnpxgTKB+LSVo2uQiEUYl03pqqUhU6722hTllXgBfPlqrB2NBlHcdA0dahkUJpwUu5V0e2qP7RdX3WFH4lIuYi4JbDlicAX3volGQ0QkxBQW2jA5YXL5oUrWu...nDbEIDx">>

2024-06-07 09:40:18.894721+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"PKnLOOffOF4K82Uz27vy1gV6vCsDMSK9Vpla7oijWAV8pZRs9m82KogjDEACFELzOCN4igAEcIDG4/19O+BYT1loiiqJoZ2...hOUs+bIfWI1">>

2024-06-07 09:40:27.087660+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"1WH/0k5/E81zu46t//GbY7WaHqq27RucKRdD7uqXj1IpGBRIWdB6oCEox...LHWF+E">>

2024-06-07 09:40:35.279603+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"kSt4D0uawv1+td0N0zXy2vCqSi1pG8k2bRIzgyfxkiEqEVTUDzUjl9PS85P7x5ZzTyFNKqQx93A1jPxwQKPumPn9MZ5...ZkyszukxGP">>

2024-06-07 09:40:43.471243+08:00 [notice] (tls|<0.621.0>) Received XML on stream = <<"J5+s1MOzF3rwY/4jwvxHggl8XGONFhM8BvTuIvjXNzMPIjF3BBQEYsMc31XSpmwFb9NIFE4kHmqtpBfca0fFxiVhEtmVp3fWYI2IpJQjXde1VKiJiQHfnG0bWKaA3...AAAABJRU5ErkJggg==</data></iq>

// === ejabberd send thumbnail to swan.
2024-06-07 09:40:43.512997+08:00 [notice] (tls|<0.626.0>) Send XML on stream = <<"<iq xml:lang='en' to='swan@atalk.sytes.net/atalk-2pq8p4m' from='swordfish@atalk.sytes.net/atalk-2j3011s' type='result' id='72RVJ-31'><data xmlns='urn:xmpp:bob' cid='sha1+ed064968b0d058efced093d9466dd32254c13e0c@bob.xmpp.org' max_age='86400' type='image/png'>iVBORw0KGgoAAAANSUhE...
uZiUifFlK3UkqtlYKIAhF7LkeIrdXLab1cLr1mU1KepikTR0Riun98Y7Ue58t1m2sYJWmt9QH1Xkkzi96a0O88+y/wfwG5VODS1WumrgAAAABJRU5ErkJggg==</data></iq>">>
licaon-kter commented 3 weeks ago

Unrelatei, but if you build from source why not use latest 24.02?

prefiks commented 3 weeks ago

You are probably throttled by traffic shaper, check shaper: and shaper_rules: in your ejabberd.yml config. Our default config in latest version have this (i am only leaving relevant rules here):

shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000
shaper_rules:
  c2s_shaper:
    none: admin
    normal: all

which will allow regular user send 3k data per second (but we will also allow sending up to 20k data in a single go from time to time), and admin users can send 100k/s. You need to see how that looks on your server, and see if increase this helps.

cmeng-git commented 3 weeks ago

Thanks for the info. The problem is indeed lied with the 'normal' rate which was set to 1000 in ejabberd.yml. I have changed to the following values:

###.  ===============
###'  TRAFFIC SHAPERS

shaper:
  ##
  ## The "normal" shaper limits traffic speed to 3000 B/s (Default)
  ##
  normal: 50000

  ##
  ## The "fast" shaper limits traffic speed to 100KB/s
  ##
  fast: 100000

  ##
  ## The "proxyrate" shaper limits traffic speed to 200 KBytes/sec
  ##
  proxyrate: 204800

###.  ============
###'  SHAPER RULES

shaper_rules:
  ## For C2S connections, all users except admins use the "normal" shaper
  c2s_shaper:
    - none: admin
    - normal: all
  ## All S2S connections use the "fast" shaper
  s2s_shaper: fast

  ## proxy shaper settings
  proxy65_shaper:
    - none: admin
    - proxyrate: proxy_users

This will give ~0.5 Sec in reply turn around time. Setting the 'normal' value to 10000, the delay time increases to 5s.