jookies / jasmin

Jasmin - Open source SMS gateway
http://jasminsms.com

DLRMapNotFound : Issue with DLR in Jasmin #1123

Closed hadpro24 closed 9 months ago

hadpro24 commented 11 months ago

Hello, I have been using Jasmin for a few months, but recently, I changed my mobile SMPP provider and reconfigured my access with the following settings:

Host: xxxxxx
Port: xxxx
Username: xxxx
Password: xxxx
System type: ESME
Src_ton: 2
Submit_throughput: 150

After the configuration, I added an interceptor containing this Python code:


"This script will force the sending of a message upon DLR request"
from smpp.pdu.pdu_types import RegisteredDeliveryReceipt, RegisteredDelivery

# Request an SMSC delivery receipt for every routed message
routable.pdu.params['registered_delivery'] = RegisteredDelivery(
    RegisteredDeliveryReceipt.SMSC_DELIVERY_RECEIPT_REQUESTED)

However, I am encountering issues with DLR. Some DLRs are delivered successfully without any problems, but others encounter the following error:

[msgid:913121205] (retrials: 1/2) DLRMapNotFound: Got a DLR for an unknown message id: 0913121205 (coded:913121205)

I tried setting the dlr_msgid parameter to 1 and then to 2, but neither value resolves the issue.

I also captured the traffic with tcpdump and inspected it in Wireshark to verify the provider's responses: they are reaching the server correctly over TCP.

I would greatly appreciate your assistance in resolving this DLR issue. Thank you for your help.

magojr commented 11 months ago

Hi hadpro24, I investigated this issue in a previous version of Jasmin. In my case two factors were involved:

1) The dlr_expiry parameter, which means "how long should I keep consuming Redis memory while waiting for a DLR?" In my case it was set to 48 hours and I was sending a high volume of SMS, so memory usage kept increasing. I lowered it to 8 hours (note that the value is always expressed in seconds), but I still received DLRs more than 8 hours later, and those generated DLRMapNotFound. I set it back to 48 hours and found that (rarely) some DLRs arrive even after 48 hours, so it is a trade-off. Since you don't specify any timing for your DLRs, this may or may not be the cause.

2) Using different upstream providers, I found that DLRs came back for messages I had never sent. This is quite difficult to diagnose: you may think the record is missing from Redis because of a bug, but I kept receiving them after weeks without sending anything, so I investigated further and found an issue in the configuration of one of the upstream providers.

Not sure whether these cases will help you, but give them a check :-)
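
To put rough numbers on the dlr_expiry trade-off described above, here is a small sketch. It assumes the worst case, where every DLR map stays in Redis until it expires; the helper names and the rate of 10 messages per second are hypothetical and not taken from Jasmin.

```python
def hours_to_seconds(hours: int) -> int:
    """dlr_expiry is expressed in seconds, so convert hours first."""
    return hours * 3600


def max_pending_maps(msgs_per_second: float, expiry_seconds: int) -> int:
    """Upper bound on DLR maps held at once if none is resolved before expiry."""
    return int(msgs_per_second * expiry_seconds)


RATE = 10  # hypothetical sending rate, in messages per second

for hours in (8, 48):
    expiry = hours_to_seconds(hours)
    pending = max_pending_maps(RATE, expiry)
    print(f"{hours} h -> dlr_expiry={expiry}, up to {pending} pending maps")
```

A longer expiry catches late DLRs but multiplies the number of entries Redis may have to hold, which is the memory growth mentioned above.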

farirat commented 11 months ago

Please attach a pcap file (from tcpdump) and a snapshot of Jasmin's log covering the same time slot, so that I can find the missed DLR events.

Pay attention to data privacy.

hadpro24 commented 11 months ago

What was the configuration problem with the provider? Can you give me more details? I would also like to explore this lead. On the other hand, I don't have a memory problem; memory usage is rather modest.

hadpro24 commented 11 months ago

Can I share it with you privately?

magojr commented 11 months ago

I don't know the details. The upstream provider was a telecom company; they dealt with their misconfiguration and stopped sending me DLRs that should have gone to other routes.

hadpro24 commented 11 months ago

@magojr When I look up the id in Redis, here is what I get.

The error:
2023-08-08 14:13:14 ERROR 1 [msgid:648402030] (retrials: 1/2) DLRMapNotFound: Got a DLR for an unknown message id: 0648402030 (coded:648402030)
2023-08-08 14:13:24 ERROR 1 [msgid:648402030] (final) DLRMapNotFound: Got a DLR for an unknown message id: 0648402030 (coded:648402030)

In Redis:
127.0.0.1:6379> get "queue-msgid:648402030"
(nil)

I also wonder why it adds '0' in front of the id as follows: 0648402030 (coded:648402030)
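
For illustration only, here is how an id such as 0648402030 could end up "coded" as 648402030. This is a minimal sketch, assuming that dlr_msgid selects a base conversion between the id returned in submit_sm_resp and the id carried in the DLR, and that leading zeros are dropped, as the log line suggests. `code_msgid` is a hypothetical helper, not Jasmin's implementation.

```python
def code_msgid(raw_id: str, dlr_msgid: int = 0) -> str:
    """Hypothetical normalisation of the message id found in a DLR."""
    if dlr_msgid == 1:
        # Assumption: the DLR carries a decimal id, the lookup key is hexadecimal
        return format(int(raw_id), "X")
    if dlr_msgid == 2:
        # Assumption: the DLR carries a hexadecimal id, the lookup key is decimal
        return str(int(raw_id, 16))
    # Assumption: with no conversion, only leading zeros are removed
    return raw_id.lstrip("0") or "0"


print(code_msgid("0648402030"))     # same digits, leading zero dropped
print(code_msgid("0648402030", 1))  # decimal read, hexadecimal key
print(code_msgid("0648402030", 2))  # hexadecimal read, decimal key
```

If the coded id does not match the key stored when the message was submitted, the lookup fails even though the DLR itself is valid.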

hadpro24 commented 11 months ago

@magojr I also sometimes get the error below, and I have to restart all the services (docker compose down && docker compose restart) before I can send SMS again.

txamqp.client.ChannelClosed: Method(name=close, id=40) (406, 'PRECONDITION_FAILED - delivery acknowledgment on channel 1 timed out. Timeout value used: 86400000 ms. This timeout value can be configured, see consumers doc guide to learn more', 0, 0) content = None

Do you know what could cause this problem? @farirat

farirat commented 11 months ago

@hadpro24 The log snapshot and the pcap file do not cover the same time window: the last log line is at 13:43 and the pcap starts at 13:44, so I can't correlate PDUs. As for the txamqp timeout error, are Jasmin, Redis, or RabbitMQ under throughput pressure while running with default configs (no tuning) on low-end hardware?
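
A quick way to check that a log snapshot and a capture can be correlated is to compare their time windows before sharing them. This is a minimal sketch, assuming timestamps in the YYYY-MM-DD HH:MM:SS format used by the Jasmin log lines quoted earlier; `windows_overlap` is a hypothetical helper and the dates below are made up.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def windows_overlap(log_start: str, log_end: str,
                    pcap_start: str, pcap_end: str) -> bool:
    """Return True if the log window and the capture window share any instant."""
    ls, le, ps, pe = (datetime.strptime(t, FMT)
                      for t in (log_start, log_end, pcap_start, pcap_end))
    # Two intervals overlap when the later start is not after the earlier end
    return max(ls, ps) <= min(le, pe)


# Log ends at 13:43 and the capture starts at 13:44: nothing to correlate
print(windows_overlap("2023-08-08 13:30:00", "2023-08-08 13:43:00",
                      "2023-08-08 13:44:00", "2023-08-08 13:50:00"))
```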

hadpro24 commented 11 months ago

@farirat Alright, I'll try to capture both at the same time.

I use the default configurations for RabbitMQ and Redis. How can I prevent this exception, which stops me from sending further messages?

hadpro24 commented 11 months ago

@farirat I found the relevant code in dlr.py. What is connector_type?

q = yield self.redisClient.hgetall("queue-msgid:%s" % msgid)
if len(q) != 2 or 'msgid' not in q or 'connector_type' not in q:
    raise DLRMapNotFound('Got a DLR for an unknown message id: %s (coded:%s)' % (pdu_dlr_id, msgid))

submit_sm_queue_id = q['msgid']
connector_type = q['connector_type']

# Get dlr and ensure it's sc (source_connector) is same as q['connector_type']
dlr = yield self.redisClient.hgetall("dlr:%s" % submit_sm_queue_id)
if dlr is None or len(dlr) == 0:
    raise DLRMapNotFound('Got a DLR for an unknown message id: %s (coded:%s)' % (pdu_dlr_id, msgid))

farirat commented 11 months ago

@hadpro24 The issue is related to multi-part (long) messages:

  1. If you send part1 and part2, Jasmin waits for part2's DLR to mark the message delivery status and ignores part1's DLR.
  2. Your provider is sending you part1's DLR only; I'm not seeing any trace of part2's DLR.

hadpro24 commented 11 months ago

@farirat Is it possible to configure Jasmin to only consider part1? Is this problem also related to the timeout I get with RabbitMQ (delivery acknowledgment on channel 1 timed out)?

farirat commented 11 months ago

No and no. You need to ask your provider to send delivery reports for all of the delivered parts.
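
The multi-part behaviour described in this thread can be sketched roughly as follows. This is an illustration only: a plain dict stands in for the Redis "queue-msgid:<id>" keys, the function names are hypothetical, and the assumption that only the last part's id is tracked comes from the explanation above, not from reading Jasmin's code.

```python
class DLRMapNotFound(Exception):
    """Raised when a DLR arrives for an id that was never mapped."""


def register_submit(dlr_map: dict, part_msgids: list, queue_id: str) -> None:
    # Assumption: only the last part of a long message is tracked
    dlr_map[part_msgids[-1]] = queue_id


def on_dlr(dlr_map: dict, msgid: str) -> str:
    if msgid not in dlr_map:
        raise DLRMapNotFound(f"Got a DLR for an unknown message id: {msgid}")
    return dlr_map.pop(msgid)


dlr_map = {}
register_submit(dlr_map, ["part1-id", "part2-id"], "queue-id-1")

try:
    on_dlr(dlr_map, "part1-id")  # the provider reports part1 only
except DLRMapNotFound as exc:
    print(exc)

print(on_dlr(dlr_map, "part2-id"))  # part2's DLR resolves the message
```

Under these assumptions, a provider that reports only part1 produces exactly the DLRMapNotFound pattern seen in the logs, and the message status is never updated.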

hadpro24 commented 11 months ago

Thank you. One last thing: I got this error for multipart SMS: smpp.pdu.error.SMPPRequestTimoutError: Request timed out after 120 secs

What does it mean?

WEBudoGT commented 10 months ago

This looks similar to the issue I'm having with DLRs for messages whose IDs start with zeroes. In my case the IDs are hexadecimal (e.g. 00AB12CD23) and the error message says (coded: AB12CD23). Did you find a way to solve this?

Ref: #1129

hadpro24 commented 10 months ago

I didn't find a solution for the request timeout or the RabbitMQ problem, so I now use Kannel. It works fine and has also helped improve CPU performance for massive sending.