Closed elincognito closed 13 years ago
Yes, I know. I talked to the original author (Enrique Marcote Peña) about this issue last year.
"We implemented all SMPP for completion, but our requirement was to build a transmitter ESME (and that is mainly in what we focused on so far)."
This is why rx/trx support is not as good. And trx connections are the worst. It can easily be locked if the session calls CallbackMod:handle_operation/2 and another process calls, say, submit_sm/2. Both are synchronous.
To fix the rx/trx issue properly a big rewrite of sessions is needed. But I had this problem last year and somewhat fixed it by adding async (SHA: 8c5bf05338f7bc9f6418fa523ea160632c1568a7). So, return 'noreply' from handle_operation/2 and then call gen_esme_session:reply/2 with the reply.
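Roughly, the shape is something like this (just a hand-wavy sketch, not exact code; my_esme is a placeholder for your own process, and the deliver_sm tuple layout is the one the session hands to the callback):
[code]
%% Session callback: hand the PDU off and return at once instead of blocking.
handle_operation(_Esme, {deliver_sm, {_CmdId, _Status, SeqNum, Body}}) ->
    my_esme ! {deliver_sm, SeqNum, Body},   %% placeholder: your own process
    noreply.

%% That process later completes the operation whenever it is ready:
%%     gen_esme_session:reply(SessionPid, {SeqNum, {ok, []}}).
[/code]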
If this isn't enough, have a look at https://github.com/essiene/smpp34/
Thanks for the reply, iamaleksey. I will try it next Monday when I get back to work.
So, in my gen_esme behaviour implementation callbacks, I should return noreply from handle_operation and later reply with gen_esme_session:reply/2, is that correct? In this way the session will work in an async fashion? And what about window width support, is that "concept" implemented in oserl?
My use cases are mostly transmitters as well, so I implement window width enforcement in my callbacks. But it definitely is something that should be supported natively by esme/mc sessions.
Hi there again, and sorry to be bothering you again. I seem to be doing something wrong; here is a snippet of what I've done:
[code]
handle_deliver_sm(Pdu, From = {Pid, Ref}, St) ->
    spawn_link(
      fun() ->
              format(St#st.silent, "Deliver SM: ~p~n", [Pdu]),
              Reply = {ok, []},
              SeqNum = smpp_operation:get_value(sequence_number, Pdu),
              io:format("Pid ~p info: ~p~n", [Pid, process_info(Pid)]),
              gen_esme_session:reply(Pid, {SeqNum, Reply})
      end),
    {noreply, St#st{msg_id = St#st.msg_id + 1, recv = St#st.recv + 1}}.
[/code]
However, this doesn't seem to be working, as gen_esme_session:handle_sync_event is not being called? >.<
OK, my misunderstanding; now it's: {reply, noreply, St#st{msg_id = St#st.msg_id + 1, recv = St#st.recv + 1}}.
However, I still get the timeout (lock) issue :s
Based on your last comment, you are still making a gen_server call from handle_operation, thus not fixing anything. You should just return noreply and make a non-blocking cast instead of a call. Here is a snippet of some working code:
[code]
%% Handle SMSC-issued deliver_sm operation.
handle_operation(Client, {deliver_sm, {_CmdId, _Status, SeqNum, Body}}) ->
    gen_server:cast(Client, {handle_deliver_sm, SeqNum, Body}),
    noreply;
[/code]
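And on the client side (the gen_server being cast to), something along these lines completes the operation later; this part is not in the snippet above, and St#st.session is just a placeholder for wherever you keep the session pid:
[code]
handle_cast({handle_deliver_sm, SeqNum, _Body}, St) ->
    %% Reply whenever convenient; the session is no longer blocked on us.
    gen_esme_session:reply(St#st.session, {SeqNum, {ok, []}}),
    {noreply, St};
[/code]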
Before anything else, thanks a lot for the replies!
OK, I was using the gen_esme behaviour, so I changed it to do the cast and return noreply as:
[code]
%% On oserl gen_esme:
handle_operation(SrvRef, {deliver_sm, Pdu}) ->
    gen_server:cast(SrvRef, {handle_deliver_sm, Pdu}),
    noreply.

handle_cast({handle_deliver_sm, Pdu}, St) ->
    io:format("ST MOD : ~p~n", [St#st.mod]),
    R = pack((St#st.mod):handle_deliver_sm(Pdu, self(), St#st.mod_st), St),
    io:format("Pack: ~p~n", [R]),
    R;
[/code]
And on my callback module (called smpp_connection):
[code]
handle_deliver_sm(Pdu, From, St) ->
    io:format("Received PPP~n"),
    pdu_submitter:buffer_pdu(Pdu),
    ?DEBUG_LOG("received pdu: ~p", [Pdu]),
    % Reply = {ok, [{message_id, integer_to_list(St#st.msg_id)}]},
    %% Reply = {ok, [{congestion_state, 1}]},
    Reply = {ok, []},
    SeqNum = smpp_operation:get_value(sequence_number, Pdu),
    %io:format("Pid ~p info: ~p~n", [Pid, process_info(Pid)]),
    gen_esme_session:reply(From, {SeqNum, Reply}),
    {reply, Reply, St#st{msg_id = St#st.msg_id + 1, recv = St#st.recv + 1}}.
[/code]
Doing erlang:process_info(erlang:whereis(smpp_connection)), I see the message queue of my smpp_connection process:
[code]
{messages,[{'$gen_sync_all_state_event',{<0.1536.0>,
{reply,{20328,{ok,[]}}}},
{'$gen_call',{<0.1554.0>,#Ref<0.0.0.10097>},
{{submit_sm,[{short_message,[5,0,3,1,3,1,0,78,0,79|...]},
{esm_class,64},
{destination_addr,"913768304"},
{source_addr_ton,5},
{source_addr,"3838"},
{data_coding,8},
{dest_addr_ton,1},
{dest_addr_npi,1},
{source_addr_npi,1},
{registered_delivery,0},
{priority_flag,0},
{schedule_delivery_time,...},
{...}]},
[{uid,"35137F7F55AE642A8A70EC89A85AE35025EBC203"}]}},
{'$gen_cast',{handle_deliver_sm,{5,0,20330,
[{short_message,"RE 2801491 19741022"},
{sm_default_msg_id,0},
{data_coding,0},
{replace_if_present_flag,0},
{registered_delivery,0},
{validity_period,[]},
{schedule_delivery_time,[]},
{priority_flag,0},
{protocol_id,...},
{...}|...]}}},
{'$gen_cast',{handle_deliver_sm,{5,0,20332,
[{short_message,"RE 1314745 19430416"},
{sm_default_msg_id,0},
{data_coding,0},
{replace_if_present_flag,0},
{registered_delivery,0},
{validity_period,[]},
{schedule_delivery_time,[]},
{priority_flag,...},
{...}|...]}}}]}
[/code]
Am I doing the gen_esme_session:reply call wrong?
You seem to be using gen_esme - not gen_esme_session directly. If so then I can't help you with your issue. If not then publish the whole module in a gist - I'll have a look.
I've pasted a simplified version here http://pastebin.com/Rnr7NTKF which I have used, and the error occurs in this version.
So besides this, the only change I've attempted was in the gen_esme.erl code (all cloned from your repo, for the nice patches you have :) ).
Also, I've tried to change gen_esme.erl as you've said and changed handle_operation to
[code]
handle_operation(SrvRef, {deliver_sm, Pdu}) ->
    gen_server:cast(SrvRef, {handle_deliver_sm, Pdu}),
    noreply.
[/code]
and added a matching clause for it in gen_esme.erl
[code]
handle_cast({handle_deliver_sm, Pdu}, St) ->
    io:format("ST MOD : ~p~n", [St#st.mod]),
    R = pack((St#st.mod):handle_deliver_sm(Pdu, self(), St#st.mod_st), St),
    {noreply, St};
[/code]
The reason for using gen_esme instead of gen_esme_session was pretty much the persistent queues and throttling control given by gen_esme.
Once more, thanks a lot for your precious help!
I see. You didn't really fix the problem, you just moved it. I'm afraid that adding async support to gen_esme would take more than what you've done. Can't help you here, sorry. I still suggest using gen_esme_session directly - that'll give you a lot more control and you'll still be able to implement persistent queues and throttling yourself.
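If it helps, throttling on its own can be quite small. A rough sketch of a timer-driven dispatcher inside a gen_server (all names here are made up for illustration, and dispatch/1 stands for however you actually send the PDU):
[code]
%% Sketch only: drain a queue of submit requests at Rps messages per second.
-record(th, {rps = 10, pending = queue:new()}).

init(Rps) ->
    erlang:send_after(1000, self(), tick),
    {ok, #th{rps = Rps}}.

%% Queue every incoming submit request.
handle_cast({submit, Params}, St = #th{pending = Q}) ->
    {noreply, St#th{pending = queue:in(Params, Q)}}.

%% Once a second, send at most Rps queued requests.
handle_info(tick, St = #th{rps = Rps, pending = Q}) ->
    erlang:send_after(1000, self(), tick),
    {noreply, St#th{pending = drain(Rps, Q)}}.

drain(0, Q) ->
    Q;
drain(N, Q) ->
    case queue:out(Q) of
        {{value, Params}, Q1} -> dispatch(Params), drain(N - 1, Q1);
        {empty, Q1}           -> Q1
    end.
[/code]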
iamaleksey, thanks a lot for your help. I decided to go with gen_esme_session directly and implemented the most basic trx session possible (with a gen_server, since I suspected the problem could lie in the gen_fsm usage in gen_esme together with the gen_fsm in gen_esme_session), and strangely, so far no lock has occurred under a stress test. Now I'm going to take a look at gen_esme to implement persistent queuing and throttling. There still remains the window width issue and how to combine that with "gen_esme" queueing. Any tip on that? I'm thinking of having the queue release messages at the desired throughput per second, and of keeping a window_width_max and current_win_width on the gen_server, increasing the counter when a submit_sm is sent and decreasing it when I get a reply from the SMSC; when the window is full, just put the message back on the queue, and maybe pause the queue for a second or so. Do you see any possible issues with that implementation? And once more thanks a lot for your help; if by any chance the lock issue appears again I will come back to you :)
Well, if you only need to force a window size for submit_sm then it's easy. Just go with your solution. Maybe block the client user if the window is full (if submit is synchronous on that level). I used something like this as well.
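For the record, a rough sketch of that counter-based window inside a gen_server (again, all names are illustrative and do_submit/2 stands for however you hand the PDU to the session):
[code]
-record(win_st, {session,             %% pid of the gen_esme_session
                 window_max = 10,     %% max outstanding submit_sm PDUs
                 window_cur = 0,      %% currently outstanding submit_sm PDUs
                 pending = queue:new()}).

%% A submit request: send it if the window has room, otherwise queue it.
handle_cast({submit, Params}, St = #win_st{window_cur = Cur, window_max = Max})
  when Cur < Max ->
    do_submit(St#win_st.session, Params),
    {noreply, St#win_st{window_cur = Cur + 1}};
handle_cast({submit, Params}, St = #win_st{pending = Q}) ->
    {noreply, St#win_st{pending = queue:in(Params, Q)}};

%% A submit_sm_resp came back: either start a queued request or shrink the window.
handle_cast({submit_sm_resp, _Resp}, St = #win_st{window_cur = Cur, pending = Q}) ->
    case queue:out(Q) of
        {{value, Params}, Q1} ->
            do_submit(St#win_st.session, Params),
            {noreply, St#win_st{pending = Q1}};
        {empty, Q1} ->
            {noreply, St#win_st{window_cur = Cur - 1, pending = Q1}}
    end.
[/code]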
But if you need to enforce a window for deliver_sm then there is little you can do with the current session implementation.
Good luck (:
I only need to enforce a window for submit_sm. Btw, I have currently implemented the trx solution with gen_esme_session directly, and so far I have found no deadlock; let's hope it stays like this.
In another topic about gen_esme_session on SourceForge, mpique says that at some point a spawn_link was removed from gen_esme_session's and gen_mc_session's handle_peer_operation. Could that possibly be related to the deadlock issue I found? https://sourceforge.net/projects/oserl/forums/forum/350016/topic/3910746
I'm facing an issue with a trx connection: if the incoming and outgoing (submit_sm/deliver_sm) throughput is high enough, the gen_esme server crashes with a timeout, which takes about 2 minutes to happen. I need this for a production system next month :s, and my SMSC provider only supports trx connections; I was using tx and rx connections with no problem with other SMSCs. I also noticed that this only happens when receiving and sending messages simultaneously; if I'm only sending, or only receiving, the issue does not occur. I'm using SMPPSim from Logica as a simulator. Also, does oserl support async SMPP sessions? The doc is very vague on this and also doesn't mention the window width or whether it's configurable, only rps.
Looking at the process_info of the gen_esme connection process, I notice that the server is "locked" in some operation. Here follows the SASL error report and process_info:
ERROR REPORT <0.160.0> 2011-04-29 14:31:25
* Generic server <0.160.0> terminating
* Last message in was {'EXIT',<0.158.0>,
      {timeout,
       {gen_server,call,
        [<0.158.0>,
         {{submit_sm,
           [{short_message,
             [0,72,0,105,0,32,0,116,0,104,0,101,0,114,0,101,0,32,0,84,0,82,0,88]},
            {destination_addr,"351913768304"},
            {source_addr_ton,5},
            {source_addr,"7337"},
            {data_coding,8},
            {dest_addr_ton,1},
            {dest_addr_npi,1},
            {source_addr_npi,1},
            {registered_delivery,0},
            {priority_flag,0},
            {schedule_delivery_time,[]},
            {validity_period,[]}]},
          [{uid,"8723648C5319FF0AB78864A5A6BD94A8F66DE233"}]},
         120000]}}}
* When Server state == {st,<0.158.0>,undefined,12751,1083,{1304,83762,900153}}
* Reason for termination ==
  {timeout,
   {gen_server,call,
    [<0.158.0>,
     {{submit_sm,
       [{short_message,
         [0,72,0,105,0,32,0,116,0,104,0,101,0,114,0,101,0,32,0,84,0,82,0,88]},
        {destination_addr,"351913768304"},
        {source_addr_ton,5},
        {source_addr,"7337"},
        {data_coding,8},
        {dest_addr_ton,1},
        {dest_addr_npi,1},
        {source_addr_npi,1},
        {registered_delivery,0},
        {priority_flag,0},
        {schedule_delivery_time,[]},
        {validity_period,[]}]},
      [{uid,"8723648C5319FF0AB78864A5A6BD94A8F66DE233"}]},
     120000]}}
ok
(smpp_con@arsenio-desktop)135> rb:show(2).
CRASH REPORT <0.160.0> 2011-04-29 14:31:25
Crashing process
initial_call      {cl_queue_srv,init,['Argument__1']}
pid               <0.160.0>
registered_name   []
error_info        {exit,
                    {timeout,
                     {gen_server,call,
                      [<0.158.0>,
                       {{submit_sm,
                         [{short_message,
                           [0,72,0,105,0,32,0,116,0,104,0,101,0,114,0,101,0,32,0,84,0,82,0,88]},
                          {destination_addr,"351913768304"},
                          {source_addr_ton,5},
                          {source_addr,"7337"},
                          {data_coding,8},
                          {dest_addr_ton,1},
                          {dest_addr_npi,1},
                          {source_addr_npi,1},
                          {registered_delivery,0},
                          {priority_flag,0},
                          {schedule_delivery_time,[]},
                          {validity_period,[]}]},
                        [{uid,"8723648C5319FF0AB78864A5A6BD94A8F66DE233"}]},
                       120000]}},
                    [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}
ancestors         [smpp_connection,smpp_connector_sup,<0.51.0>]
messages          []
links             []
dictionary        []
trap_exit         true
status            running
heap_size         17711
stack_size        24
reductions        59200658
ok (smpp_con@arsenio-desktop)136> erlang:process_info(erlang:whereis(smpp_connection)). [{registered_name,smpp_connection}, {current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,40}, {messages,[{'$gen_call',{<0.10847.4>,#Ref<0.0.43.198005>}, {{submit_sm,[{short_message,[0,72,0,105,0,32,0,116,0,104,0|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,1}, {source_addr_npi,1}, {registered_delivery,0}, {priority_flag,0}, {schedule_delivery_time,[]}, {validity_period,[]}]}, [{uid,"F68E1631C5756718C4D2A0F7961069BD992D208"}]}}, {'$gen_call',{<0.10848.4>,#Ref<0.0.43.198013>}, {{submit_sm,[{short_message,[0,72,0,105,0,32,0,116,0,104|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,1}, {source_addr_npi,1}, {registered_delivery,0}, {priority_flag,0}, {schedule_delivery_time,[]}, {validity_period,...}]}, [{uid,"E87007766E3D5E5AD6BB68835CBBEEAD5FB19237"}]}}, {'$gen_call',{<0.10841.4>,#Ref<0.0.43.198016>}, {handle_deliver_sm,{5,0,22410, [{short_message,"RE 2856249 19930922"}, {sm_default_msg_id,0}, {data_coding,0}, {replace_if_present_flag,0}, {registered_delivery,0}, {validity_period,[]}, {schedule_delivery_time,[]}, {priority_flag,...}, {...}|...]}}}, {'$gen_call',{<0.10849.4>,#Ref<0.0.43.198019>}, {{submit_sm,[{short_message,[0,72,0,105,0,32,0,116|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,1}, {source_addr_npi,1}, {registered_delivery,0}, {priority_flag,...}, {...}|...]}, [{uid,"3E0CBFEA73A8B709ACF1EE7D8FA8898E9859DD71"}]}}, {'$gen_call',{<0.10845.4>,#Ref<0.0.43.198021>}, {{submit_sm,[{short_message,[0,72,0,105,0,32,0|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,1}, {source_addr_npi,1}, {registered_delivery,...}, {...}|...]}, [{uid,"C9352FD329A7C5FED4ECC9725BA52CE433E0DDB3"}]}}, {'$gen_call',{<0.10850.4>,#Ref<0.0.43.198036>}, {{submit_sm,[{short_message,[0,72,0,105,0,32|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,1}, {source_addr_npi,...}, {...}|...]}, [{uid,"5B07AFCB6EA1BE25CA1CCEC8925C6A1DA2AAB0F4"}]}}, {'$gen_call',{<0.10851.4>,#Ref<0.0.43.198041>}, {{submit_sm,[{short_message,[0,72,0,105,0|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,1}, {dest_addr_npi,...}, {...}|...]}, [{uid,"962DA498590F67AF868DDAC9F95A72E7D6640C17"}]}}, {'$gen_call',{<0.10852.4>,#Ref<0.0.43.198046>}, {{submit_sm,[{short_message,[0,72,0,105|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,"7337"}, {data_coding,8}, {dest_addr_ton,...}, {...}|...]}, [{uid,"8E1E1D7F548C82D443C98FDB6E4FFD50205DBB9F"}]}}, {'$gen_call',{<0.10853.4>,#Ref<0.0.43.198051>}, {{submit_sm,[{short_message,[0,72,0|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,[...]}, {data_coding,...}, {...}|...]}, [{uid,"39974B8D22FEC1158E9B34D669BF781CC5794205"}]}}, {'$gen_call',{<0.10854.4>,#Ref<0.0.43.198056>}, {{submit_sm,[{short_message,[0,72|...]}, {destination_addr,"351913768304"}, {source_addr_ton,5}, {source_addr,...}, {...}|...]}, [{uid,"8E8F476574C69BDBB2488F71E1876892BD320FB6"}]}}, {'$gen_call',{<0.10855.4>,#Ref<0.0.43.198061>}, 
{{submit_sm,[{short_message,[0|...]}, {destination_addr,[...]}, {source_addr_ton,...}, {...}|...]}, [{uid,"2A7DF716CED714200404E10324A90DCA183A49FD"}]}}, {'$gen_call',{<0.10856.4>,#Ref<0.0.43.198066>}, {{submit_sm,[{short_message,[...]}, {destination_addr,...}, {...}|...]}, [{uid,"4969C8F1E1D21BA7604D9393B3624A6B1BA7B649"}]}}, {'$gen_call',{<0.10857.4>,#Ref<0.0.43.198071>}, {{submit_sm,[{short_message,...},{...}|...]},[{uid,[...]}]}}, {'$gen_call',{<0.10858.4>,#Ref<0.0.43.198076>}, {{submit_sm,[{...}|...]},[{uid,...}]}}, {'$gen_call',{<0.10859.4>,#Ref<0.0.43.198081>}, {{submit_sm,[...]},[{...}]}}, {'$gen_call',{<0.10860.4>,#Ref<0.0.43.198086>}, {{submit_sm,...},[...]}}, {'$gen_call',{<0.10861.4>,#Ref<0.0.43.198091>},{{...},...}}, {'$gen_call',{<0.10862.4>,...},{...}}, {'$gen_call',{...},...}, {'$gen_call',...}, {...}|...]},