sergei-mironov / asterisk-dongle-setup

Playground project aimed at setting up Asterisk server and the GSM stack on Nix.

dongleman_daemon.py stuck #9

Open mo3r31337 opened 1 year ago

mo3r31337 commented 1 year ago

Hello, I ran into a problem: after around 12 hours of idling, dongleman_daemon hangs and does nothing until the daemon is restarted. Is it just me, or is someone else having this problem? Thanks for any advice on fixing this.

sergei-mironov commented 1 year ago

Hi, dongleman_daemon runs two async tasks at https://github.com/grwlf/asterisk-dongle-setup/blob/master/python/dongleman_daemon.py#L174 and waits for events. In case no events come, it is expected to do nothing. Are you sure that your case is not a desired behavior?

mo3r31337 commented 1 year ago

No, I don't think this is the expected behavior. I test with a USSD request sent from Asterisk, and at such moments the daemon does not forward the output to Telegram; but if the daemon is restarted, the message arrives instantly. I suspect that with some activity from time to time there would be no such problem. I need to experiment with crontab and issue a USSD request every few hours.

sergei-mironov commented 1 year ago

The code doesn't have debug facilities so you probably want to add appropriate prints and try to debug the problem.

The path you might need to check is as follows:

  1. Asterisk receives an SMS and launches a hangup handler here. Does it log the message you are missing?
  2. dongleman_send.py puts the message in the queue and triggers the filesystem notify event. I should say that this dongleman_spool library is error-prone, since I wrote it ad hoc. Can you see the file corresponding to the message?
  3. dongleman_daemon.py listens for inotify events here: https://github.com/grwlf/asterisk-dongle-setup/blob/63cfdd99da8ebef97aa9157413e140c0563a6506/python/dongleman_daemon.py#L77. It should notice the presence of a new file in the spool and process it. What does it do in reality?
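
For step 2, a small helper can show which messages are still waiting and for how long. This is only a sketch: `pending_messages` is a hypothetical name, not part of the repository, and it assumes the spool keeps one `*.json` file per queued message (the directory used elsewhere in this thread is `/tmp/dongleman/spool/queue`).

```python
import time
from pathlib import Path


def pending_messages(queue_dir, now=None):
    """Return (file name, age in seconds) for each queued JSON file, oldest first.

    Hypothetical debugging helper; assumes one *.json file per queued message.
    """
    queue = Path(queue_dir)
    if not queue.is_dir():
        return []
    now = time.time() if now is None else now
    entries = [(path.name, now - path.stat().st_mtime)
               for path in queue.glob("*.json")]
    # Largest age first, so the longest-waiting message is at the top.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Example use (path assumed from this thread):
#   for name, age in pending_messages("/tmp/dongleman/spool/queue"):
#       print(f"{name} has been waiting {age:.0f}s")
```

Files that keep aging here while the daemon is running would point at step 3 rather than at Asterisk or dongleman_send.py.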

One thing bothers me: you are talking about USSD messages. I didn't test anything besides SMS and voice calls, so I'm not sure what Asterisk does upon receiving USSD.

mo3r31337 commented 1 year ago

Let me try to explain what it looks like. I use this setup on an ARM microcomputer with one Huawei E1550 modem to serve only one of my mobile numbers while I'm roaming. Since I don't get many calls and SMS, it looks like the Telethon library session is being dropped by the Telegram servers due to inactivity. At that point I see the JSON files of the queue in the dongleman spool directory; they are created successfully, but they are not sent to the Telegram account. If I restart dongleman_daemon.py, it immediately sends the entire queue to the Telegram account. I'm using USSD because it's a free way to test the Telegram forwarding functionality. For testing, I created a systemd timer that runs the command asterisk -x "dongle ussd dongle0 *100#" every two hours. I have been testing with this timer for the last two days, and there is no problem with losing the Telegram session. That is, some activity is needed so that the Telegram session does not freeze. This is how I see the problem.

mo3r31337 commented 1 year ago

For USSD I've added this to the extensions.conf file, right after the sms section.

exten => ussd,1,Verbose(USSD-IN ${CALLERID(num)} ${USSD_BASE64})
same => n,Set(MSG=--message-base64=${USSD_BASE64})
same => n,Hangup()

mo3r31337 commented 1 year ago

@grwlf It seems you are right. You pointed me in the right direction: after some time dongleman_daemon.py stops responding to the creation of new files in the /tmp/dongleman/spool/queue directory, while the connection to the Telegram servers stays established and dongleman answers voice calls if I call it via Telegram. I can also make an outgoing call from Telegram via Asterisk and chan_dongle.

sergei-mironov commented 1 year ago

Interesting. I've reviewed the code, and of course the listen_system_commands handler almost certainly has problems:

  1. Files from /queue are removed only if the control flow returns to spool_iterate without exceptions.
  2. If processing of some file leads to an exception, then
    • It is not removed
    • Other files will not be processed
    • The inotify event is not repeated

As a consequence, a single problematic file may cause the daemon to stall. Could you please try the latest commit and/or monitor the logs? The exception text should appear in the console thanks to this print.
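
One way to avoid the stall described above is to handle each queued file on its own and move failing files aside. The sketch below is not the repository's `spool_iterate`; `drain_queue`, the `failed_dir` quarantine directory and the `handle` callback are all hypothetical names used for illustration.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("dongleman.spool")


def drain_queue(queue_dir, failed_dir, handle):
    """Process every queued JSON file so that one bad file cannot block the rest.

    `handle` is a hypothetical callback that receives the decoded message.
    A file whose processing raises is moved to `failed_dir` (assumed to be on
    the same filesystem) instead of staying in the queue.
    Returns a (processed, failed) pair of counts.
    """
    queue_dir, failed_dir = Path(queue_dir), Path(failed_dir)
    failed_dir.mkdir(parents=True, exist_ok=True)
    processed = failed = 0
    for path in sorted(queue_dir.glob("*.json")):
        try:
            handle(json.loads(path.read_text()))
        except Exception:
            # Log the traceback, quarantine the file, and keep going.
            log.exception("Failed to process %s", path)
            path.replace(failed_dir / path.name)
            failed += 1
        else:
            path.unlink()
            processed += 1
    return processed, failed
```

Calling such a function both on every inotify event and once at startup would also pick up files whose event was missed. Quarantining keeps a failed message on disk for inspection, at the cost of never retrying it automatically.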

mo3r31337 commented 1 year ago

OK, I've updated the script.

<WS (connecting as dongleman-ari-app)
WS> Connected!
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=5183, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))

I left the daemon running. But I don't think the problem is a bad file: for the last few days I have been checking whether the daemon is working by copying a known-good JSON file into the queue directory. When the daemon is stuck, it does nothing when a new file appears; even inotify does not report that the file was created on the file system.

mo3r31337 commented 1 year ago

It is strange, but with fix from this e055c46 commit the dongleman_daemon running fine for the last two days

sergei-mironov commented 1 year ago

It is strange, but with fix from this e055c46 commit the dongleman_daemon running fine for the last two days

I realized that the reason could be simpler: before the commit, the daemon may have raised an unhandled exception that terminated it. With the commit I now catch all exceptions, so the daemon should print an error but continue to work.

I would be glad if you could share some logs to help me figure out which exceptions you get from it.
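
The "print the error but keep working" idea can be sketched for a long-running asyncio task as follows. This is an illustration, not the code from the commit: `supervise`, `make_coro`, `restart_delay` and `max_restarts` are hypothetical names.

```python
import asyncio
import logging

log = logging.getLogger("dongleman.supervisor")


async def supervise(name, make_coro, restart_delay=1.0, max_restarts=None):
    """Await make_coro() and restart it after any exception.

    Returns the coroutine's result once it finishes normally. If
    `max_restarts` is set, the last exception is re-raised after that
    many restarts have already been used up.
    """
    restarts = 0
    while True:
        try:
            return await make_coro()
        except asyncio.CancelledError:
            # Cancellation is a shutdown request, not a crash.
            raise
        except Exception:
            log.exception("Task %r crashed", name)
            restarts += 1
            if max_restarts is not None and restarts > max_restarts:
                raise
            await asyncio.sleep(restart_delay)
```

Wrapping each of the daemon's tasks this way would make a crash visible in the log instead of silently ending one of them.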

mo3r31337 commented 1 year ago

Hello, now I have only these errors, and the daemon is stuck again.

Processing path /tmp/dongleman/spool/queue/00000000.json                         
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))  
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=3205, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json                         
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))  
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError:                                    
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError:                                    
Attempt 1 at connecting failed: TimeoutError:                                    
Attempt 2 at connecting failed: TimeoutError:                                    
Attempt 3 at connecting failed: TimeoutError:                                    
Attempt 4 at connecting failed: TimeoutError:                                    
Attempt 5 at connecting failed: TimeoutError:                                    
Attempt 6 at connecting failed: TimeoutError: 

After losing the internet connection, the daemon got stuck again. It also does not react to the creation of new files in the queue directory.

mo3r31337 commented 1 year ago

Attempt 5 at connecting failed: TimeoutError: 
Attempt 6 at connecting failed: TimeoutError:
Attempt 1 at connecting failed: TimeoutError: 
Attempt 2 at connecting failed: TimeoutError: 
Attempt 3 at connecting failed: TimeoutError:
Attempt 4 at connecting failed: TimeoutError: 
Attempt 5 at connecting failed: TimeoutError:
Attempt 6 at connecting failed: TimeoutError: 
Attempt 1 at connecting failed: TimeoutError: 
Attempt 2 at connecting failed: TimeoutError: 
Attempt 3 at connecting failed: TimeoutError: 
Attempt 4 at connecting failed: TimeoutError: 
Attempt 5 at connecting failed: TimeoutError: 
Attempt 6 at connecting failed: TimeoutError: 
Automatic reconnection failed 5 time(s)
Future exception was never retrieved
future: <Future finished exception=ConnectionError('Connection to Telegram failed 5 time(s)')>
ConnectionError: Connection to Telegram failed 5 time(s)
Event(wd=1, mask=<Mask.MOVED_TO: 128>, cookie=9157, name=PosixPath('00000000.json'))
Processing path /tmp/dongleman/spool/queue/00000000.json
Exception while processing JSON '/tmp/dongleman/spool/queue/00000000.json':
Cannot send requests while disconnected
Event(wd=1, mask=<Mask.DELETE: 512>, cookie=0, name=PosixPath('00000000.json'))

sergei-mironov commented 1 year ago

OK, it looks like in your case Telethon loses the connection and gives up re-establishing it.
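
The log above also shows a DELETE event right after the failed send, which suggests the queued message was dropped. A retry wrapper around the send step is one possible mitigation. The sketch below is library-agnostic: `send_with_retry` and its parameters are hypothetical, `send` stands for whatever coroutine actually posts the message, and it assumes the "Cannot send requests while disconnected" failure surfaces as a `ConnectionError`, as the reconnect failure in the log does.

```python
import asyncio


async def send_with_retry(send, attempts=5, base_delay=1.0, max_delay=60.0):
    """Call the async `send()` until it succeeds, backing off after ConnectionError.

    Re-raises the last ConnectionError once `attempts` calls have failed,
    so the caller can decide to leave the message in the queue.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await send()
        except ConnectionError:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)
            # Exponential backoff, capped at max_delay seconds.
            delay = min(delay * 2, max_delay)
```

If the final attempt still fails, the safer choice is probably to leave the file in the queue rather than delete it.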

mo3r31337 commented 1 year ago

I have added some parameters to TelegramClient. Now I don't get connection loss errors, but once after a few days the script stopped without any errors in the console.

  tclient=TelegramClient(session=SESSION,
                         api_id=TELEGRAM_API_ID,
                         api_hash=TELEGRAM_API_HASH,
                         connection_retries=-1,
                         retry_delay=2,
                         auto_reconnect=True)

sergei-mironov commented 1 year ago

  tclient=TelegramClient(session=SESSION,
                         api_id=TELEGRAM_API_ID,
                         api_hash=TELEGRAM_API_HASH,
                         connection_retries=-1,
                         retry_delay=2,
                         auto_reconnect=True)

Makes sense! I'll add this to the code, thanks.

but once after a few days the script stopped without any errors in the console.

Could it be a segfault from some of the C/C++ libraries involved? Could you please check your system's segfault log? One could also run the script under strace to get very verbose logs of system calls.
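
If a native crash is the suspect, Python's standard `faulthandler` module can dump the Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch, where `enable_crash_dumps` and the log path are hypothetical:

```python
import faulthandler
import sys


def enable_crash_dumps(log_path=None):
    """Dump Python tracebacks of all threads on fatal signals (e.g. SIGSEGV).

    Writes to `log_path` if given, otherwise to stderr. The returned stream
    must stay open for as long as the handler is enabled.
    """
    stream = open(log_path, "a") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream


# Example use near the top of the daemon script (path is an assumption):
#   crash_log = enable_crash_dumps("/tmp/dongleman/crash.log")
```

The same effect is available without code changes by starting the interpreter with `python -X faulthandler`. This would not explain a clean exit or a process killed with SIGKILL, so it complements the system logs and strace rather than replacing them.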

sergei-mironov commented 1 year ago

I use this setup on an arm microcomputer with one huawei e1550 modem to serve only one of my mobile numbers while I'm in roaming

@mo3r31337, could you please share some information on your ARM setup? Do you use a Raspberry Pi for this? What kind of Nixpkgs/NixOS do you use? I would like to build this project on a smaller device than the one I use now.

mo3r31337 commented 1 year ago

@grwlf Of course, I'll share info about my setup. The Raspberry Pi is too big for this. I use a Rock Pi S with 512 MB of RAM and 8 GB of NAND storage; the board has an RK3308s CPU with 4 cores. I installed Debian 11 (bullseye) and didn't use nixpkgs, since I don't like this kind of package manager. I cross-compiled all the needed software and built .deb packages. If you need them, I can share the prebuilt .deb packages with .dsc files. I use it with a PoE HAT, so everything is powered over the Ethernet patch cord from my MikroTik router. Attachments: photo_2022-12-12_13-55-26, photo_2022-12-12_13-55-14, photo_2022-12-12_13-55-21, and a video: https://user-images.githubusercontent.com/24485702/206980820-f99846b6-d258-4d3b-b273-295ca0c57bab.mp4

sergei-mironov commented 1 year ago

Actually I use it with PoE hat and everything powered via ethernet patchcord from my mikrotik rou

PoE is amazing, I think I should try this setup! But do I understand correctly that you still use nixpkgs on the host for cross-compilation? I mean, it should be possible to use Nix to build a .deb package that installs all the dependencies of this project as /nix/store/... files on the Debian system. Do you use this approach?

mo3r31337 commented 1 year ago

No, I didn't use nixpkgs to build the .deb packages at all. I used nixpkgs only to generate the JSON file for dongleman_daemon, because I did not quite understand which variable is responsible for what. I tried building with Nix on the host machine, but it takes a lot of time and space. Moreover, it is hardly possible to build this setup for the ARM architecture using nixpkgs; I mean that there are some nuances in this process, and not all packages were built in a standard way. In total, I spent about two weeks building all the packages, maybe even more.

sergei-mironov commented 1 year ago

Thank you, now I understand. It is not an easy job you are doing, good luck with that! I think that building for ARM using nixpkgs might be possible. I was able to cross-compile some of the system parts of Pinephone 64, but not the GUI user space. AFAIK compiling mobile GUIs currently requires setting up a virtual machine or a real device, which is indeed a troublesome task. I made some notes about it here: https://github.com/grwlf/mobile-nixos-cfg

Regarding Asterisk, I still plan to run it under Nix on some ARM device, but for now I do not want to use a computer as tiny as yours. I think I will buy a regular Raspberry Pi, and I'd like to test PoE, which some of them also support. New modems should arrive in January, so I hope to build an ARM setup for this project after that.