ham-radio-software / D-Rats

D-Rats program for D-Star Ham Radios
https://iz2lxi.jimdofree.com/

msglock LOCK OWNED BY traceback #122

Open wb8tyw opened 2 years ago

wb8tyw commented 2 years ago

Have a report of a message lock traceback showing up.

I have seen it show up with undeliverable messages that d-rats tried to deliver, and after a d-rats crash that happened during delivery.

One possible way to reproduce it is to try to send a form to yourself. D-rats has code that drops outgoing packets that it detects are destined for itself. In the case of sending forms or messages, d-rats appears to time out and retry sending the message forever, until it is shut down and the message, and sometimes its associated lock file, are removed.

02/18/2022 20:36:14:INFO:MsgRouting:msg_lock: ------ LOCK OWNED BY -------
  File "/usr/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/riggs498/Ham-Radio/D-Rats/d_rats/msgrouting.py", line 660, in _run
    queue = self._get_queue()
  File "/home/riggs498/Ham-Radio/D-Rats/d_rats/msgrouting.py", line 392, in _get_queue
    if not msg_lock(field):
  File "/home/riggs498/Ham-Radio/D-Rats/d_rats/msgrouting.py", line 103, in msg_lock
    traceback.print_stack(file=lock)
------------

02/18/2022 20:36:14:INFO:MessageRouter:_get_queue: Message /home/riggs498/.d-rats-ev/messages/Outbox/form_02132022_233920.xml is locked, skipping

wb8tyw commented 2 years ago

Not sure what I did to reproduce it, but I reproduced it while attempting to debug the emailgw.

Examining the lock file shows that it contains the stack dump text that was logged to the console, instead of a name identifying whatever is supposed to own the lock.

wb8tyw commented 2 years ago

Looking at the code, it seems that storing the traceback text is intentional.
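For anyone following along, here is a minimal sketch of the pattern that would produce this, based only on the `traceback.print_stack(file=lock)` call visible in the traceback above. It is not the actual code from `msgrouting.py`: the names `lock_path` and `try_lock`, the `.lock.` file name prefix, and the guard lock are all assumptions made for illustration.

```python
import os
import threading
import traceback

# Hypothetical guard so two threads cannot check and create the file at once.
_LOCK_GUARD = threading.Lock()


def lock_path(msg_path):
    """Assumed naming scheme: a hidden sibling file next to the message."""
    folder, name = os.path.split(msg_path)
    return os.path.join(folder, ".lock." + name)


def try_lock(msg_path):
    """Return (True, None) if the lock was taken, else (False, owner_text)."""
    with _LOCK_GUARD:
        path = lock_path(msg_path)
        if os.path.exists(path):
            # Already locked: the "owner" is whatever text the file holds.
            with open(path, "r") as handle:
                return False, handle.read()
        with open(path, "w") as handle:
            # The stack of the locking thread becomes the file's content,
            # which is why the lock file reads like a traceback.
            traceback.print_stack(file=handle)
        return True, None
```

With a scheme like this, the second caller gets back the first caller's stack, so printing it as "LOCK OWNED BY" looks like an exception even though nothing was raised.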

Now I need to at least figure out how to deal with a stale lock file. Ideally there should be some way to detect one.
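One possible approach, sketched here as an assumption and not something d-rats does today, is to treat a lock file as stale once its modification time is older than some threshold. The helper names `is_stale` and `clear_if_stale` and the `max_age_seconds` parameter are hypothetical.

```python
import os
import time


def is_stale(lock_file, max_age_seconds, now=None):
    """True if lock_file exists and was last modified too long ago."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(lock_file)
    except FileNotFoundError:
        # No lock file at all means there is nothing stale to clean up.
        return False
    return (now - modified) > max_age_seconds


def clear_if_stale(lock_file, max_age_seconds, now=None):
    """Remove lock_file if it is stale; return True if it was removed."""
    if not is_stale(lock_file, max_age_seconds, now):
        return False
    try:
        os.remove(lock_file)
    except FileNotFoundError:
        # Someone else removed it between the check and the delete.
        return False
    return True
```

The threshold is a guess that would need tuning: a value that is too short could remove a lock that a slow transfer still legitimately holds.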

wb8tyw commented 2 years ago

Need a new ticket to rewrite the lock functions into their own module to make the code cleaner, and maybe also put some human-understandable text in the log file instead of a stack trace that makes it look like d-rats threw an exception.
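As a rough idea of what such a module could store in place of a stack trace, here is a sketch that writes a small JSON record describing the owner. Everything in it is an assumption for discussion: the function names, the record fields, and the use of `os.O_EXCL` so that creating the lock file fails if it already exists.

```python
import json
import os
import threading
import time


def describe_owner(purpose):
    """Build a human-readable record of who is taking the lock and why."""
    return {
        "pid": os.getpid(),
        "thread": threading.current_thread().name,
        "purpose": purpose,
        "locked_at": time.strftime("%Y-%m-%d %H:%M:%S"),
    }


def write_lock(lock_file, purpose):
    """Create lock_file exclusively; return False if it already exists."""
    try:
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        json.dump(describe_owner(purpose), handle)
    return True


def owner_summary(lock_file):
    """Return a one-line description suitable for an INFO log entry."""
    with open(lock_file) as handle:
        owner = json.load(handle)
    return ("locked by thread %(thread)s (pid %(pid)d) "
            "for %(purpose)s at %(locked_at)s" % owner)
```

A log line built from `owner_summary` would say which thread holds the lock and since when, which also gives a stale-lock check something concrete to compare against.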

Found one cause of the "message is locked" dump. If you have a message in your outbox, and do not have any stations in your static routes or in your recently heard list, then you will get that dump periodically in the log until d-rats hears a station.

It does not matter if the message is going to be sent via a gateway and not to a heard station. Until you hear a station, d-rats will keep writing out that diagnostic instead of trying to send the message.