ncssar / radiolog

SAR radio log program
Other
13 stars 3 forks source link

fatal freeze at 1 second left in countdown of orphan subject located dialog #727

Open caver456 opened 4 months ago

caver456 commented 4 months ago

Reported at snow training 2/2/24. Need to get logs from R5.

caver456 commented 4 months ago

Finally got the transcript (Ftt2_3_24_2024_02_03_085248/radiolog_log_2024_02_03_085202.txt). The exact time of the reported freeze isn't known, but, the transcript does have a lot of tracebacks. Looks like there were at least two categories of tracebacks - not all of which necessarily led to the reported freeze.

1) deleteTeamTabAction doesn't properly delete 'Team 101BackUp'

102542:deleteTeamTabAction clicked
102542:deleting team tab: teamName=Team 101BackUp  extTeamName=z_Team101BackUp  niceTeamName=Team 101backUp
102542:  teamNameList before deletion:['dummy', 'Team 101', 'Team 102', 'Team 103', 'Team 104', 'Team 101BackUp']
102542:  extTeamNameList before deletion:['spacerLeft', 'z_Team00101', 'z_Team00102', 'z_Team00103', 'z_Team00104', 'z_Team101BackUp', 'spacer1']
102542:  i=5
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 5369, in tabContextMenu
  File "radiolog.py", line 5840, in deleteTeamTab
ValueError: list.remove(x): x not in list
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3831, in updateTeamTimers
ValueError: 'z_Team101BackUp' is not in list
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3831, in updateTeamTimers
ValueError: 'z_Team101BackUp' is not in list
...
...

(the repeating four-line traceback repeats every second for the rest of the session; not clear if it caused any other issues)

Note that the team name had multiple spaces when it was created 22 minutes prior:

094339:openChangeCallsignDialog: self.fleet=100 self.dev=3099
094339:changeCallsignDialog created.  fleet=100  dev=3099
094350:changeCallsignDialog accept called
094350:accept: FleetSync fleet=100  dev=3099
094350:Writing file C:\Users\NCSSAR\RadioLog\Ftt2_3_24_2024_02_03_085248\Ftt2_3_24_2024_02_03_085248_fleetsync.csv
094350:calling fsLogUpdate for fleetsync
094350:updating fsLog (fleetsync): fleet=100 dev=3099 callsign=Team 101 Back up  COM port=COM3
094350:New callsign pairing created from FleetSync: fleet=100  dev=3099  callsign=Team 101 Back up
094350:New callsign pairing created: fleet=100  dev=3099  uid=  callsign=Team 101 Back up
094354:Accepted
094354:newEntry called with these values:
094354:['0942', 'FROM', 'Team 101 Back up', 'k; RADIO CHECK: OK', '', '', 1706982177.608737, '100', '3099', '']
094354:new team: newTeamName=Team 101 Back up extTeamName=z_Team101BackUp niceTeamName=Team 101BackUp shortNiceTeamName=101BackUp
094354:extTeamNameList before sort:['spacerLeft', 'z_Team00101', 'spacer1', 'z_1sar119', 'z_1sar57', 'z_Kw-100-3110', 'z_Team101BackUp']
094354:grouped tabs:{'NCSOcmd': [], 'NCSO': [], 'NCPD': [], 'TPD': [], 'GVPD': [], 'CHP': [], 'Numbers': ['z_Team00101', 'z_Team101BackUp'], 'other': ['z_1sar119', 'z_1sar57', 'z_Kw-100-3110']}
094354:extTeamNameList after sort:['spacerLeft', 'z_Team00101', 'z_Team101BackUp', 'spacer1', 'z_1sar119', 'z_1sar57', 'z_Kw-100-3110']
094354:setting tab initial style in addTab: #tab_z_Team101BackUp { font-size:20px;background:none;padding-left:1px;padding-right:1px }
094354:next available hotkey:5

2) QLineEdit has been deleted - not sure what triggered this, but it also seems to repeat every second:/error:

100905:PARSING
100905:I110030991003099
100905: line:I110030991003099
100905:getCallsign called for FleetSync fleet=100 dev=3099
100905:  found matching entry/entries:[['100', '3099', 'Team 103']]
100905:FleetSync CID detected (not in $PKLSH): fleet=100  dev=3099  callsign=Team 103
100905:checking for existing open new entry tabs: callsign=Team 103 continueSec=20
100905:checking against existing widget: to_from=FROM team=Team 103 lastModAge:749
100905:checking against existing widget: to_from=FROM team=Team 101 lastModAge:42
100905:checking against existing widget: to_from=FROM team=Team 101 lastModAge:20
100905:updating fsLog (fleetsync): fleet=100 dev=3099 callsign=<None>  COM port=COM3
100905:no other new entry tab was found for this callsign; inside fsParse: calling openNewEntry
100905:openNewEntry called:key=fs callsign=Team 103 formattedLocString= fleet=100 dev=3099 origLocString= amendFlag=False amendRow=None isMostRecentForCallsign=False
100905:newEntryWidget __init__ called: formattedLocString= fleet=100 dev=3099 origLocString= amendFlag=False amendRow=None isMostRecentForCallsign=False
100905:newEntryWidget.addTab called: labelText=1009  widget=<__main__.newEntryWidget object at 0x000001C0777A9D80>
100905:inserting tab
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3903, in updateTeamTimers
  File "radiolog.py", line 8650, in countdown
  File "radiolog.py", line 8075, in getValues
RuntimeError: wrapped C/C++ object of type QLineEdit has been deleted
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3903, in updateTeamTimers
  File "radiolog.py", line 8650, in countdown
  File "radiolog.py", line 8075, in getValues
RuntimeError: wrapped C/C++ object of type QLineEdit has been deleted

seems to repeat every second, until this:

...
...
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3903, in updateTeamTimers
  File "radiolog.py", line 8650, in countdown
  File "radiolog.py", line 8075, in getValues
RuntimeError: wrapped C/C++ object of type QLineEdit has been deleted
102542:deleteTeamTabAction clicked
102542:deleting team tab: teamName=Team 101BackUp  extTeamName=z_Team101BackUp  niceTeamName=Team 101backUp
102542:  teamNameList before deletion:['dummy', 'Team 101', 'Team 102', 'Team 103', 'Team 104', 'Team 101BackUp']
102542:  extTeamNameList before deletion:['spacerLeft', 'z_Team00101', 'z_Team00102', 'z_Team00103', 'z_Team00104', 'z_Team101BackUp', 'spacer1']
102542:  i=5
Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 5369, in tabContextMenu
  File "radiolog.py", line 5840, in deleteTeamTab
ValueError: list.remove(x): x not in list
...

Note that this ending of the QLineEdit errors coincides with the begin of the z_Team101BackUp not in list errors. The list.remove(x): x not in list' message only happens once in the entire session.

caver456 commented 4 months ago

Creating a separate issue for the failure to delete team tab.

caver456 commented 4 months ago

Focusing on 'QLineEdit has been deleted' since the other issue has been moved to its own ticket #728:

Note that there is a very stale message stack at the time - a training issue for sure, but, it shouldn't lead to errors:

100905:checking for existing open new entry tabs: callsign=Team 103 continueSec=20
100905:checking against existing widget: to_from=FROM team=Team 103 lastModAge:749
100905:checking against existing widget: to_from=FROM team=Team 101 lastModAge:42
100905:checking against existing widget: to_from=FROM team=Team 101 lastModAge:20
100905:updating fsLog (fleetsync): fleet=100 dev=3099 callsign=<None>  COM port=COM3
100905:no other new entry tab was found for this callsign; inside fsParse: calling openNewEntry

Anyway - looking at the traceback, and adding the class names based on line number:

Uncaught exception
Traceback (most recent call last):
  File "radiolog.py", line 3903, in [MyWindow.]updateTeamTimers
  File "radiolog.py", line 8650, in [subjectLocatedDialog.]countdown
  File "radiolog.py", line 8075, in [newEntryWidget.]getValues
RuntimeError: wrapped C/C++ object of type QLineEdit has been deleted

In other words: the countdown within sublectLocatedDialog is getting called, but, its parent newEntryWidget has been deleted. So the question is: how did the newEntryWidget get deleted?

This line is in the transcript just beforehand:

100843: closing unused new entry widget for Team 101 due to inactivity

And there are several similar lines above - again, probably a training issue but looks like these were (mostly) handled gracefully. The full transcript from 100843:

100843:PARSING
100843:I010030661003066$PKLDS,180842,A,3920.0434,N,12017.4668,W,,,,,00,100,3066,80,00,*17

$PKLSH,3920.0434,N,12017.4668,W,180842,A,100,3066,*3F

$GPRMC,180842,A,3920.0434,N,12017.4668,W,,,,,*0A

$GPGGA,180842,3920.0434,N,12017.4668,W,1,,,,,,,,*4B


100843: line:I010030661003066$PKLDS,180842,A,3920.0434,N,12017.4668,W,,,,,00,100,3066,80,00,*17

100843:getCallsign called for FleetSync fleet=100 dev=3066
100843:  found matching entry/entries:[['100', '3066', 'Team 101']]
100843:FleetSync CID detected (not in $PKLSH): fleet=100  dev=3066  callsign=Team 101
100843: line:$PKLSH,3920.0434,N,12017.4668,W,180842,A,100,3066,*3F

100843:getCallsign called for FleetSync fleet=100 dev=3066
100843:  found matching entry/entries:[['100', '3066', 'Team 101']]
100843:$PKLSH (FleetSync) detected containing CID: fleet=100  dev=3066  -->  callsign=Team 101
100843:Valid location string:'3920.0434|N|12017.4668|W'
100843:convertCoords called: targetDatum=WGS84 targetFormat=UTM 5x5 coords=['3920.0434', 'N', '12017.4668', 'W']
100843:Formatted location string:'33478  57348'
100843:convertCoords called: targetDatum=WGS84 targetFormat=D.dList coords=['3920.0434', 'N', '12017.4668', 'W']
100843:WGS84 lat=39.33405666666667  lon=-120.29111333333333
100843: line:$GPRMC,180842,A,3920.0434,N,12017.4668,W,,,,,*0A

100843: line:$GPGGA,180842,3920.0434,N,12017.4668,W,1,,,,,,,,*4B

100843: line:
100843:checking for existing open new entry tabs: callsign=Team 101 continueSec=20
100843:checking against existing widget: to_from=FROM team=Team 103 lastModAge:727
100843:checking against existing widget: to_from=FROM team=Team 101 lastModAge:61
100843:checking against existing widget: to_from=FROM team=Team 101 lastModAge:20
100843:updating fsLog (fleetsync): fleet=100 dev=3066 callsign=<None>  COM port=COM3
100843:no other new entry tab was found for this callsign; inside fsParse: calling openNewEntry
100843:openNewEntry called:key=fs callsign=Team 101 formattedLocString=33478  57348 fleet=100 dev=3066 origLocString=3920.0434|N|12017.4668|W amendFlag=False amendRow=None isMostRecentForCallsign=False
100843:newEntryWidget __init__ called: formattedLocString=33478  57348 fleet=100 dev=3066 origLocString=3920.0434|N|12017.4668|W amendFlag=False amendRow=None isMostRecentForCallsign=False
100843:newEntryWidget.addTab called: labelText=1008  widget=<__main__.newEntryWidget object at 0x000001C0777AA170>
100843:inserting tab
100843:calling processEvents before sending GET request...
100843:  closing unused new entry widget for Team 101 due to inactivity
100843:calling processEvents before sending GET request...
100843:Sending GET request:
100843:http://192.168.11.20:8080/rest/location/update/position?lat=39.33405666666667&lng=-120.29111333333333&id=FLEET:100-Team 101Team 101
100843:  request sent
100843:  response: <Response [200]>
100843:Sending GET request:
100843:
100843:  exception during sending of GET request: Invalid URL '': No scheme supplied. Perhaps you meant https://?

The last line - exception during GET - is interesting - wonder if it's related?

On five of the 'closing unused new entry widget' occurrences, there is a note of closing the related CCD - should there be something similar for subject located or for clue? Actually, if one of those is open, the NEW shouldn't be deleted at all... right?

093129:  closing unused new entry widget for KW-100-3035 due to inactivity
093129:    closing related Change Callsign Dialog
...
094338:  closing unused new entry widget for KW-100-3116 due to inactivity
094338:    closing related Change Callsign Dialog
...
094659:  closing unused new entry widget for KW-100-3072 due to inactivity
094659:    closing related Change Callsign Dialog
...
120347:  closing unused new entry widget for KW-100-3089 due to inactivity
120347:    closing related Change Callsign Dialog
...
132400:  closing unused new entry widget for ca due to inactivity
132400:    closing related Change Callsign Dialog

The first occurrence of 'looks like a clue' was not until hours later, so the subect located dialog must have been raised another means:

125635:"Looks like a clue" popup shown; message so far: "blue beanie clue"

Looks like there were two subject located dialogs opened for Team 101 at the same time, and one of them was closed:

094947:['0949', 'FROM', 'Team 101', "RADIO LOG SOFTWARE: 'SUBJECT LOCATED' form opened; radio operator is gathering details", '33478  57339', '', 1706982559.90604, '100', '3066', '3920.0385|N|12017.4665|W']
...
095103:['0949', 'FROM', 'Team 101', "RADIO LOG SOFTWARE: radio operator is still collecting data for the 'SUBJECT LOCATED' form; values so far: RADIO GPS: 33478  57339; CONDITION: Stand by", '33478  57339', '', 1706982559.90604, '100', '3066', '3920.0386|N|12017.4665|W']
...
095321:['0949', 'FROM', 'Team 101', "RADIO LOG SOFTWARE: radio operator is still collecting data for the 'SUBJECT LOCATED' form; values so far: RADIO GPS: 33478  57339; LOCATION: 33482 57356; CONDITION: Stand by", '33478  57339', '', 1706982559.90604, '100', '3066', '3920.0386|N|12017.4665|W']
...
095413:['0949', 'FROM', 'Team 101', "RADIO LOG SOFTWARE: 'SUBJECT LOCATED' form opened; radio operator is gathering details", '33478  57339', '', 1706982559.90604, '100', '3066', '3920.0386|N|12017.4665|W']
...
095419:['0949', 'FROM', 'Team 101', "RADIO LOG SOFTWARE: radio operator canceled the 'SUBJECT LOCATED' form: TIME: 0949; RADIO GPS: 33478  57339", '33478  57339', '', 1706982559.90604, '100', '3066', '3920.0386|N|12017.4665|W']
...
095446:['0954', 'FROM', 'Team 103', "RADIO LOG SOFTWARE: 'SUBJECT LOCATED' form opened; radio operator is gathering details", '33384  57387', '', 1706982879.6814585, '100', '3099', '3920.0661|N|12017.5310|W']
...
095648:['0954', 'FROM', 'Team 103', "RADIO LOG SOFTWARE: radio operator is still collecting data for the 'SUBJECT LOCATED' form; values so far: RADIO GPS: 33384  57387; LOCATION: 33381 57383", '33384  57387', '', 1706982879.6814585, '100', '3099', '3920.0661|N|12017.5310|W']
...

the next mention of 'subject located' was 101154, after the errors began. This might be something to try: two subject located dialogs open at the same time for the same callsign, then canceling one of them.

caver456 commented 4 months ago

Duplicated the hang at 1 second left! The countdown in question is in an orphan subject located dialog. When it gets to 1, the spinner shows up and radiolog freezes. How did the SLD become an orphan? Its parent NEW was deleted by clicking cancel in the NEW - which shouldn't be possible, but if there are two SLDs open for the same NEW, then apparently the NEW can be canceled after one of the SLDs is canceled.

If all of the NEW's spawned SLDs are open when you try to cancel the NEW, then this dialog shows as expected:

image

But if you open two SLDs from the same NEW, then cancel one of the SLDs, you can then cancel the NEW and the orphan SLD remains. Putting text in any field of that orphan SLD starts the countdown in question.

Side note, should also be fixed: if you cancel both SLDs, these errors start showing, once per second:

Traceback (most recent call last):
  File "C:\Users\caver\Documents\GitHub\radiolog\radiolog.py", line 3903, in updateTeamTimers
    sld.countdown()
  File "C:\Users\caver\Documents\GitHub\radiolog\radiolog.py", line 8658, in countdown
    self.ui.countdownLabel.setText(self.countdownText)
RuntimeError: wrapped C/C++ object of type QLabel has been deleted

Reconfirmed that the countdown does work as expected when the SLD's parent is open. This is true for each open SLD, i.e. for multiple simultaneous SLDs owned by the same NEW. This confirms that the orphan is the problem.

caver456 commented 4 months ago

Now that the cause is isolated, seems like this could be changed to low priority since it's pretty obscure, and definitely a training issue. But, best to leave it as medium priority since it actually causes a fatal freeze.

caver456 commented 4 months ago

The report that the freeze was in the LLAC dialog rather than the subject located dialog is probably incorrect, though, it might be good to try to duplicate an LLAC hang with the similar orphan issue if possible.

Actually the LLAC wording was my fault: there is no countdown in LLAC - the only dialog with a coundown is the subject located dialog, which is the one that I sent a screenshot of to the reporter, and she confirmed that was the one. So - no need to check anything else - the orphan SLD is the problem.

caver456 commented 4 months ago

Proposed fixes:

caver456 commented 4 months ago

Trying again the next day: I can't duplicate the freeze and spinner and countdown 1, but, I can duplicate the begin of QLineEdit tracebacks at the same time as the countdown expiration in the orphan SLD. Interesting.. but, the solution should take care of both symptoms.