Threading causes crash - Githubissues

HolQue commented 2 months ago

Hi Cuong,

I use the following code to test the threading:

THREAD    TEST_THREAD_1     False
   Log    test thread 1 started
   FOR    ${index}    IN RANGE    0    10
      Log     I am the test thread 1
      Wait 1s
   END
   Wait 1s
   Log     end of test thread 1 reached
END

THREAD    TEST_THREAD_2     False
   Log    test thread 2 started
   FOR    ${index}    IN RANGE    0    10
      Log     I am the test thread 2
      Wait 1s
   END
   Wait 1s
   Log     end of test thread 2 reached
END

The code contains an error:

robot.errors.KeywordError: No keyword with name 'Wait 1s' found.

This error together with the fact that the line causing this error, is running in a thread, causes curious effects.

Log:

Exception in thread TEST_THREAD_1:
robot.errors.KeywordError: No keyword with name 'Wait 1s' found.

During handling of the above exception, another exception occurred:

AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

Output:  D:\ROBFW\TestTestcases\RobotFramework\threading\logfiles\test.threading.xml
AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\RobotFramework\python39\lib\threading.py", line 980, in _bootstrap_inner
Exception in thread TEST_THREAD_2:
robot.errors.KeywordError: No keyword with name 'Wait 1s' found.

During handling of the above exception, another exception occurred:

AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

    self.run()
  File "D:\RobotFramework\python39\lib\threading.py", line 917, in run
AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

    self._target(*self._args, **self._kwargs)
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\bodyrunner.py", line 548, in run_worker
AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

    self._context.variables.end_thread()
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\statusreporter.py", line 67, in __exit__
Log:     D:\ROBFW\TestTestcases\RobotFramework\threading\logfiles\test.threading_log.html
Traceback (most recent call last):
  File "D:\RobotFramework\python39\lib\threading.py", line 980, in _bootstrap_inner
    raise failure
robot.errors.HandlerExecutionFailed: AttributeError: 'NoneType' object has no attribute 'status'
    self.run()
  File "D:\RobotFramework\python39\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\bodyrunner.py", line 548, in run_worker
    self._context.variables.end_thread()
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\statusreporter.py", line 67, in __exit__
    raise failure
robot.errors.HandlerExecutionFailed: AttributeError: 'NoneType' object has no attribute 'status'

When I repeat the test several times, I get different test results:

(1)

WARN ]  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'
Test.Threading :: Simple test suite                                   Test.Threading :: Simple test suite                                   | PASS |
1 test, 1 passed, 0 failed, 0 unknown

(2)

Threading Test                                                        | PASS |
------------------------------------------------------------------------------
Test.Threading :: Simple test suite                                   | UNKNOWN |
1 test, 0 passed, 0 failed, 1 unknown

(3)

Threading Test                                                        | PASS |
------------------------------------------------------------------------------
[ Test.Threading :: Simple test suite                                   [ WARN| WARNUNKNOWN ]  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'
 |
 ]  An exception occurred in 'TEST_THREAD_2' thread. Exception: No keyword with name 'Wait 1s' found.'
1 test, 0 passed, 0 failed, 1 unknown

(4)

Threading Test                                                        | PASS |
------------------------------------------------------------------------------
Test.Threading :: Simple test suite                                   | [ UNKNOWNWARN |
[ 1 test, 0 passed, 0 failed, 1 unknown
 ]  An exception occurred in 'TEST_THREAD_2' thread. Exception: No keyword with name 'Wait 1s' found.'
==============================================================================
WARN ]  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'

(5)

Threading Test                                                        | PASS |
------------------------------------------------------------------------------
Test.Threading :: Simple test suite                                   [ [ | WARNUNKNOWNWARN |
 ]  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'
1 test, 0 passed, 0 failed, 1 unknown
 ]  An exception occurred in 'TEST_THREAD_2' thread. Exception: No keyword with name 'Wait 1s' found.'

This output really is a confusion.

The result counting deviates. First result is

1 test, 1 passed, 0 failed, 0 unknown

All other results are

1 test, 0 passed, 0 failed, 1 unknown

This looks like a synchronization is missing internally.

The log file looks better:

INFO - +-- START THREAD: TEST_THREAD_1 [ ]
INFO - +--- START KEYWORD: BuiltIn.Log [ test thread 1 started ]
INFO - +---- START THREAD: TEST_THREAD_2 [ ]
INFO - +----- START KEYWORD: BuiltIn.Log [ test thread 2 started ]
INFO - +----- END KEYWORD: BuiltIn.Log (1)
INFO - +---- END KEYWORD: BuiltIn.Log (1)
INFO - +---- START FOR: ${index} IN RANGE [ 0 | 10 ] [ ]
INFO - +---- START FOR: ${index} IN RANGE [ 0 | 10 ] [ ]
INFO - +----- START ITERATION: ${index} = 0 [ ]
INFO - +------ START ITERATION: ${index} = 0 [ ]
INFO - +------- START KEYWORD: BuiltIn.Log [ I am the test thread 1 ]
INFO - +-------- START KEYWORD: BuiltIn.Log [ I am the test thread 2 ]
INFO - +-------- END KEYWORD: BuiltIn.Log (1)
INFO - +-------- END KEYWORD: BuiltIn.Log (1)
INFO - +-------- START KEYWORD: Wait 1s [ ]
INFO - +-------- START KEYWORD: Wait 1s [ ]
UNKNOWN - No keyword with name 'Wait 1s' found.
UNKNOWN - No keyword with name 'Wait 1s' found.
INFO - +--------- END KEYWORD: Wait 1s (2)
INFO - +-------- END KEYWORD: Wait 1s (1)
INFO - +------- END ITERATION: ${index} = 0 (6)
INFO - +------ END ITERATION: ${index} = 0 (5)
INFO - +----- END FOR: ${index} IN RANGE [ 0 | 10 ] (6)
------------------------------------------------------
INFO - +--- END TEST: Threading Test (3)
------------------------------------------------------
INFO - +---- END FOR: ${index} IN RANGE [ 0 | 10 ] (6)
INFO - +---- START KEYWORD: Wait 1s [ ]
INFO - +--- START KEYWORD: Wait 1s [ ]
INFO - +--- END KEYWORD: Wait 1s (2)
INFO - +--- END KEYWORD: Wait 1s (0)
INFO - +--- START KEYWORD: BuiltIn.Log [ end of test thread 1 reached ]
INFO - +--- START KEYWORD: BuiltIn.Log [ end of test thread 2 reached ]
INFO - +--- END KEYWORD: BuiltIn.Log (0)
INFO - +--- END KEYWORD: BuiltIn.Log (0)
WARN -  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'
WARN -  An exception occurred in 'TEST_THREAD_2' thread. Exception: No keyword with name 'Wait 1s' found.'

But it can be seen in the middle, that the UNKNOWN has no further effect on the test execution. The execution continues. But I would expect an abort.

At the end the same error (No keyword ... found) is labelled as WARN (in opposite to the previous label UNKNOWN). But an exception I would not label as WARN.

How to handle these things?

HolQue commented 2 months ago

Good news: log output of the test above executed with log level USER:

UNKNOWN - No keyword with name 'Wait 1s' found.
WARN -  An exception occurred in 'TEST_THREAD_1' thread. Exception: No keyword with name 'Wait 1s' found.'
UNKNOWN - No keyword with name 'Wait 1s' found.
WARN -  An exception occurred in 'TEST_THREAD_2' thread. Exception: No keyword with name 'Wait 1s' found.'

No INFO any more.

This works!

milanac030988 commented 2 months ago

Hello Holger,

Thank you for testing the threading feature and uncovering these issues. I have a few points to address:

'Wait 1s' is not a keyword in Robot Framework. It seems you may have missed adding the extra space after 'wait'. Even 'wait' alone is not a keyword; I believe you meant the keyword 'Sleep'.
Regarding the decision on whether to abort the main thread upon encountering an exception in the forked thread:

As per my perspective, in the context of a testing environment, when a forked thread is designed to perform a supportive or non-critical task, an exception within that thread should not necessarily lead to the termination of the main test.
The main test flow should have the primary responsibility for determining the pass/fail status of the test. This approach allows for better control over the testing process and can provide more accurate and comprehensive test results. It also supports a more robust and fault-tolerant testing environment where non-critical failures in ancillary processes don't unduly affect the overall test outcomes.

The inconsistency in results stems from the main thread sometimes finishing before and sometimes after the forked thread. This underscores the necessity for synchronization keywords to ensure thread coordination.

Thanks, Cuong

test-fullautomation commented 2 months ago

Hi Holger,

testing threading is not a simple task :-)

I agree to Cuong's answer.

a error in a thread can not directly cause a stop of the main thread.
the thread user is responsible for coordination of the threads. For this we provide the notification mechanism.
main thread is responsible to report the overall result. threads can end with any result, but if nice implemented (this should be a rule) they should also end with passed.

Thank you, Thomas

HolQue commented 2 months ago

Hi Cuong,

yes, I also agree! Thank you very much for your detailed answer. I learned a lot ;-)

HolQue commented 2 months ago

Nevertheless, there is one aspect left where I have doubts. The test setup is the same like before: I have a Windows console opened and execute the test repeatedly with the help of a batch file.

At first I reduce the test code as much as possible:

THREAD    TEST_THREAD_1     False
   Log     I am TEST_THREAD_1
END

That's all. And runs stable.

I put the log keyword into a loop:

THREAD    TEST_THREAD_1     False
   FOR    ${index}    IN RANGE    0    5
      Log     I am TEST_THREAD_1
   END
END

Also this code I execute repeatedly. In some rare cases this is successful. But in most of the cases an exception occurs:

Exception in thread TEST_THREAD_1:
AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

AttributeError: 'NoneType' object has no attribute 'status'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\RobotFramework\python39\lib\threading.py", line 980, in _bootstrap_inner
Output:  D:\ROBFW\TestTestcases\RobotFramework\threading\logfiles\test.threading.xml
    self.run()
  File "D:\RobotFramework\python39\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\bodyrunner.py", line 548, in run_worker
    self._context.variables.end_thread()
  File "D:\RobotFramework\python39\lib\site-packages\robotframework-6.1-py3.9.egg\robot\running\statusreporter.py", line 67, in __exit__
    raise failure
robot.errors.HandlerExecutionFailed: AttributeError: 'NoneType' object has no attribute 'status'

In all cases the main error message is the same:

AttributeError: 'NoneType' object has no attribute 'status''

I try an IF condition instead of a FOR loop:

THREAD    TEST_THREAD_1     False
   IF    ${1} == ${1}
       Log     I am TEST_THREAD_1
   END
END

Now it's other way round: In most of the cases this runs stable. In some rare cases (~ 1 from 5 tries) I get the exception.

I do not understand why this happens. Is there anything critical in my robot code?

Or is it maybe possible to handle this exception by checking the object that causes: 'NoneType' object has no attribute 'status'? To let all end with a crash is not nice.

test-fullautomation commented 2 months ago

Hi Holger,

did you pause the main thread?

Must be after last "END".

Thank you,l Thomas

HolQue commented 2 months ago

Hi Thomas,

very interesting. I added a Sleep 6 at end of test - to make sure that the thread ends before the main thread ends. Now it runs stable. No exception any more.

OK, what does this mean? The test developer is responsible for making sure that all threads have been finished before the main thread comes to his end? Is that right?

In this case I would recommend to mention this as a strict design rule in the documentation.

test-fullautomation commented 2 months ago

Hi Holger, in principle it's like that: The user of a thread must know what he/she is doing. It's not as beginner's feature...

@milanac030988 : Nevertheless: TML has no problem when the main thread ends, but other threads are still running- they can even still log to console. It would be nice if a crash can be avoided.

Regards, Thomas

HolQue commented 2 months ago

But nevertheless, I still do not understand why the discussed situation must cause a crash. Is there really no other was to handle this?

The root cause can be found here:

file: statusreporter.py method: __exit__ line of code: context.test.status = result.status

The crash I mentioned above happens because context.test is None. Therefore no status is accessible.

My understanding now is that this happens when a test ends before a thread ends. And this means, the root cause of the crash is that a status cannot be set. In my opinion this is really not a sufficient reason for a crash. Why not initializing the status with a default value?

There is another status involved: the result.status. Default value is "NOT SET". Should be possible with context.test.status also.

I did not spend much time to go into detail here, I only tried a rough hack, I added an if condition:

if context.test:
   context.test.status = result.status

Effect: No status exception an more.

But sometimes:

Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0000016FF0954FB0)

Current thread 0x000056a8 (most recent call first):
<no Python frame>

Uh, ugly!

OK, problem needs a better investigation than I did.

test-fullautomation commented 2 months ago

Hi Holger, as I wrote above. Would be nice not to have a crash. Thank you, Thomas

HolQue commented 2 weeks ago

Hi Holger,

did you pause the main thread?

Must be after last "END".

Thank you,l Thomas

Hi Thomas,

this would be helpful:

JOIN    TEST_THREAD_1
JOIN    TEST_THREAD_2

possible?

test-fullautomation commented 2 weeks ago

Hi Holger,

this is not wanted.

Plan is to wait in the main thread (for test e.g. some seconds), or to wait for notifications from other threads. E.g. mainthread can wait that THREAD_1 and THREAD_2 send a notification "finished".

Thank you, Thomas

test-fullautomation / robotframework

Threading causes crash #77