hardbyte / python-can

The can package provides controller area network support for Python developers
https://python-can.readthedocs.io
GNU Lesser General Public License v3.0
1.31k stars 604 forks source link

Segmentation fault when calling ics.close_device() #1740

Closed robingergg closed 7 months ago

robingergg commented 9 months ago

Describe the bug

Calling this function of can.interfaces.ics_neovi.neovi_bus causes Segmentation Fault: def shutdown(self): super().shutdown() ics.close_device(self.dev)

To Reproduce

Having a neoVI Fire 2 device transmitting CAN signals I have the next TC:

*** Settings ***
Resource            ${CURDIR}/../ResourceFile.resource

Test Setup          Set CAN    ${INTERFACE}    ${CHANNEL}    ${BITRATE}    ${DB_FILE}    ${TEST_NAME}    ${recv_own_msg}
Test Teardown       Shut Down Bus

*** Test Cases ***
Test Reciveing Message
    [Tags]    Test-1
    Run Keyword And Continue On Failure    Step 1

*** Keywords ***
Step 1
    Check The Reception Of ${MESSAGE_NAME} TimeOut ${TIMEOUT} Seconds Sending To ${TARGET_DEVICE}

This is a simple test case which uses this robot-framework library: https://github.com/Openwide-Ingenierie/robotframework-can-uds-library

The problem occurs when the code reaches the Test Teardown. Inside it this method will be called from the above mentioned robot-framework lib.: def stop_bus(self): """Stop the CAN BUS""" self.bus.shutdown()

At the end it produces Segmentation Fault.

Expected behavior

The expected behavior is to shut down the previously opened bus without a problem.

Additional context

OS and version: Windows, 10 Python version: Python 3.10.12 python-can version: python-can==4.2.2 python-can interface/s (if applicable): Interfaces is involved

When using XCP - pyxcp, the Segmentation Fault can be avoided by sleeping 5 seconds before the attempt of closing the device. However with the provided test case above, it occurs no matter what.

Traceback and logs ```python def func(): return "hello, world!" ```
pierreluctg commented 9 months ago

Hi @8Klaro8, can you please share the version of python-ics that you are using and run your code with the PYTHONFAULTHANDLER environment variable set (https://docs.python.org/3/library/faulthandler.html) and share the stack trace?

robingergg commented 9 months ago

Hi @pierreluctg ! Of course. Below are the requested infos: NOTE: I ran 3 times and thus there are 3 logs of faulthandler. They are separated with 1.) run, 2.) run etc..

python-ics version:

faulthandlers log

Run 1

Windows fatal exception: access violation

Current thread 0x00002d80 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 408 in _recv_internal
  File \MYAPTH\Python\Python311\site-packages\can\bus.py", line 121 in recv
  File \MYAPTH\Python\Python311\site-packages\can\notifier.py", line 124 in _rx_thread
Windows fatal exception:   File access violation

Run 2

Windows fatal exception: access violationWindows fatal exception: access violation

Thread 0x00003718 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py"

Windows fatal exception: access violation

Run 3

Windows fatal exception: access violationCurrent thread 0x00002f98

 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 408 in _recv_internal
  File "

`

robingergg commented 9 months ago

@pierreluctg my conclusion so far is that the device(Neovi in this case) has been already closed by the time can.interface.ics_neovi.neovi_bus wants to close it using its shutdown() and thus it gives the Segmentation fault.

Uncommenting the "ics.close_device()" in neovi_bus.py avoids the Segmentation fault.

Reading the documentation of ics says: _". Also since python is a object oriented language the module utilizes this and auto cleans up device handles when going out of scope so there is usually no need to call ics.closedevice()."

This may explain, why it is causing Segfault.

pierreluctg commented 9 months ago

@8Klaro8 looks like you may have a variable scoping issue in your code.

Are you able to share some simple (few line) code that can reproduce the issue?

robingergg commented 9 months ago

@pierreluctg with the provided library above: https://github.com/Openwide-Ingenierie/robotframework-can-uds-library

And the provided robot code produces this error. Just by simply setting up a robot test where we check a certain CAN signal thru neoVI device. The configuration looks as follows when setting up the CAN bus in this line: Set CAN ${INTERFACE} ${CHANNEL} ${BITRATE} ${DB_FILE} ${TEST_NAME} ${recv_own_msg} Config:: interface: neovi channel: HSCAN2 bitrate: 500000 db_file: test.dbc test_name: mytest recv_own_msg: False

pierreluctg commented 8 months ago

Hi @8Klaro8, I still suspect that you have a variable (the can bus object) scope issue in your Robot Framework lib. Unfortunately, debugging your Robot Framework library is out of the scope of this project. If you are able to write a simple Python script that expose the issue(s), I will be happy to have a look.

robingergg commented 7 months ago

Hi @pierreluctg, I did not confirm it, but I assume the problem cause by the neovi bus implementation by python-can. Based on the docs of the neovi bus, the neovi device shall be closed "automatically" yet the implementatino of neovi bus closes the bus when shutting down the can bus, therefore it gave segmentation fault.

pierreluctg commented 7 months ago

@8Klaro8, yes the neovi device handle get automatically closed when getting out of scope. This do not mean that closing the device handle ( via ics.close_device) is not allowed and will cause a segmentation fault.

robingergg commented 7 months ago

@pierreluctg I see. Interesting.

I monkeypatched the shutdown method of neovi_bus.py and the segfault dissapeared (I am skipping the calling of ics.close_device()) and the bus closes without a problen (automatically).

pierreluctg commented 7 months ago

The only thing this is demonstrating is if your code avoids calling the bus shutdown method, you do not get in that issue...

pierreluctg commented 7 months ago

This can reproduce what you are seeing...

import time

import can

with can.interface.Bus(interface="neovi", channel=1) as bus:
    notifier = can.Notifier(bus, [can.Printer()])
# The bus is now closed
time.sleep(10)
pierreluctg commented 7 months ago

You can also try with, for example, the virtual interface. You still get an issue with your code I suspect.

Instead of a segmentation fault you will get can.exceptions.CanOperationError: Cannot operate on a closed bus.

I agree that getting an exception is better than a segmentation fault, we can fix that in the neovi interface. However your issue remain the same.

pierreluctg commented 7 months ago

@8Klaro8 maybe your code is not closing the notifier (notifier.close()) prior to closing the bus?

pierreluctg commented 7 months ago

Added fix in #1765 to avoid the segmentation fault

grant-allan-ctct commented 7 months ago

Sorry to be late to the party.

I can see it making sense to raise an error if someone tries to send a message and the CAN bus has been closed. But if it's just processing of an incoming message from off the bus, then maybe an exception need not be raised.

One of the files wrapped with the new wrapper function is _process_msg_queue, which looks like it has underlying design principle of politely returning None, False if the processing is unsuccessful. Perhaps we could follow suit, and politely return None, False if self._is_shutdown is True?

One thing that I wonder about is that by introducing a new exception to an internal function, we now have (as a side effect) introduced the possibility of that exception to all of the consumers of that internal function. This could potentially have downstream effect upon the documentation for all of those consumers?

Anyway, just a thought...