home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.94k stars 30.99k forks source link

Threading issue with mcp23017 platform when used by both binary_sensor and switch integration #31867

Closed jpcornil-git closed 2 years ago

jpcornil-git commented 4 years ago

The problems

When a mcp23017 platform is instantiated by e.g. a _binarysensor and a switch component (running in different threads):

I fixed the issue in my view by adding (See Additional information section below):

Note that issue may be generic/affect others components sharing a common platform

Environment

Problem-relevant configuration.yaml

binary_sensor:
  - platform: mcp23017
    i2c_address: 0x27
    scan_interval: 1
    invert_logic: true
    pins:
      8 : Button_0
      9 : Button_1
      10: Button_2
      11: Button_3
      12: Button_4
      13: Button_5
      14: Button_6
      15: Button_7

switch:
  - platform: mcp23017
    i2c_address: 0x27
    pins:
      0 : Sortie_0
      1 : Sortie_1
      2 : Sortie_2
      3 : Sortie_3
      4 : Sortie_4
      5 : Sortie_5
      6 : Sortie_6
      7 : Sortie_7

Traceback/Error logs

None

Additional information

/srv/homeassistant/lib/python3.7/site-packages/adafruit_mcp230xx.py:

class MCP23017: """Initialize MCP23017 instance on specified I2C bus and optionally at the specified I2C address. """ mutex = threading.Lock() isInit = False

def __init__(self, i2c, address=_MCP23017_ADDRESS):
    self._device = i2c_device.I2CDevice(i2c, address)
    if not MCP23017.isInit:
        MCP23017.isInit = True
        # Reset to all inputs with no pull-ups and no inverted polarity.
        self.iodir = 0xFFFF
        self.gppu = 0x0000
        self._write_u16le(_MCP23017_IPOLA, 0x0000)

...

/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/mcp23017/binary_sensor.py:
- Protection of all MCP23017 accesses by a mutex
```python
...

def setup_platform(hass, config, add_devices, discovery_info=None):
    """Set up the MCP23017 binary sensors."""
    pull_mode = config[CONF_PULL_MODE]
    invert_logic = config[CONF_INVERT_LOGIC]
    i2c_address = config[CONF_I2C_ADDRESS]

    i2c = busio.I2C(board.SCL, board.SDA)
    with adafruit_mcp230xx.MCP23017.mutex:
        mcp = adafruit_mcp230xx.MCP23017(i2c, address=i2c_address)

    binary_sensors = []
    pins = config[CONF_PINS]

    for pin_num, pin_name in pins.items():
        pin = mcp.get_pin(pin_num)
        binary_sensors.append(
            MCP23017BinarySensor(pin_name, pin, pull_mode, invert_logic)
        )

    add_devices(binary_sensors, True)

class MCP23017BinarySensor(BinarySensorDevice):
    """Represent a binary sensor that uses MCP23017."""

    def __init__(self, name, pin, pull_mode, invert_logic):
        """Initialize the MCP23017 binary sensor."""
        self._name = name or DEVICE_DEFAULT_NAME
        self._pin = pin
        self._pull_mode = pull_mode
        self._invert_logic = invert_logic
        self._state = None

        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.direction = digitalio.Direction.INPUT
            self._pin.pull = digitalio.Pull.UP

    @property
    def name(self):
        """Return the name of the sensor."""
        return self._name

    @property
    def is_on(self):
        """Return the state of the entity."""
        return self._state != self._invert_logic

    def update(self):
        """Update the GPIO state."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._state = self._pin.value

/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/mcp23017/switch.py:


def setup_platform(hass, config, add_entities, discovery_info=None):
    """Set up the MCP23017 devices."""
    invert_logic = config.get(CONF_INVERT_LOGIC)
    i2c_address = config.get(CONF_I2C_ADDRESS)

    i2c = busio.I2C(board.SCL, board.SDA)

    with adafruit_mcp230xx.MCP23017.mutex:
        mcp = adafruit_mcp230xx.MCP23017(i2c, address=i2c_address)

    switches = []
    pins = config.get(CONF_PINS)
    for pin_num, pin_name in pins.items():
        pin = mcp.get_pin(pin_num)
        switches.append(MCP23017Switch(pin_name, pin, invert_logic))
    add_entities(switches)

class MCP23017Switch(ToggleEntity):
    """Representation of a  MCP23017 output pin."""

    def __init__(self, name, pin, invert_logic):
        """Initialize the pin."""
        self._name = name or DEVICE_DEFAULT_NAME
        self._pin = pin
        self._invert_logic = invert_logic
        self._state = False

        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.direction = digitalio.Direction.OUTPUT
            self._pin.value = self._invert_logic

    @property
    def name(self):
        """Return the name of the switch."""
        return self._name

    @property
    def should_poll(self):
        """No polling needed."""
        return False

    @property
    def is_on(self):
        """Return true if device is on."""
        return self._state

    @property
    def assumed_state(self):
        """Return true if optimistic updates are used."""
        return True

    def turn_on(self, **kwargs):
        """Turn the device on."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.value = not self._invert_logic
        self._state = True
        self.schedule_update_ha_state()

    def turn_off(self, **kwargs):
        """Turn the device off."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.value = self._invert_logic
        self._state = False
        self.schedule_update_ha_state()
probot-home-assistant[bot] commented 4 years ago

Hey there @jardiamj, mind taking a look at this issue as its been labeled with a integration (mcp23017) you are listed as a codeowner for? Thanks!

Misiu commented 4 years ago

@jpcornil-git I'm working on PCF8574 integration, so your finding is very useful. I'll wait to see what others have to say about this problem. Do you have other hardware to test this problem? pcal9535a?

jpcornil-git commented 4 years ago

No I don't but can help eventually, e.g. add debug code to expose the issue/thread activities as I did for my MCP23017-based board (https://github.com/jpcornil-git/RPiHat_GPIO_Expander) Cheers jpc

Misiu commented 4 years ago

@jpcornil-git thank you for the link! Let's wait for others to look into that issue.

jardiamj commented 4 years ago

@StephenBeirlaen, this is most likely the reason we were having issues using the INTFLAG for implementing the Interrupt logic into the component. In order to implement this changes to the Adafruit_CircuitPython_MCP230xx would need to be done through a PR first. Adafruit's i2c_device implementation is designed so you can lock the i2c bus and device address: https://github.com/adafruit/Adafruit_CircuitPython_BusDevice/blob/master/adafruit_bus_device/i2c_device.py But their implementation of the MCP230xx only locks the device address on every operation (read/write): https://github.com/adafruit/Adafruit_CircuitPython_MCP230xx/blob/master/adafruit_mcp230xx/mcp230xx.py Some sort of refactoring of the library might be needed, but I haven't put that much thought into it yet. Any suggestions?

jpcornil-git commented 4 years ago

My changes aren't generic enough; I would protect/lock by I2C address ideally (when you have multiple identical devices at different addresses you don't want to block access for all of them), same with the initialization (you want to initialize all of them but only once not everytime a new instance of the same {device, address} is made)

StephenBeirlaen commented 4 years ago

Nice findings, it looks promising! I can take a closer look tomorrow.

StephenBeirlaen commented 4 years ago

Very nice and detailed post @jpcornil-git! I also found similar problems where multiple I2C signals appeared to 'collide'. I tried to document my findings (with a logic analyser measurement) here: https://github.com/jardiamj/homeassistant/issues/1#issuecomment-532432071

I now also see why multithreading could be the issue, previously I was confused by the locks already present in the Adafruit code. I could not explain how they weren't sufficient, since I lack some more Python experience to debug this.

My particular project uses switches to momentarily turn on outputs, and these outputs controls pulse relays (more or less, for simplification). The output of the relays are fed back into the MCP as inputs (binary_sensors), to check the current state of the pulse relay (since they are pulsed to toggle them on/off, I have to verify the current state). So it's very similar to RPiHat_GPIO_Expander.

Since the measurement of an input change happens directly after an output changed its state, this leads to simultaneous I2C traffic all the time. That means I can easily reproduce the issue :)

Now I am unsure how we should fix this (universally as @jpcornil-git means). As I understand it right now, the proposed changes above are a global lock, that blocks the I2C bus for every device, not per I2C address? So two I2C devices with different addresses cannot be communicated with in parallel?

I would protect/lock by I2C address ideally (when you have multiple identical devices at different addresses you don't want to block access for all of them)

Wouldn't allowing this make it possible for multiple processes to use the I2C bus at the same time, and potentially mix up two I2C transmissions?

jpcornil-git commented 4 years ago

The Adafruit code takes a lock of the I2C bus for a single R/W transaction (to protect that transaction) and this is fine at that level and not hurting (the bus is used for that transaction/can't be used for anything else as i2c is serial, i.e. you can't communicate in parallel).

The issue is when two threads are doing read-modify-write actions, e.g.:

In the MCP23017BinarySensor and MCP23017Switch __init__ code, you have: self._pin.direction = digitalio.Direction.OUTPUT

The above calls the direction setter method from DigitalInOut class (defined in adafruit_mcp230xx/digital_inout.py)

self._mcp.iodir = _clear_bit(self._mcp.iodir, self._pin)

That line translates into two I2C transactions:

If between these two transactions, another thread is updating the same IODIRA register (because not waiting on a lock) it will be lost/overwritten by the 2nd transaction.

To solve the issue and make such read-modify-write transaction atomic, you have to take the lock on the device for the whole sequence -> need a lock per device instance or per address (one device instance per address).

Issue is not very visible because most functions do a single read or write (but not the case during the platform setup phase)

The second issue to solve is the initialization of the device when multiple instances of the same devices are created (in different threads), it should only happen once as second initialisation may overwrite configuration of the thread that is already ahead/configuring the device (above lock won't address that)

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

fjufju commented 4 years ago

@jpcornil-git any updates? Have problem. Thank you.

jpcornil-git commented 4 years ago

Hi Fumi,

I still plan to address this issue, just need to find time, hopefully during this summer (the plan at least), as this is a complex one to address/resolve.

Cheers, jpc

Le dim. 24 mai 2020 Γ  11:46, Fumi Ju notifications@github.com a Γ©crit :

@jpcornil-git https://github.com/jpcornil-git any updates? Have problem. Thank you.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/home-assistant/core/issues/31867#issuecomment-633205640, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJWC526BZPGEQ3EELKUC7M3RTDUHFANCNFSM4KV4FMSQ .

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

jpcornil-git commented 4 years ago

I still plan to work on this/fix this properly as soon as I can find some time Cheers jpc

fjufju commented 4 years ago

Looking forward to that :)

gcs278 commented 4 years ago

Thanks @jpcornil-git. For everyone else finding this thread, I'm still patching in what @jpcornil-git provided earlier, but I think I made a couple of tweaks because of library-related errors (can't remember what).

Here is the files that I'm currently using (synced as of HA Core 0.115.2): mcp23017_patches.zip

There's a patch script in the zip that patches the homeassistant docker container (you need SSH access). It should ask you to vimdiff the files to confirm the changes (I'm using my mac with samba mounts to diff) then it will patch in the files. Diffs should be good for 0.115. There might be a syntax error with the patch script, good luck and your mileage may vary!

github-actions[bot] commented 3 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

jpcornil-git commented 3 years ago

This issue/resolution is pending review of #42928 pull request Cheers jpc

StephenBeirlaen commented 3 years ago

Nice work @jpcornil-git! Did not know about the PR yet. I'll see if I can get my hardware working again and test it out.

gralin commented 3 years ago

I will be happy to help test it too. I have been having some issues which might be caused by threading. All my light switches are connected via MCP23017 inputs and my lights are controlled by MCP23017 outputs. Once in a while it happens that clicking one light switch activates all outputs in MCP23017 causing lights in whole house to turn on πŸ˜… I didn't monitor what is being sent to the module but I assume HA is sending something like 0xFF instead of just one bit for the right output and this is the effect. I'm curious if anyone could confirm that this could happen using the current source code or if it could be prevented using the code from PR. The version I use is modified because I needed to support interrupt pin to get better response time, and this interrupt pin has chained interrupt pins from all my MCP23017 modules, and this scenario was not supported out of the box. But I guess for this I can open separate issue/disucssion.

Anyway, thank you @jpcornil-git for your time and contribution, I hope I can at least help with testing since I'm not an experienced Python developer (I'm .NET).

jpcornil-git commented 3 years ago

... and I'll be happy to support too/great to have tests on different platforms :-).

@gralin I guess that with this implementation you won't need interrupt either (polling is happening in a dedicated thread/100ms period, i.e. reactive and not loading HA). I did this because of async refactoring but also because current polling implementation within HA is way too slow for a good user experience when inputs are used for push button, e.g. switching on a light/

jpcornil-git commented 3 years ago

For people affected by this issue (or willing to test/use a config flow/lower latency implementation already) and in order to ease testing, I made a 'custom_components version' available under https://github.com/jpcornil-git/HA-mcp23017.

You will need HA2021.2 or later (need #44238 fix) to test it.

Cheers jpc

jpcornil-git commented 3 years ago

Added HACS support to my https://github.com/jpcornil-git/HA-mcp23017 repository as well => Just add https://github.com/jpcornil-git/HA-mcp23017 to HACS custom repositories to try it out

github-actions[bot] commented 3 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

jpcornil-git commented 3 years ago

Issue is still there/component hasn't been updated since then but a fix is available as a custom_component or a HACS integration here: https://github.com/jpcornil-git/HA-mcp23017 Cheers, jpc

github-actions[bot] commented 3 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

jpcornil-git commented 3 years ago

Issue is still there/component hasn't been updated since then ... it looks like there is no support/owner for this integration anymore Cheers, jpc

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment πŸ‘ This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

jpcornil-git commented 2 years ago

Issue is still present but given that the mcp23017 integration will be deprecated soon cfr. https://github.com/home-assistant/core/pull/62484, issue should be closed.

As mentioned above, another version of the mcp23017 integration without that issue will remain available as a custom_component/HACS integration under https://github.com/jpcornil-git/HA-mcp23017

Cheers, jpc

thecode commented 2 years ago

Closing this as the code for this integration is already removed from dev branch. I suggest to create a similar issue in ttps://github.com/jpcornil-git/HA-mcp23017

jpcornil-git commented 2 years ago

Just to correct that https://github.com/jpcornil-git/HA-mcp23017 doesn't suffer from this issue