openhab / org.openhab.binding.zwave

openHAB binding for Z-Wave
Eclipse Public License 2.0

ZWave binding can completely kill openHAB #350

Closed bodiroga closed 7 years ago

bodiroga commented 7 years ago

Hi Chris!

First of all, here is my installation's system data:

Machine: Raspberry Pi 3 Model B using openHABian
OH2 Version: 2.0.0 RELEASE
Binding Version (if SNAPSHOT, provide compile time as found on the filename in the console): 2.0.0 RELEASE

What I want to report is that I'm able to completely kill my openHAB 2.0 installation by following four simple steps. As I understand the OSGi structure, this should not happen, since every bundle should be isolated. My steps to reproduce the problem are:

Step 1: Start openHAB 2.0 with the ZStick Gen5 attached (the USB stick has 5 nodes in its network, all of them imported as Things and with most of their channels enabled).
Step 2: Remove the ZStick Gen5 from the USB port.
Step 3: Stop the ZWave binding through the Karaf console using the command: bundle:stop zwave_id
Step 4: Start the ZWave binding again using the command: bundle:start zwave_id
Step 5: Boooom! You are kicked out of the Karaf console and, in the case of the openHABian distribution, openHAB 2.0 is restarted.

I understand that this is a very abnormal situation, but as I have told you a couple of times, I'm playing with distributed ZWave networks: more than one ZWave controller, each attached to a different Raspberry Pi that in turn shares its serial port with a central Raspberry Pi using socat (serial to IP). Removing the controller from the USB port is how I simulate the failure of one of the non-central Raspberry Pis. Leaving the binding running while the controller is unplugged is another option, but I have seen that this sometimes drives CPU usage up to 100%. That's why I was following the bundle:stop / bundle:start path.

I don't know whether the problem is in the binding or in the underlying Java serial library, but a binding shouldn't be able to kill the whole program; that's really strange.

Many thanks for your awesome work, and if you need more information don't hesitate to ask.

Best regards,

Aitor

cdjackson commented 7 years ago

I'm not sure what I can do about this. It sounds like a low-level problem if the JVM is crashing, so maybe it's related to the serial drivers?

I don't think I can help much, though; it's not something I've heard of before. Sorry.

In reply to Aitor Iturrioz's message of 7 Feb 2017, 09:26.

bodiroga commented 7 years ago

Hi again Chris!

Yeah, I understand; that's not something the binding should care about, it's too low-level.

Have you ever thought about catching the "Got I/O exception Input/output error in writeArray during sending. exiting thread." messages and setting the controller Thing OFFLINE? Or creating a channel that tells the user whether communication between the binding and the controller is active? I know it happens very rarely, but sometimes on devices like the Raspberry Pi the USB ports are not powered properly and the controller hangs, and unless you write a script that checks the logs periodically, it's very difficult to detect that type of error automatically. Having this done in the binding would be awesome, because users could easily create rules to act on it ;)

More channels could also be added to the controller Thing (whether healing is active, queue length, ...), but I also know that adding too many channels is not the best idea.
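One way to picture the suggestion above is a minimal Java sketch, assuming the binding has a send thread that drains a queue of frames to the serial port's OutputStream (which is what the "during sending. exiting thread." log message implies). The `ControllerStatusListener` interface, the `SendThreadSketch` class and the `simulateUnpluggedStick` helper are hypothetical illustrations, not the binding's actual classes. On an `IOException` the thread reports the failure to a listener instead of only logging it; in a real openHAB Thing handler that listener would presumably call `updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR, reason)`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SendThreadSketch {

    /** Hypothetical callback; NOT part of the actual ZWave binding API. */
    interface ControllerStatusListener {
        void onCommunicationLost(String reason);
    }

    /** Drains queued frames to the serial output stream (hypothetical simplification). */
    static final class SendThread extends Thread {
        private final OutputStream out;
        private final BlockingQueue<byte[]> queue;
        private final ControllerStatusListener listener;

        SendThread(OutputStream out, BlockingQueue<byte[]> queue, ControllerStatusListener listener) {
            this.out = out;
            this.queue = queue;
            this.listener = listener;
        }

        @Override
        public void run() {
            try {
                while (!isInterrupted()) {
                    byte[] frame = queue.take(); // blocks until a frame is queued
                    out.write(frame);
                    out.flush();
                }
            } catch (InterruptedException e) {
                // Normal shutdown path: restore the interrupt flag and exit quietly.
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                // Instead of only logging "exiting thread", report the failure
                // so the handler can mark the controller Thing OFFLINE.
                listener.onCommunicationLost("I/O error during send: " + e.getMessage());
            }
        }
    }

    /** Simulates an unplugged stick and returns the reason reported to the listener. */
    static String simulateUnpluggedStick() throws Exception {
        // An OutputStream that fails on every write, standing in for a dead serial port.
        OutputStream unplugged = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("Input/output error in writeArray");
            }
        };
        CompletableFuture<String> reported = new CompletableFuture<>();
        BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
        SendThread sender = new SendThread(unplugged, queue, reported::complete);
        sender.start();
        queue.put(new byte[] {0x01, 0x02, 0x03}); // placeholder bytes, not a real ZWave frame
        String reason = reported.get(5, TimeUnit.SECONDS);
        sender.join(5000);
        return reason;
    }

    public static void main(String[] args) throws Exception {
        // prints "I/O error during send: Input/output error in writeArray"
        System.out.println(simulateUnpluggedStick());
    }
}
```

The same callback is also where a hypothetical "communication active" channel could be switched off, which is what would let users write rules against it rather than scraping the logs.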

Thanks again for your fast answer; I'm looking forward to testing the new security command class once I buy a ZWave lock!

Best regards,

Aitor

cdjackson commented 7 years ago

I think we'll close this: if the problem is with the serial driver errors, or something else low-level in the JVM, then there's not much I can do.

Feel free to open a request for the handling of the IO error though - we should either take the controller offline as you suggest, or just swallow the error and continue working (which is probably best?).
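The two options mentioned here can also be combined. Below is a minimal sketch, assuming a hypothetical `SendErrorPolicy` class (not part of the binding) that the send path would call after each write: isolated I/O errors are swallowed and the controller stays online, and only a run of consecutive failures takes it offline. The `Status` enum is a simplified stand-in for openHAB's `ThingStatus`, and the threshold of 3 used in `main` is an arbitrary assumption.

```java
public class SendErrorPolicy {

    /** Simplified status values; a stand-in for openHAB's ThingStatus, not the real enum. */
    enum Status { ONLINE, OFFLINE }

    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    private Status status = Status.ONLINE;

    SendErrorPolicy(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Call after each successful write: clears the failure streak and restores ONLINE. */
    synchronized Status onSendSuccess() {
        consecutiveFailures = 0;
        status = Status.ONLINE;
        return status;
    }

    /** Call when a write throws an I/O error: tolerate it until the threshold is reached. */
    synchronized Status onSendFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= maxConsecutiveFailures) {
            status = Status.OFFLINE;
        }
        return status;
    }

    synchronized Status status() {
        return status;
    }

    public static void main(String[] args) {
        SendErrorPolicy policy = new SendErrorPolicy(3); // arbitrary example threshold
        System.out.println(policy.onSendFailure()); // ONLINE  (1st failure swallowed)
        System.out.println(policy.onSendFailure()); // ONLINE  (2nd failure swallowed)
        System.out.println(policy.onSendFailure()); // OFFLINE (threshold reached)
        System.out.println(policy.onSendSuccess()); // ONLINE  (link recovered)
    }
}
```

Because a success resets the counter, a single glitch on the USB link would never take the controller offline, while an unplugged stick, where presumably every write fails, would do so after a bounded number of attempts.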