openhab / org.openhab.binding.zwave

openHAB binding for Z-Wave
Eclipse Public License 2.0
171 stars 202 forks source link

Workaround causes serial port discovery issues #1332

Open wborn opened 4 years ago

wborn commented 4 years ago

The binding currently uses serialPortManager.getIdentifiers() to get the SerialPortIdentifier.

This workaround was added in https://github.com/openhab/org.openhab.binding.zwave/pull/1222 by @bodiroga.

The workaround causes serial port detection issues because the getIdentifiers() method will only return discovered ports.

As a result:

Perhaps the workaround can be removed by instead detecting that a controller is disconnected by just opening the port and handling any resulting exceptions?

Related to:

RafalLukawiecki commented 4 years ago

I think calling getIdentifiers does not make sense for RFC2217, as I have alluded to in this comment.

Interestingly, there is code marked "TODO" that supposedly was going to do that, but I cannot believe it would be possible to ever implement it, see RFC2217PortProvider.java:

@Override
public Stream<SerialPortIdentifier> getSerialPortIdentifiers() {
    // TODO implement discovery here. https://github.com/eclipse/smarthome/pull/5560
    return Stream.empty();
}
wborn commented 4 years ago

I also think it's unlikely RFC2217 discovery will ever be implemented.

It would be interesting to know if ZWave with RFC2217 works for you without:

RafalLukawiecki commented 4 years ago

In my testing I have commented out this workaround, last weekend, and unfortunately it did not work. Binding was still showing "No such port exists" messages. I suppose it should be all re-tested without calls to the unsupported methods. I just don't know if I can invest more time into it—let me explain (perhaps I should make a more generic post about the difficulties on the OH forum).

Debugging OpenHAB core is too hard: it is very hard to know how the bits fit together whilst replacing and recompiling them. Maven and Java only make it harder by creating a huge barrier that could only be removed by better "welcome to a new developer" documentation on OH or more focused onboarding support for newcomers. I gave up after 35 hours invested into the ZWave binding and nrjavaserial. I am glad that my efforts yielded an important fix to nrjavaserial, but I was very disheartened that the ZWave binding maintainers were not as able to help me debug the issue as the developer of nrjavaserial was in fixing theirs.

I realise everyone is a volunteer and their work has been excellent. Probably due to other factors, unfortunately, the number of times I was told that the maintainer did not read or know the code that was closely related to the issue affecting their binding, or that it was the responsibility of some other openhab component to do things properly was too much for me. For a newcomer it is not encouraging to feel that I should be more committed to fixing an issue with a binding that its own maintainer wants to. It made me concerned that I would be on my own, going forward, without stronger support in the (hard) debugging effort. If this was not Java/Maven/Eclipse but something more discoverable and less byzantine, or even just a bit better documented, I would have persisted. But with the high barrier to entry for even a fairly experienced developer—admittedly, in many other languages and platforms—but with lack of stronger support, I am sorry to say, I had to give up.

I really wanted to be a helpful part of it. I am very sorry.

wborn commented 4 years ago

I think you did an excellent job to try to get it working @RafalLukawiecki as newcomer. Back when I started working on openHAB I also found it very difficult to understand how everything works together (even as a seasoned Java developer). What worked for me was fixing small issues in add-ons and learning my way around everything eventually.

For me the RFC2217 part is new and looks like you've uncovered quite a few issues with it. It probably has to do with that it's not used by many. Which is reinforced by the same issues of course.

Hopefully we do can get it fixed/improved eventually. After spending 35 hours and without a positive outlook to get it fixed it's OK to give up. Thanks for recompiling the FreeBSD nrjavaserial libraries and triggering me to have a look at the RFC2217 part. :+1: I'll continue to make some improvements for it.

bodiroga commented 4 years ago

Hi guys!

Sorry for being a little bit late to the party, but I will try to help fixing this bug and testing the proposed solution :+1:

As far as I remember, that workaround was introduced because the function serialPortManager.getIdentifier(portId); didn't return null when the USB controller was unplugged and, thus, making the binding believe everything was 'ok'. I know that it doesn't make any sense, but that was what happened on my computer and I explain it in the PR (see https://github.com/openhab/org.openhab.binding.zwave/pull/1222).

I will start up the Eclipse IDE, replace my function with the original one and test it again, let's see if things have changed (I hope so). I will also get a spare Raspberry Pi and test the RFC2217 functionality :crossed_fingers:

Many thanks to @RafalLukawiecki for his awesome work during this week (your job with the nrjavaserial library has been amazing) and @wborn for having a look at this topic :fireworks:

Best regards,

Aitor

wborn commented 4 years ago

That would be awesome! It seems RFC2217 is working better with these changes.

bodiroga commented 4 years ago

That would be awesome! It seems RFC2217 is working better with these changes.

Hi @wborn!

I'm sorry, but I'm still seeing the erroneous behavior after replacing getSerialPortIdentifier(portId) with serialPortManager.getIdentifier(portId): the function doesn't return null when the USB device is unplugged. As a consequence, the reconnection logic is broken :cry: I have waited 10 minutes and the function is still returning a non null value, so if it is cache problem, it keeps the value for a long time.

Which one is the function being executed when serialPortManager.getIdentifier(portId) is called? This one? Or is the function overwritten by the nrjavaserial library? That would explain why my workaround works :thinking:

Any idea?

Thank you guys!

bodiroga commented 4 years ago

Is this the function being executed?

wborn commented 4 years ago

Yes that's the one. Doesn't it throw exceptions that you can properly handle when you open the port when the device is unplugged?

bodiroga commented 4 years ago

Yes that's the one. Doesn't it throw exceptions that you can properly handle when you open the port when the device is unplugged?

No sir, openHAB crashes on:

SerialPortIdentifier commPort = portIdentifier.open("org.openhab.binding.zwave", 2000);

without throwing any exception. The only message I get in the Eclipse IDE console is /dev/ttyACM0: No existe el archivo o el directorio (/dev/ttyACM0: file or directory does not exist), but that seems to be a JVM message and not an openHAB one.

This is the reason why the workaround was added :cry:

wborn commented 4 years ago

Does it generate a hs_err_pid*.log file of the crash? If it is an issue with nrjavaserial we should report it or test if it is still an issue with 5.1.0 or newer which has some fixes for device removal.

It can also be related to https://github.com/NeuronRobotics/nrjavaserial/issues/144.

There's a similar issue in the zigbee code https://github.com/openhab/org.openhab.binding.zigbee/issues/577.

bodiroga commented 4 years ago

Does it generate a hs_err_pid*.log file of the crash?

No hs_err_pid.log file found after executing the command: ```sudo find / -name hs_err_pid```. Fu** :disappointed:

If it is an issue with nrjavaserial we should report it or test if it is still an issue with 5.1.0 or newer which has some fixes for device removal.

Yeah, I have been reading that issue in the nrjavaserial repo, but I have no clue how could I try the latest version of the library (5.1.1) with the Eclipse IDE app.bndrun launcher. Is it possible to add it editing the pom.xml file? I'm going to try to add this https://mvnrepository.com/artifact/com.neuronrobotics/nrjavaserial/5.1.1 to the pom file, let's see if I am lucky :smile:

Many thanks for your help @wborn, I really appreciate it :wink:

PS: No, I can not make it work, the core.io.transport.serial.* bundles depend on 3.x version of nrjavaserial library :thinking:

wborn commented 4 years ago

Yes that was to be expected. The easiest way to test it without updating/recompiling core is by manually updating the bundle manifest and replacing:

~/.m2/repository/org/openhab/nrjavaserial/3.15.0.OH2/nrjavaserial-3.15.0.OH2.jar

with this nrjavaserial-3.15.0.OH2.jar.

It's a hack, but it saves a lot of work. :wink:

After resolving the Demo App bundles, it should add nrjavaserial;version='[3.99.9,3.99.10)',\

You can also check the Felix Web Console that it is using that bundle:

http://localhost:8080/system/console/bundles

username/password: admin/admin

It should show:

reversioned

bodiroga commented 4 years ago

Hi @wborn!

It's a hack, but it saves a lot of work. :wink:

Great idea, I haven't thought about it :+1:

Unfortunately... openHAB keeps crashing after removing the Z-Wave USB stick, and the worst thing is that I fear that I might be doing something wrong, so I'm not sure if we should rise the error to the nrjavaserial repository :cry: I don't know if @RafalLukawiecki (or @cdjackson) has still enough energy to makes some tests with the USB device connected to the host PC instead of using the RFC2217 port :wink:

On the other hand, the feature introduced in the 5.1.0 release works great. After changing this to true and removing the long sleep here, I see the HARDWARE_ERROR event and I can use it to trigger the onSerialPortError function. But, on the next watchSerialPort execution, the binding tries to open the portIdentifier and openHAB crashes. I must be making some stupid mistake...

Anyway, I have not given up yet, so I will keep trying!

Thanks for your help!

PS: do I have to activate anything in the Eclipse IDE or the Demo App in order to get a .pid file for the JVM crash?

bodiroga commented 4 years ago

It can also be related to NeuronRobotics/nrjavaserial#144.

Just to make things more clear, this issue seems to be related to Windows, isn't it? (The file @Ranjith619 is suggesting to change is "src/main/c/src/windows/termios.c") In my case, my tests are being made in a Linux computer (Ubuntu 18.04 in a Dell XPS 13 developer edition) and, for the sake of completeness, the java -version command outputs:

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Should I try a different JDK or is the used one correct?

wborn commented 4 years ago

Your JDK should be fine and I'm also on Ubuntu 18.04 and was also able to reproduce it with another USB stick.

It might be that the library encounters some error and just exits the whole process. If you look at fuserImp.c there are several exit(0) and exit(1) calls that may cause this.

wborn commented 4 years ago

OK I did some more testing and it seems that if you just discover the identifiers and then get the identifier it doesn't crash. Can you test if that also solves the issue for you @bodiroga?

serialPortManager.getIdentifiers();
serialPortManager.getIdentifier(...);

In a very simple example it solves the issue for me.

wborn commented 4 years ago

I've created https://github.com/NeuronRobotics/nrjavaserial/issues/180 for this.

bodiroga commented 4 years ago

Ummmmm... two strange findings from my side:

Thanks again for your work, we are closer to the final solution :wink:

PS: Wait, the second case only occurs if I print the idents object size. Here you can see what I have added:

Stream<SerialPortIdentifier> idents = serialPortManager.getIdentifiers();
logger.error("identifiers size: {}", idents.count()); // Delete
SerialPortIdentifier portIdentifier = serialPortManager.getIdentifier(portId);
logger.error("Port identifier: {}", portIdentifier); // Delete

With just this:

Stream<SerialPortIdentifier> idents = serialPortManager.getIdentifiers();
SerialPortIdentifier portIdentifier = serialPortManager.getIdentifier(portId);
logger.error("Port identifier: {}", portIdentifier); // Delete

openHAB keeps crashing, just as before.

wborn commented 4 years ago

Can you try this reversioned nrjavaserial 5.2.1 in this nrjavaserial-3.15.0.OH2.jar? It has the latest fixes for unplugging/reinserting USB sticks.

bodiroga commented 4 years ago

Hi @wborn!

I'm not sure if things have improved, this is the current situation:

Every 30 seconds (when the watchSerialPort is executed), the serialPort.getIdentifier(portId) function returns the correct identifier and null in turns: correct, null, correct, null, correct, null... Perhaps I could change the watchSerialPort logic and only call it if after a disconnect is detected, but I didn't want to change the reconnect strategy so much. I will make some tests first, but I'm not sure how to proceed when the behaviour changes in every new vesion :smile:

Many thank for your work!

wborn commented 4 years ago

When the port is opened with the newer nrjavaserial versions, it will no longer be part of the discovered ports. I noticed that change in behavior as well, see https://github.com/NeuronRobotics/nrjavaserial/issues/166#issuecomment-613491695. That change is due to the fact that the OH nrjavaserial fork didn't use lockfiles. So you cannot use the discovered ports to determine if a device is still connected.