Closed jpmens closed 8 years ago
The following sketch is basically a copy of your LightOnOff
example. You will notice that I added a second node called ping
which doesn't do terribly much.
#include <Homie.h>
const int PIN_RELAY = 16;
HomieNode lightNode("light", "switch");
HomieNode pingNode("ping", "pong");
bool lightOnHandler(String message) {
if (message == "true") {
digitalWrite(PIN_RELAY, HIGH);
Homie.setNodeProperty(lightNode, "on", "true"); // Update the state of the light
Serial.println("Light is on");
} else if (message == "false") {
digitalWrite(PIN_RELAY, LOW);
Homie.setNodeProperty(lightNode, "on", "false");
Serial.println("Light is off");
} else {
return false;
}
return true;
}
bool pingHandler(String message) {
Serial.println("ping here");
Homie.setNodeProperty(pingNode, "pong", "Bonjour!");
return true;
}
void setup() {
pinMode(PIN_RELAY, OUTPUT);
digitalWrite(PIN_RELAY, LOW);
Homie.setFirmware("sub2-relay", "1.0.0");
lightNode.subscribe("on", lightOnHandler);
pingNode.subscribe("on", pingHandler);
Homie.registerNode(lightNode);
Homie.registerNode(pingNode);
Homie.setup();
}
void loop() {
Homie.loop();
}
When run over a plain MQTT connection it works. When running over an TLS connection the device disconnects and reconnects randomly and seemingly doesn't react at all (neither to light nor to ping).
I'm hoping this can help you diagnose the issue.
The issue happening obviously because of the heap, I'll try to remove most of the dynamic allocations and instead allocate on the stack. The good thing is I am learning a lot resolving this issue!
Could you please try again? It seems stable on my side after 2be5da01fa6537d5fe7db4fae963fc1a470c4f16
I'm sorry to disappoint you, but no change here. In plain mode all works fine as far as I can tell. In TLS mode, the moment I try to talk to the device (e.g. relay on), it disconnects.
So, testing with the above sketch on TLS again, this time a bit more carefully. The screenshot shows what happens initially:
In fact, I don't even have to try and switch "light on"; the device disconnects by itself quite periodically, whereupon the broker logs
1458801612: Sending SUBACK to Homie-cf3a07e0
1458801612: OpenSSL Error: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure
1458801612: Socket error on client Homie-cf3a07e0, disconnecting.
The first couple minutes of the socket errors seen on the broker in human readable times show this happens about every 5 seconds:
1458801612 Thu Mar 24 07:40:12 2016
1458801617 Thu Mar 24 07:40:17 2016
1458801623 Thu Mar 24 07:40:23 2016
1458801673 Thu Mar 24 07:41:13 2016
1458801703 Thu Mar 24 07:41:43 2016
1458801704 Thu Mar 24 07:41:44 2016
1458801709 Thu Mar 24 07:41:49 2016
1458801714 Thu Mar 24 07:41:54 2016
1458801719 Thu Mar 24 07:41:59 2016
If I comment out the two lines in above sketch:
// pingNode.subscribe("on", pingHandler);
// Homie.registerNode(pingNode);
the device is much more stable: after the initial disconnect it stays up and I can talk to it.
There are still some dynamic allocations done:
A solution would be to completely avoid these dynamic allocations, but it would mean setting hard limits in the code:
... etc.
If this does not solve the issue, then the problem is that the framework is oversized for the ESP8266
I think it's a case of memory being cloberred somewhere ...
When you said Limit the max friendly $name of a device, I thought "Hey!", and changed $name
from Livingroom bookshelf
to N1
. What happens now, is
1458815935: New connection from 192.168.1.193 on port 9993.
followed by this:
though not readily reproducible.
I've been going backwards in largish steps to commit https://github.com/marvinroger/homie-esp8266/commit/f6ee49ab532a4e30274e52d5c14dfda1d484e11b to see if this has anything to do with me showing up on the scene here. It doesn't. :smile:
All the time we've spoken so far, I was testing with a Wemos-D1 mini. I didn't really expect it would make a difference, but the NodeMCU (Amica) behaves identically: the above sketch just won't behave properly.
To be honest, when I implemented SSL I never tested it. SSL encryption takes up more than a quarter of the available heap, so that does not surprise me.
Does the following information help at all? Of course I realize I'm comparing apples and potatoes; I'm just trying to be helpful...
Screenshot of memory requirements for sub2.ino
(the small sketch above):
Screenshot of memory requirements for a simple MQTT over TLS using Imroy/pubsubclient:
And you are helpful! But this piece of information shows stack usage, not heap, which is our problem here. I am working on moving everything to the stack in a clean way. The problem is this will be a breaking change.
Well, in my possibly not very humble opinion, you should go ahead and break it so that it's fixed! :smile:
You're right!
It's not a breaking change, so good news, and the problematic sketch seems to work. Could you please test?
By the way, maybe fingerprint is fixed too?
The good news is, it is much better, the bad news is, it is not good. I have the following observations:
sub2.ino
as shown above) actually works, and I was able to "ping" it.Looking slightly closer, I notice that you've changed the way subscriptions work. Contrary to previously where individual subscribes were issued, I now see a single wild-card subscription:
1459003383: Received SUBSCRIBE from Homie-cf3a07e0
1459003383: sensors/homie/cf3a07e0/# (QoS 1)
I think Homie is literally killing itself! My previous experience with PubSubClient on Arduino shows that if I send out a few messages "en block" it just dies. In this particular case, Homie is sending out several publishes, eg. during setupHandler, and they're being bounced back to the device; I think it then just dies.
I will try, time permitting, to observe this a little closer, but I believe that ought to be the reason...
Be that as it may, congrats so far!
What would probably help a bit to mitigate the issue is to narrow-down the wildcard to something like this:
sensors/homie/cf3a07e0/+/+/set
This would prevent most publishes from a Homie device being sent back to the same Homie device. I'm 3/4 or 7/8 convinced the device is hanging itself up due, not to your code, but to PubSubClient.
You're right, the wildcard subscription change is due to an issue with PubSubClient when subscribing to 5 topics or more. Subscribing to only /set
messages, /$ota
and /$reset
would reduce the load. I'll try next week, thanks for the great suggestion!
I've had a Homie device running since my last comment (approx 24h) on TLS, and it's still publishing three messages every 10s:
sensors/homie/cf3a07e0/temperature/degrees 22.50
sensors/homie/cf3a07e0/temperature/freeheap 11504
sensors/homie/cf3a07e0/ping/epoch 1459150716
I do note, however, that it periodically disconnects and then reconnects. The broker logs
1459150544: OpenSSL Error: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure
1459150544: Socket error on client Homie-cf3a07e0, disconnecting.
I can't optimize things more than this, if it's still an issue, it has to be solved in PubSubClient or in the Arduino for ESP8266 SSL implementation. :confused:
I have an idea. By default, the ESP runs at 80MHz. I have seen projects running at 160MHz when using SSL. With twice the power of calculation, maybe the handshake failure
will disappear, and the touch and go
you're describing might disappear... What do you think?
As long as the chip then doesn't burn a hole through my desk, I'd say "let's try it"! ;-)
Could you please try your "heavy" sketch?
One word: SHIP IT! (Ok, those were two words, but you get the drift.)
You've done a tremendous job, thank you.
I would like to recommend adding something like this to the Wiki:
Adding a TLS fingerprint effectively pins the device to a particular certificate. Furthermore, as currently implemented by WifiSecureClient(), both mqtt.host
and ota.host
are verified against the server certificate's common name (CN
) in the certificate subject or in the SANs (subjectAlternateName) contained in it, but not in their IP addresses. So, if the certificate used by your server looks like
Subject: CN=tiggr.example.org, OU=generate-CA/emailAddress=nobody@example.net
...
X509v3 Subject Alternative Name:
IP Address:192.168.1.10, DNS:broker.example.org
enabling fingerprint in Homie will work only if host
is set to tiggr.example.org
or broker.example.org
and the correct fingerprint is used; setting host
to the IP address will cause fingerprint verification to fail.
Great! We made it, thanks a lot for your tests, it helped so much. I also added a note on the docs. Note that the Wiki is no more used, because the Wiki was up-to-date with the latest git rev, and so people using an older version were not able to have instructions related to their version. So now docs are embedded into the repo. :wink:
You said you were monitoring for 24h, is it still alright?
I didn't want to bother you with it ... ;-) Yes, it's basically alright. I've had the following reconnects during the past 24h, but much better than it was:
2016-03-29 19:43:47.701872
2016-03-29 21:33:58.774420
2016-03-30 11:43:53.941828
So, I'd say it's fine.
Awesome! Maybe the 2.2.0 release of the Arduino for ESP8266 will fix things even more, but I can't do more in Homie. Can we close?
Fermé! ;-)
As first reported in #44 I'm seeing a disconnect when Homie connects via TLS (this does not occur in plain connections). Screenshot:
Server-side, the MQTT broker identifies an SSL handshake error, even after it sees the initial publishes from the device:
It would appear to me there's something corrupt going over the wire.
After the second connection succeeds, the Homie device carries on happily publishing for hours on end. This behaviour is reproducible.