Open koenkooi opened 6 years ago
@CaptainFS the default for the HTTP server is to at least log requests, so if the logs are empty I don't know why they aren't showing up. Perhaps PM2 is configured to not log anything, but you'd have to check that.
@robertklep so I did some testing last week. Two things differ from a 'normal' install: I run on a Raspberry Pi3 (v1) and I reset the Nefit Easy every night. I do that by momentarily disconnecting the Nefit Easy from the boiler with a relay. The reason was that my Nefit disconnected permanently from the WiFi every few days. Since we have Nefit Easy version 02.19.01, that should not be necessary anymore.
Could that be (or have been) an issue, Robert?
@CaptainFS yes and no.
From what I can tell, the issue most likely occurs when the Easy goes offline. However, it shouldn't cause the server to stop working; instead, the server should just wait until the Easy is back online again, and continue working.
I don't have any issues with my Easy (as far as I can tell, at least), so it hardly ever goes offline. So that's why it might have been difficult to reproduce the exact problems.
Found the time to check things out: I've had dropouts for the last few days.
nefit-easy-core connection start error { Error: getaddrinfo EAI_AGAIN wa2-mz36-qrmzh6.bosch.de:5222
at Object._errnoException (util.js:992:11)
at errnoException (dns.js:55:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:92:26)
code: 'EAI_AGAIN',
errno: 'EAI_AGAIN',
syscall: 'getaddrinfo',
hostname: 'wa2-mz36-qrmzh6.bosch.de',
host: 'wa2-mz36-qrmzh6.bosch.de',
port: 5222 } +0ms
{ Error: getaddrinfo EAI_AGAIN wa2-mz36-qrmzh6.bosch.de:5222
at Object._errnoException (util.js:992:11)
at errnoException (dns.js:55:15)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:92:26)
code: 'EAI_AGAIN',
errno: 'EAI_AGAIN',
syscall: 'getaddrinfo',
hostname: 'wa2-mz36-qrmzh6.bosch.de',
host: 'wa2-mz36-qrmzh6.bosch.de',
port: 5222 }
Then it quits completely.
When the process is starting and it cannot create an initial connection ("connection start error" in your error log), the default is to throw an error instead of retrying.
That might not be the correct default for all types of applications: for instance, a long-running process like the HTTP server should probably just keep trying to connect.
I'll look at a fix (probably next week).
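For illustration, the "keep trying to connect" behaviour described above could look roughly like the sketch below. It is not the actual fix and does not use the nefit-easy-core API: `connectWithRetry`, `connect`, and the delay options are hypothetical names, and the code assumes a Node.js version with async/await support.

```javascript
// Sketch only: `connect` is a hypothetical function returning a Promise.
// It stands in for whatever call opens the connection to the backend.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function connectWithRetry(connect, { baseDelayMs = 1000, maxDelayMs = 60000 } = {}) {
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      // Success: hand back whatever `connect` resolved with.
      return await connect();
    } catch (err) {
      // For example EAI_AGAIN while DNS is temporarily unavailable.
      console.error(`connect attempt ${attempt} failed (${err.code || err.message}), retrying in ${delay} ms`);
      await sleep(delay);
      // Exponential backoff, capped at maxDelayMs.
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
}
```

A long-running process would call `connectWithRetry(...)` once at startup instead of exiting on the first error; a one-shot command-line tool would probably still want to fail fast.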
The KPN-cable guy is at work at this moment, bummer. I will try to catch more when all is working again.
Thanks @robertklep! I think (?) I found the issue: I decided to stop PM2 and run the server in a tmux session, quick and dirty, with debug enabled. Guess what: it's still running. So it might be a totally different issue: PM2? My internet is currently very bad (I traced it to an issue in the DSLAM, different story), yet the Nefit HTTP server is still running and has been reconnecting successfully for 6 days in a row.
Thanks for the update, @CaptainFS! I've been running an app that maintains a permanent connection to the Easy backend for weeks now, and it seems to reconnect successfully every time.
I think it might be time to do a new release. At least the current (test) version works better than the previous version.
Unfortunately, after running the updated version for about 6 weeks, I frequently have to stop and start the server. The log shows MAX_RETRIES etc., and after that it still shows the request lines, but with about 5 seconds between them (the timeout that is set). No values are received. After stopping and starting the server, everything is fine again. I get this from the nohup.out file, and wonder whether there is more logging that I cannot find that shows a little more information? (I am running this on a Synology.) The attached log shows the slow response to the requests made to the server. So it is probably still connected (otherwise it would reconnect, I guess), but very slow in its response. I wonder whether cutting out the web server and using single commands would give more stable results. Unfortunately I am not savvy enough to know how to call those from a PHP page. Attached some logging:
127.0.0.1 - - [01/Aug/2018:02:28:04 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" 200 194 "-" "-"
127.0.0.1 - - [01/Aug/2018:02:28:06 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" 200 495 "-" "-"
127.0.0.1 - - [01/Aug/2018:02:28:07 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" 200 161 "-" "-"
127.0.0.1 - - [01/Aug/2018:02:28:07 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" 200 151 "-" "-"
127.0.0.1 - - [01/Aug/2018:02:29:09 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:29:14 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:29:20 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:29:25 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:30:11 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:30:16 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:30:22 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:30:27 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" - - "-" "-"
Error: MAX_RETRIES_REACHED
at Error (native)
at queueMessage.catch.e (/volume1/@appstore/Node.js_v6/usr/local/lib/node_modules/nefit-easy-http-server/node_modules/nefit-easy-core/lib/index.js:245:51)
Error: MAX_RETRIES_REACHED
at Error (native)
at queueMessage.catch.e (/volume1/@appstore/Node.js_v6/usr/local/lib/node_modules/nefit-easy-http-server/node_modules/nefit-easy-core/lib/index.js:245:51)
127.0.0.1 - - [01/Aug/2018:02:31:09 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:31:14 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:31:20 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:31:25 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:32:11 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:32:16 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:32:22 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:32:27 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:33:11 +0000] "GET /bridge/system/sensors/temperatures/outdoor_t1 HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:33:17 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:33:22 +0000] "GET /bridge//heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" - - "-" "-"
127.0.0.1 - - [01/Aug/2018:02:33:28 +0000] "GET /bridge//system/appliance/systemPressure HTTP/1.1" - - "-" "-"
@PietSmits it might be worthwhile to investigate why requests are timing out. The "max retries" means that the server has tried for 30 seconds to send a request to the Easy, but it never responded. As you state correctly, the server thinks it's still connected.
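To make the "max retries" behaviour concrete, here is a minimal sketch of a request that is retried after each timeout until a retry budget is exhausted. It is not the nefit-easy-core implementation: `sendWithRetries`, `withTimeout`, and `send` are hypothetical names, and the defaults (a 2-second timeout and 15 attempts, which together add up to roughly 30 seconds) are assumptions chosen only to match the numbers mentioned in this thread.

```javascript
// Sketch only: `send` is a hypothetical function returning a Promise for the reply.

// Rejects with REQUEST_TIMEOUT if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('REQUEST_TIMEOUT')), ms);
  });
  return Promise.race([promise, timeout]).then(
    value => { clearTimeout(timer); return value; },
    err => { clearTimeout(timer); throw err; }
  );
}

// Assumed defaults: 15 attempts of 2 seconds each, about 30 seconds in total.
async function sendWithRetries(send, { timeoutMs = 2000, maxRetries = 15 } = {}) {
  for (let retries = 0; retries < maxRetries; retries++) {
    try {
      return await withTimeout(send(), timeoutMs);
    } catch (err) {
      // Only timeouts are retried; any other error is passed on immediately.
      if (err.message !== 'REQUEST_TIMEOUT') throw err;
    }
  }
  throw new Error('MAX_RETRIES_REACHED');
}
```

Note that nothing in this loop checks whether the underlying connection is still alive, which is consistent with the observation that the server keeps believing it is connected while every request times out.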
unfortunately it stopped again:
127.0.0.1 - - [03/Aug/2018:12:41:18 +0000] "GET /bridge/system/appliance/displaycode HTTP/1.1" 200 101 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/601.38 (KHTML, like Gec$
2018-08-03T12:41:42.896Z nefit-easy-core sending ping
2018-08-03T12:41:48.276Z nefit-easy-core preparing message: /ecus/rrc/uiStatus (retries = 0)
2018-08-03T12:41:48.276Z nefit-easy-core queuing request (retries = 0)
2018-08-03T12:41:48.277Z nefit-easy-core sending message
127.0.0.1 - - [03/Aug/2018:12:41:49 +0000] "GET /bridge/ecus/rrc/uiStatus HTTP/1.1" - - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/601.38 (KHTML, like Gecko) Chrome/55.0$
2018-08-03T12:41:50.285Z nefit-easy-core error sending message Error: REQUEST_TIMEOUT
at Timeout.setTimeout (/usr/lib/node_modules/nefit-easy-http-server/node_modules/nefit-easy-commands/node_modules/nefit-easy-core/lib/index.js:175:34)
at ontimeou2018-08-10T08:33:12.121Z nefit-easy-core sending ping
at tryOnTimeout (timers.js:323:5)
at Timer.listOnTimeout (timers.js:290:5)
2018-08-03T12:41:50.285Z nefit-easy-core message timed out, retrying...
2018-08-03T12:41:50.285Z nefit-easy-core queuing request (retries = 1)
2018-08-03T12:41:50.286Z nefit-easy-core sending message
2018-08-03T12:41:52.293Z nefit-easy-core error sending message Error: REQUEST_TIMEOUT
at Timeout.setTimeout (/usr/lib/node_modules/nefit-easy-http-server/node_modules/nefit-easy-commands/node_modules/nefit-easy-core/lib/index.js:175:34)
at ontimeout (timers.js:498:11)
at tryOnTimeout (timers.js:323:5)
at Timer.listOnTimeout (timers.js:290:5)
2018-08-03T12:41:52.294Z nefit-easy-core message timed out, retrying...
2018-08-03T12:41:52.294Z nefit-easy-core queuing request (retries = 2)
2018-08-03T12:41:52.294Z nefit-easy-core sending message
2018-08-03T12:41:54.299Z nefit-easy-core error sending message Error: REQUEST_TIMEOUT
maybe I'm in the wrong post here, sorry about that :), the above repeats itself endlessly.
@CaptainFS try leaving it running for a while (like a few hours).
It ran for 7 days like this.
Ugh, that's not good. Hopefully I'll have some time next week to try and figure out this problem once and for all.
The 'old' code ran for a year or more without problems, so something fishy is going on here. I run a Raspberry Pi 3 with Domoticz, if that helps, reinstalled a month ago with the latest and greatest Raspbian Stretch. I'm thinking about detecting the timeout myself somewhere, checking it with a cron job, and then restarting the easy-server. Restarting easy-server always helps for one to three days.
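The cron-based workaround mentioned above could be sketched as a small Node.js script that probes the HTTP bridge and restarts the service when no timely answer arrives. Everything specific here is an assumption to adapt: the port (3000), the probe timeout, and the `systemctl restart easy` command are hypothetical values, and `probe`, `restartService`, and `checkAndRestart` are illustrative names.

```javascript
const http = require('http');
const { execFile } = require('child_process');

// Assumed values: adjust host, port and endpoint to your own setup.
const PROBE_URL = 'http://127.0.0.1:3000/bridge/ecus/rrc/uiStatus';
const PROBE_TIMEOUT_MS = 40000; // a bit longer than the 30 s retry window

// Resolves to true if the URL answers with HTTP 200 within `timeoutMs`.
function probe(url, timeoutMs) {
  return new Promise(resolve => {
    const req = http.get(url, res => {
      res.resume(); // discard the body
      clearTimeout(timer);
      resolve(res.statusCode === 200);
    });
    // Overall deadline: give up and report "unhealthy".
    const timer = setTimeout(() => { req.destroy(); resolve(false); }, timeoutMs);
    req.on('error', () => { clearTimeout(timer); resolve(false); });
  });
}

// Hypothetical restart command; the service name is an assumption.
function restartService() {
  return new Promise((resolve, reject) => {
    execFile('systemctl', ['restart', 'easy'], err => (err ? reject(err) : resolve()));
  });
}

// Probes once and restarts only when the probe fails. Returns the probe result.
async function checkAndRestart({
  probeFn = () => probe(PROBE_URL, PROBE_TIMEOUT_MS),
  restartFn = restartService
} = {}) {
  const healthy = await probeFn();
  if (!healthy) await restartFn();
  return healthy;
}
```

A script that ends with a call to `checkAndRestart()` could then be scheduled from cron every few minutes. This only papers over the problem; it does not explain why the requests time out.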
The old code simply doesn't work anymore, because of Nefit backend changes. I'd rather not have had to use a different XMPP library, because I'm not very happy with the one that's being used now, but for now, there is no alternative.
@vanherelj @koenkooi @CaptainFS a new day, a new debug version.
I backported the new authentication mechanism (SCRAM-SHA-1) to the old XMPP client, in the hope that this will solve most of the issues.
I pushed a test version again for you to try:
npm uninstall nefit-easy-core
npm uninstall nefit-easy-commands
npm uninstall nefit-easy-http-server
npm install robertklep/nefit-easy-http-server#debug-test
During startup, the server should display [old-client-with-scram].
Installed!
Aug 11 12:43:14 solar easy-server[5298]: [old-client-with-scram]
Aug 11 12:43:20 solar easy-server[5298]: Sat, 11 Aug 2018 10:43:20 GMT nefit-easy-core online,
Thanks @robertklep ! I’m facing an issue installing with NPM: error 128 when installing.
@CaptainFS perhaps you need to prefix the commands with sudo. If not, there should be a reason given why the installation failed. See also this previous comment.
I was logged in as root, yet it works with the sudo command. I kinda feel like an idiot, maybe!? Thanks for your dedication to this project @robertklep!
Running with: [old-client-with-scram] now.
So far, so good:
root@solar:~# systemctl status easy
● easy.service - Nefit Easy http bridge
Loaded: loaded (/etc/systemd/system/easy.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2018-08-11 12:43:08 CEST; 3 days ago
Main PID: 5298 (node)
Tasks: 9 (limit: 512)
CGroup: /system.slice/easy.service
└─5298 node /usr/bin/easy-server --serial= --access-key= --password= --host=0.0.0.0
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core sending message
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core received stanza of type "message"
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core cleaning up for /heatingCircuits/hc1/actualSupplyTemperature
Aug 15 09:18:09 solar easy-server[5298]: 172.20.0.203 - - [15/Aug/2018:07:18:09 +0000] "GET /bridge/heatingCircuits/hc1/actualSupplyTemperature HTTP/1.1" 200 163 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/603.36 (KHTML, like Ge
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core preparing message: /system/appliance/displaycode (retries = 0)
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core queuing request (retries = 0)
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core sending message
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core received stanza of type "message"
Aug 15 09:18:09 solar easy-server[5298]: Wed, 15 Aug 2018 07:18:09 GMT nefit-easy-core cleaning up for /system/appliance/displaycode
Aug 15 09:18:09 solar easy-server[5298]: 172.20.0.203 - - [15/Aug/2018:07:18:09 +0000] "GET /bridge/system/appliance/displaycode HTTP/1.1" 200 101 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/603.36 (KHTML, like Gecko) Chrome/53.
@koenkooi thanks for the update! Can you check and see in the log if there have been situations where the connection was lost and got picked up again?
No errors whatsoever in the log.
I got the impression that after the backend update, it became a lot more unstable. However, I'm now wondering if that instability was in fact caused by the different XMPP client and not so much the backend.
I'd prefer to do a decent enough testrun (like a week or so) before I declare this version fit for publishing.
Still running fine here, I'll keep my fingers crossed.
Some simple testing seems to suggest that the client picks up a broken or slow connection eventually, but to be honest, I already ran similar tests before (using the newer client) and that always worked for me too.
Last time I looked at the 'old' version it had stopped for more than 5 days after running for 1-2 days, it tried to reconnect (I believe), but never succeeded in doing so. I always had to restart it manually to keep it going.
Your last version still runs! And I guarantee you my internet connection is very bad at the moment :-) Seems the telco have to replace the whole cable outside the house.
@koenkooi any problems on your side yet? If not, I'm going to declare this version as ready for publishing :D
So far no issues
Okay folks, thanks for hanging in there!
I published a new version of nefit-easy-http-server, as a beta for now. Let me know if you run into any issue. If not, I'll publish it as "latest" (the NPM/Node equivalent of "stable"):
npm uninstall nefit-easy-core
npm uninstall nefit-easy-commands
npm uninstall nefit-easy-http-server
npm install nefit-easy-http-server@beta
(optionally with sudo and -g for each command, if appropriate)
Thanks for your continued effort Robert! It's working great now!
I just released nefit-easy-http-server@5.0.1, which is the latest stable version and code-wise the same as the beta.
Hopefully we can finally close this issue 😅
I'm getting 5.0.2, but I assume this is also OK?
@sweetpants yeah everything from 5.0.0 upwards is okay :)
I installed the new version when I returned from holiday on the 24th of August, and it seems a lot more stable. I have had only a few error messages returned (still MAX_RETRIES though). Thanks for working on this Robert!
Thanks for the feedback, @PietSmits!
Errors can still happen, because the Nefit backend goes offline every now and then, but the connection should be picked up automatically again once it's back up.
Using npm update nefit-easy-http-server -g, the easy-server is updated to 4.1.0. Why doesn't it update to 5.x.x?
Not sure, perhaps npm update will refuse to update to a new major version.
Try re-installing it:
npm install nefit-easy-http-server@latest -g
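If the guess above is right and the update stays within a caret-style range of the installed version (an assumption, not something confirmed in this thread), then a 4.x install can never reach 5.x. The sketch below shows that rule in simplified form; `satisfiesCaret` is a hypothetical helper that only handles plain `major.minor.patch` versions with a major of 1 or higher and no pre-release tags, and it is not the real `semver` package.

```javascript
// Simplified caret-range check: ^base allows any version with the same
// major number that is greater than or equal to base.
function parse(v) {
  return v.split('.').map(Number);
}

function satisfiesCaret(version, base) {
  const [maj, min, pat] = parse(version);
  const [bMaj, bMin, bPat] = parse(base);
  if (maj !== bMaj) return false;          // a different major is never allowed
  if (min !== bMin) return min > bMin;     // a newer minor is fine, an older one is not
  return pat >= bPat;                      // same minor: patch must not go backwards
}

console.log(satisfiesCaret('4.1.0', '4.1.0')); // true
console.log(satisfiesCaret('5.0.1', '4.1.0')); // false
```

An explicit install with the @latest tag sidesteps any such range, which is why the re-install command works regardless.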
To be sure: no npm uninstall needed first? Just npm install?
You'll probably have the least chance of issues when you use this:
npm uninstall nefit-easy-core -g
npm uninstall nefit-easy-commands -g
npm uninstall nefit-easy-http-server -g
npm install nefit-easy-http-server@latest -g
thanks
After updating to get SCRAM-SHA1 and updating node to 8.9.4 I'm getting errors like the one below:
Restarting the server makes it work for about a day: