Closed mriksman closed 4 years ago
The same issue exists with the multipwm library. I ported it to ESP8266 RTOS SDK, and whilst doing so, I noticed an issue with the library itself. I've mentioned it over on RavenSystem's esp-homekit-devices:
https://github.com/RavenSystem/esp-homekit-devices/issues/902
The multipwm_set_freq() doesn't actually do anything (well, it changes the TIMER_CLKDIV).
The actual frequency/period is set in the header with
#define MULTIPWM_MAX_PERIOD UINT16_MAX
Which I gather you already know, because you set all your duties based on this value.
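To illustrate what "duties based on this value" means, here is a minimal sketch of mapping a brightness percentage onto the 0..MULTIPWM_MAX_PERIOD range. The helper name percent_to_duty is hypothetical and not part of multipwm; only the MULTIPWM_MAX_PERIOD definition comes from the header quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* From the multipwm header quoted above. */
#define MULTIPWM_MAX_PERIOD UINT16_MAX

/* Hypothetical helper (not part of multipwm): scale a brightness
 * percentage (0..100) to a duty value in 0..MULTIPWM_MAX_PERIOD. */
static uint16_t percent_to_duty(uint8_t percent)
{
    assert(percent <= 100);
    /* Widen to 32 bits so percent * 65535 cannot overflow. */
    return (uint16_t)(((uint32_t)percent * MULTIPWM_MAX_PERIOD) / 100u);
}
```

The resulting value would then be what gets passed as the duty argument for a channel; because the period is fixed in the header, the duty resolution stays at 16 bits regardless of the requested frequency.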
In your magic_home_strip example, your multipwm_set_freq(&pwm_info, 65535) will cause the divider to be set to TIMER_CLKDIV_16 (see https://github.com/SuperHouse/esp-open-rtos/blob/master/core/include/esp/timer.h), so the frequency of the PWM calculates to about 76Hz. You can see the flicker with a slo-mo camera (~120fps). If you set a frequency > 100kHz, the divider will be TIMER_CLKDIV_1, so the PWM frequency will be 1.22kHz.
This is assuming the clock is running at 80MHz.
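The arithmetic behind those two numbers can be sketched as follows. This is only an illustration under the stated assumptions (80MHz timer clock, period fixed at MULTIPWM_MAX_PERIOD ticks); pwm_freq_hz and TIMER_CLK_HZ are hypothetical names, not part of multipwm or either SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the comment above: the timer clock runs at 80MHz. */
#define TIMER_CLK_HZ 80000000u
/* From the multipwm header: one PWM period is UINT16_MAX timer ticks. */
#define MULTIPWM_MAX_PERIOD UINT16_MAX

/* Hypothetical helper: effective PWM frequency in Hz (rounded down) for a
 * given prescaler, e.g. 16 for TIMER_CLKDIV_16 or 1 for TIMER_CLKDIV_1. */
static uint32_t pwm_freq_hz(uint32_t prescaler)
{
    assert(prescaler != 0);
    /* Ticks per second after the prescaler, divided by ticks per period. */
    return TIMER_CLK_HZ / prescaler / MULTIPWM_MAX_PERIOD;
}
```

With a prescaler of 16 this gives 76 (about 76.3Hz before rounding), and with a prescaler of 1 it gives 1220 (about 1.22kHz), matching the figures above.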
Can you confirm this? If you set multipwm_set_freq(&pwm_info, 150000), do you get the issue I am describing (client disconnects)?
Thanks.
The issue happens here:
https://github.com/maximkulkin/esp-homekit/blob/master/src/server.c#L3015
About 1 minute after starting up the PWM, all clients hit this one after the other and within 10-50 seconds all clients have disconnected. Home App shows 'No Response', and you can't reconnect.
Oddly, it only happens with PWM running, and only when the frequency is high (it seems stable at 76Hz, which your magic-home example uses).
Did you try this example: https://github.com/AramVartanyan/esp-homekit-demo/tree/master/examples/magic_home_strip
I need to check if it is the latest revision, but it works just fine with one of my Magic Home devices (which is hard to reach for updating). I shared my version right after I was able to achieve 1.2 kHz PWM, because 76 Hz is too low and you actually see the flickering. It was very annoying.
I have never had a disconnection issue with it.
Hmmm, so the issue must be related to ESP8266 RTOS SDK...?
It is possible. However, if you share your code, a resolution could be found more easily.
Looks like the issue happens before the data_len == 0 check.
Instead of waiting for clients to disconnect after setting the brightness, I sat there changing it. Eventually I got No Response, and the request never seemed to reach the ESP8266 (nothing on debug). Shortly after, all the clients disconnected.
Some Wireshark captures here. wireshark.zip
In num2.pcapng; I change brightness on my Mac OS at 136, 144, 149, 154, 159, 165, 171, 179, 190, 197, 203, 210 and 216 seconds. Each time, there are 4 packet exchanges; from
Except for the last one. It's missing a packet from ESP8266 - Mac OS. Then, I try changing the brightness at ~220 seconds. No packets are sent, and it shows 'No Response'. Presumably, it is waiting for that extra packet that never arrived from the ESP8266.
Maybe the answer is simple: don't do that.
Don’t do... what?
You have rewritten all the components, so the issue will be very hard to find. Did you debug the libraries you use step by step (starting with the “led” example)? I would also have tested first with this:
https://github.com/espressif/ESP8266_RTOS_SDK/tree/master/examples/wifi/smart_config
You could also have issues with the configuration and management of mDNS.
And one stupid question: did you use this command for cloning ESP8266 RTOS SDK?
git clone --recursive --branch v3.3-rc1 https://github.com/espressif/ESP8266_RTOS_SDK
The master branch will cause strange issues like the one that you have.
@AramVartanyan appreciate the time you're taking.
I have rewritten some of the 'unimportant' modules, like button and led_status, but they worked stably before I used PWM. esp-homekit and wolfssl are both original.
For PWM I have tried the pwm library from ESP8266 RTOS SDK (which uses the 'WDEV TSF0' interrupt) and the ported version of multipwm, which uses FRC1 (you can see the esp-open-rtos functions commented out and replaced with the ESP8266 RTOS SDK functions), but both exhibit the same issue with frequencies set to ~1kHz.
I can't imagine mDNS is the issue: I can still perform mDNS queries from the ESP8266 (see the button event for '2 clicks' that I've used to test), I can still ping the mDNS address, and the device is still visible in mDNS browsers. Additionally, it was stable before PWM.
I am on the master branch. I'll try the branch you've suggested tomorrow. I'm not confident it'll fix the issue... (Ouch, I'll have to rewrite the event system to use legacy events; v3.3 doesn't support the new event API).
For some reason, I think the high frequency PWM is causing issues with the underlying connections/sockets...?
OK, I tried with 3.3-rc1 and then also with NOTHING else in the program, totally stripped back (https://github.com/mriksman/esp-idf-homekit/tree/rtos_v3.3_minimal). I even created a task to call multipwm_set_duty like in the examples.
Same issue.
So there is an issue with ESP8266 RTOS SDK and high frequency interrupts (whether on FRC1 or on WDEV TSF0 or whatever) with esp-homekit.
If someone else can confirm, I'd appreciate it. I'm really stuck.... I'd be prepared to donate $$ for a solution at this point.
It's the NodeMCU3 onboard LED's close proximity to the Wi-Fi antenna. I set up the interrupts to fire off very fast but without turning the LED on or off. No issues. I added an external LED, no issues.
Wow, thank you for the update! It’s often hard to troubleshoot issues like that because we do not see the whole picture. Yet it makes sense to accumulate knowledge. Thanks again!
It was such a last attempt guess! Now on to #141.... :)
Hey,
As soon as I enable the ESP8266 RTOS SDK PWM Driver (https://github.com/espressif/ESP8266_RTOS_SDK/blob/master/components/esp8266/driver/pwm.c), then randomly after a few seconds to a few minutes, all clients get disconnected and cannot connect again. Pairing is also impossible to do whilst it is running. The startup code is simply;
Any thoughts? Probably an issue with interrupts messing up timing...?