maximkulkin / esp-homekit

Apple HomeKit accessory server library for ESP-OPEN-RTOS
MIT License

PWM Frequency > ~1kHz Causes All Clients to Disconnect #140

Closed mriksman closed 4 years ago

mriksman commented 4 years ago

Hey,

As soon as I enable the ESP8266 RTOS SDK PWM driver (https://github.com/espressif/ESP8266_RTOS_SDK/blob/master/components/esp8266/driver/pwm.c), all clients get disconnected at some random point between a few seconds and a few minutes later, and they cannot connect again. Pairing is also impossible while it is running. The startup code is simply:

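    #include "driver/pwm.h"

    #define PWM_PERIOD 1000      // PWM period in microseconds (1000 us = 1 kHz)
    uint32_t duties = 100;       // duty (on-time within the period) for the single channel
    uint32_t pin_num = 2;        // GPIO number of the output pin

    pwm_init(PWM_PERIOD, &duties, 1, &pin_num);  // one channel
    pwm_set_phase(0, 0);                         // channel 0, no phase shift
    pwm_start();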

Any thoughts? Probably an issue with interrupts messing up timing...?

mriksman commented 4 years ago

The same issue exists with the multipwm library. I ported it to ESP8266 RTOS SDK, and whilst doing so, I noticed an issue with the multipwm library. I've mentioned it over on RavenSystem's esp-homekit-devices https://github.com/RavenSystem/esp-homekit-devices/issues/902

multipwm_set_freq() doesn't actually set the frequency (well, it changes the TIMER_CLKDIV).

The actual frequency/period is set in the header with #define MULTIPWM_MAX_PERIOD UINT16_MAX, which I gather you already know, because you set all your duties based on this value.

In your magic_home_strip example, your multipwm_set_freq(&pwm_info, 65535) will cause the divider to be set to TIMER_CLKDIV_16 (see https://github.com/SuperHouse/esp-open-rtos/blob/master/core/include/esp/timer.h)

So the PWM frequency works out to about 76Hz. You can see the flicker with a slo-mo camera (~120 fps). If you set a frequency > 100kHz, the divider will be TIMER_CLKDIV_1, so the PWM frequency will be 1.22kHz. This assumes the clock is running at 80MHz.
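
The arithmetic behind these two figures can be sketched as follows, assuming one PWM cycle always lasts MULTIPWM_MAX_PERIOD (65535) timer ticks and the timer input clock is 80MHz, as stated above. pwm_freq_hz is a hypothetical helper for illustration only; it is not part of multipwm or esp-open-rtos.

```c
#include <stdint.h>

/* Hypothetical helper (not part of multipwm): PWM frequency in Hz for a
 * timer clocked at clk_hz, prescaled by `divider`, where one PWM cycle
 * lasts `period_ticks` timer ticks. */
static double pwm_freq_hz(uint32_t clk_hz, uint32_t divider, uint32_t period_ticks)
{
    return (double)clk_hz / (double)divider / (double)period_ticks;
}
```

Under those assumptions, pwm_freq_hz(80000000, 16, 65535) comes out at roughly 76.3 Hz and pwm_freq_hz(80000000, 1, 65535) at roughly 1220.7 Hz, which matches the 76Hz and 1.22kHz figures.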

Can you confirm this? If you call multipwm_set_freq(&pwm_info, 150000), do you see the issue I am describing (client disconnects)?

Thanks.

mriksman commented 4 years ago

The issue happens here: https://github.com/maximkulkin/esp-homekit/blob/master/src/server.c#L3015. About 1 minute after starting up the PWM, all clients hit this line one after the other, and within 10-50 seconds all clients have disconnected. The Home app shows 'No Response', and you can't reconnect. Oddly, this only happens with PWM running, and only when the frequency is high (it seems stable at 76Hz, which your magic-home example uses).

AramVartanyan commented 4 years ago

Did you try this example: https://github.com/AramVartanyan/esp-homekit-demo/tree/master/examples/magic_home_strip

I need to check whether it is the latest revision, but it works just fine with one of my Magic Home devices (which is hard to reach for updates). I shared my version right after I was able to achieve 1.2 kHz PWM, because 76 Hz is too low and you can actually see the flickering. It was very annoying.

I have never had a disconnection issue with it.

mriksman commented 4 years ago

Hmmm, so the issue must be related to ESP8266 RTOS SDK...?

AramVartanyan commented 4 years ago

It is possible. However, if you share your code, a resolution could be found more easily.

mriksman commented 4 years ago

https://github.com/mriksman/esp-idf-homekit

mriksman commented 4 years ago

Looks like the issue happens before the data_len == 0 check. Rather than setting the brightness once and waiting for clients to disconnect, I sat there changing it repeatedly. Eventually I got 'No Response': the request never seemed to reach the ESP8266 (nothing on debug). Shortly after that, all the clients disconnected.

mriksman commented 4 years ago

Some Wireshark captures here. wireshark.zip

In num2.pcapng; I change brightness on my Mac OS at 136, 144, 149, 154, 159, 165, 171, 179, 190, 197, 203, 210 and 216 seconds. Each time, there are 4 packet exchanges; from

Except for the last one. It's missing a packet from the ESP8266 to Mac OS. Then I try changing the brightness at ~220 seconds. No packets are sent, and it shows 'No Response'. Presumably, it is waiting for that extra packet that never arrived from the ESP8266.

maximkulkin commented 4 years ago

Maybe the answer is simple: don't do that.

mriksman commented 4 years ago

> Maybe the answer is simple: don't do that.

Don’t do... what?

AramVartanyan commented 4 years ago

You have rewritten all the components, so the issue will be very hard to find. Did you debug the libraries you use step by step (starting with the “led” example)? I would also have tested first with this:

https://github.com/espressif/ESP8266_RTOS_SDK/tree/master/examples/wifi/smart_config

You could also have issues with the configuration and management of mDNS.

And one stupid question: did you use this command to clone the ESP8266 RTOS SDK?

    git clone --recursive --branch v3.3-rc1 https://github.com/espressif/ESP8266_RTOS_SDK

The master branch will cause strange issues like the one that you have.

mriksman commented 4 years ago

@AramVartanyan appreciate the time you're taking.

I have rewritten some of the 'unimportant' modules, like button and led_status, but they were stable before I used PWM. esp-homekit and wolfssl are both original. For PWM I have tried the pwm library from ESP8266 RTOS SDK (which uses the 'WDEV TSF0' interrupt) and my ported version of multipwm, which uses FRC1 (you can see the esp-open-rtos functions commented out and replaced with the ESP8266 RTOS SDK functions). Both exhibit the same issue with frequencies set to ~1kHz.

I can't imagine mDNS is the issue: I can still perform mDNS queries from the ESP8266 (see the button event for '2 clicks' that I've used to test), I can still ping the mDNS address, and the device is visible in mDNS browsers. Additionally, it was stable before PWM.

I am on master branch. I'll try the branch you've suggested tomorrow. I'm not confident it'll fix the issue... (Ouch, I'll have to rewrite the event system to use legacy events; v3.3 doesn't support the new event API).

For some reason, I think the high frequency PWM is causing issues with the underlying connections/sockets...?

mriksman commented 4 years ago

OK, I tried with 3.3-rc1 and then also with NOTHING else in the program - totally stripped back (https://github.com/mriksman/esp-idf-homekit/tree/rtos_v3.3_minimal). I even created a task to set multipwm_set_duty like in the examples.

Same issue.

So there is an issue with ESP8266 RTOS SDK and high frequency interrupts (whether on FRC1 or on WDEV TSF0 or whatever) with esp-homekit.

If someone else can confirm, I'd appreciate it. I'm really stuck.... I'd be prepared to donate $$ for a solution at this point.

mriksman commented 4 years ago

It's the NodeMCU3 onboard LED's close proximity to the Wi-Fi antenna. I set up the interrupts to fire off very fast but without turning the LED on or off. No issues. I added an external LED, no issues.

maximkulkin commented 4 years ago

Wow, thank you for the update! It’s often hard to troubleshoot issues like that because we do not see the whole picture. Yet it makes sense to accumulate knowledge. Thanks again!

mriksman commented 4 years ago

It was such a last-ditch guess! Now on to #141.... :)