nikthefix / Lilygo_Support_T_Display_S3_Long_TFT_eSPI_Volos-nikthefix

Board spontaneously restarts when dimming backlight? #7

Open mcmanigle opened 2 months ago

mcmanigle commented 2 months ago

Admittedly, this has nothing to do with your excellent code (which got me started on the T-Display-Long; thank you!). I've opened an issue on Lilygo's GitHub, but I'm curious whether you've seen anything like it.

I have a simple application (weather / clock) that just connects to WiFi, periodically updates weather information, and displays it.

Everything works fine and it runs for hours (stable, low RAM use) if I leave the display backlight at full brightness. However, if I dim the backlight using PWM on GPIO1, the board spontaneously resets after a few minutes: anywhere from 1 to 15 minutes, seemingly at random.

rtc_get_reset_reason reports either POWERON_RESET or RTCWDT_BROWN_OUT_RESET; apart from that, the timing of the resets looks random, as described above.

I looked at the datasheet for the PT4103 backlight LED controller, which suggests using low-frequency PWM on the backlight control pin, but the reset still happens even at 500 Hz.

Is this something you have experienced or could test?

nikthefix commented 2 months ago

Hi mcmanigle,

I have not experienced this, but it sounds like the PWM controller is somehow interacting with the main power supply and causing a brownout. How are you powering the board: battery or USB? If USB, how long is the cable, and is it via a powered hub or directly from the computer? I have had similar issues in the past when PWM-dimming discrete LEDs with the ESP32-S3.

Is there any correlation between PWM frequency, duty cycle and average time to reset? I always have the Long well dimmed as I find it too bright. The calculator demo runs constantly on my desk with no resets.

BTW, when developing code I find it useful to monitor heap and RAM usage over serial in case of leaks. If memory depletion causes a crash, it can sometimes be misreported by rtc_get_reset_reason.

If it's none of these things, perhaps you could upload a simplified sketch that demonstrates the problem, stripped of WiFi and all other functionality so that only the backlight PWM code is left. WiFi, being the main transient power hog, is the first thing I disable when troubleshooting power issues.

nik

mcmanigle commented 2 months ago

Ugh, it looks like you're right: a minimal example with only PWM dimming doesn't seem to crash, even though my original sketch (with WiFi etc.) only crashes when PWM dimming is enabled.

Have you found any tricks for reducing WiFi power use or other transients? Right now the sketch updates everything (weather, nursery temperature, sun position, and posting the current millis for debugging / crash detection) once a minute in a series of four HTTPClient calls. Maybe spreading these out to one call every 20 seconds or so would demand less power from the radio at any one time.
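A rough sketch of the spreading-out idea: rather than firing all four requests back to back once a minute, give each one its own slot inside the refresh period, so at most one HTTP request (and its burst of radio activity) is started per slot. The job names and the StaggeredScheduler class are hypothetical, and this is plain C++ with the millis() value passed in, not tied to any particular Arduino or IDF API.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the four periodic requests described above.
enum class Job : uint8_t { Weather, NurseryTemp, SunPosition, PostMillis, None };

// Gives each job its own slot within the refresh period, so at most one
// HTTP request is started per slot instead of four back to back.
class StaggeredScheduler {
 public:
  explicit StaggeredScheduler(uint32_t period_ms) : slot_ms_(period_ms / kJobCount) {}

  // Call on every loop() pass with the current millis(). Returns the job that is due,
  // or Job::None. Unsigned subtraction keeps this correct when millis() wraps.
  Job poll(uint32_t now_ms) {
    if (started_ && now_ms - last_ms_ < slot_ms_) return Job::None;
    started_ = true;
    last_ms_ = now_ms;  // next slot is measured from when this job actually ran
    Job due = static_cast<Job>(next_);
    next_ = static_cast<uint8_t>((next_ + 1) % kJobCount);
    return due;
  }

 private:
  static constexpr uint8_t kJobCount = 4;
  uint32_t slot_ms_;
  uint32_t last_ms_ = 0;
  uint8_t next_ = 0;
  bool started_ = false;
};
```

With a 60 s period each job still runs once a minute, just 15 s apart; an 80 s period would give the one-call-every-20-seconds spacing suggested above. Restarting the slot timer from the moment a job actually ran means a slow loop() delays the next request rather than bunching two together.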

When I tried to trace the point of the crash with a million Serial outputs, the crash seemed to occur at inconsistent points in the code. I've mostly been powering the board from a computer USB port, but have tried a couple of different (high-power) USB power supplies with the same effect. I haven't yet tried putting 5V on the 5V pins (which is my ultimate plan).

The heap is stable (I check it with heap_caps_get_info(&info, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT)), but I will also double-check the RAM checks you have in your hello world sketch.
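Following nik's suggestion to watch the heap over time, here is a minimal sketch of a trend check. It assumes the sampled values come from the total_free_bytes and largest_free_block fields of the multi_heap_info_t that heap_caps_get_info() fills in; HeapSample and HeapWatch are hypothetical names. ESP-IDF already tracks a since-boot low-water mark in minimum_free_bytes, so what this adds is a count of consecutive new lows (a leak tends to show up as a streak that keeps growing) and a rough fragmentation figure from the gap between total free memory and the largest free block.

```cpp
#include <cstddef>

// Mirrors two fields of ESP-IDF's multi_heap_info_t; on the board, copy
// info.total_free_bytes and info.largest_free_block in after heap_caps_get_info().
struct HeapSample {
  size_t total_free_bytes;
  size_t largest_free_block;
};

// Hypothetical helper for spotting leaks and fragmentation across periodic samples.
class HeapWatch {
 public:
  void add(const HeapSample& s) {
    if (samples_ == 0 || s.total_free_bytes < low_water_) {
      // A new low-water mark; count it toward the falling streak unless it's the first sample.
      if (samples_ > 0) ++falling_streak_;
      low_water_ = s.total_free_bytes;
    } else {
      falling_streak_ = 0;  // free memory recovered or held steady
    }
    ++samples_;
    last_ = s;
  }

  size_t low_water() const { return low_water_; }

  // Consecutive samples that each set a new low; a count that keeps climbing hints at a leak.
  unsigned falling_streak() const { return falling_streak_; }

  // Share of the latest free memory NOT in the largest contiguous block (0 = unfragmented).
  unsigned fragmentation_pct() const {
    if (last_.total_free_bytes == 0) return 0;
    return static_cast<unsigned>(100 - (100 * last_.largest_free_block) / last_.total_free_bytes);
  }

 private:
  size_t low_water_ = 0;
  unsigned samples_ = 0;
  unsigned falling_streak_ = 0;
  HeapSample last_{0, 0};
};
```

On the board you would fill a HeapSample from info after each heap_caps_get_info() call (say once per refresh cycle) and print falling_streak() and fragmentation_pct() to the serial monitor.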

Anyway, thanks for pointing me in the right direction and helping me realize it's not just a broken PWM circuit or something. This is why I've always been more of a software guy ;-)

nikthefix commented 2 months ago

@mcmanigle If you have one of those in-line USB power meters, you could look at power consumption during operation, but I doubt there will be any surprises. A scope might reveal more. A low-impedance power path to the board (good-quality cables), or testing with a battery (you could have a grounding issue with your USB), would be worth experimenting with. Additionally, an external WiFi antenna could improve signal integrity, so the ESP's radio would have less need to ramp up gain or retry transmissions.

Another option is to roll back to an earlier version of Arduino_ESP just in case some gremlins were introduced in the Alpha release.

But I think the first thing to try is a battery. If it works fine then that rules out a whole lot of stuff to do with code.

nik

mcmanigle commented 2 months ago

Thanks; will work on that.

Based on the schematic, am I right in thinking I will need to remove R4 and short R5 on the board to use an external antenna?

nikthefix commented 2 months ago

Yes correct - although I've not tried it myself. For the experiment you could just start by moving the Long and placing it next to your router.

nik

mcmanigle commented 2 months ago

Well, after lots of reading and lots of coding, I switched over to using Espressif's WiFi and HTTP client APIs. They're much lower level than the Arduino versions; I mostly made the switch to experiment with esp_wifi_set_max_tx_power.

After all that refactoring I didn't really have the patience to tease out exactly which change made the difference, but in the end, spreading out the HTTP calls, lowering the max transmit power, and using a solid power supply allowed the device to stay up (seemingly) indefinitely. I had tried each of these except the transmit-power change in isolation before, without success, so I guess it was worth the effort.

Now, of course, the touch chip is getting finicky over time, so I guess that's my next adventure.

nikthefix commented 2 months ago

@mcmanigle Interesting stuff. Yes, I agree that calling the IDF directly from Arduino is often the better approach. The Arduino libs only wrap those calls anyway, and it's hard to tell whether they are keeping pace with changes in the IDF code base without scrutinizing the GitHub documentation. I try to stick with the IDF API as much as possible so that any announced changes relate directly to my code.

Did you try the battery? I'd be interested to see if it made any difference for you.

BTW, since you raised this issue I've heard some other reports of crashes when adding wifi to an existing working Long sketch. I've not been able to reproduce it myself but you've clearly found something which makes a difference in your case so it must merit investigation.

If you could upload a problematic sketch and a non-problematic sketch (doing the same thing) that would be useful for experimentation.

Could you expand on what you mean by 'the touch chip getting finicky over time'? Unresponsive or erroneous or latent or erratic?

Using your new IDF WiFi calls, if you disable WiFi, does the touch functionality go back to working consistently well?

nik

mcmanigle commented 2 months ago

Afraid I never did try the battery. My "known good" (then and now) power supply is an Apple-branded 140W USB-C charger, which lists 5.2V at 3A as its output (ignoring the PD modes). I had tried this power supply before changing the code, and it hadn't helped. I did get a USB power meter, and there is some variation in power use around WiFi calls, but nothing huge.

Unfortunately I didn't save any versions of the code with the crashing behavior... I just put the current version on GitHub, but I suppose that doesn't help anyone unless they want to change it all back to Arduino calls. (It's also very poorly organized at the moment...)

In terms of the touch chip, I need to do some systematic testing. So far, what I've seen is that after many hours (i.e. leaving it overnight), the chip either doesn't report touches, or sometimes reports low-frequency erroneous touches (e.g. once every second or so). I've really not started exploring this in detail, as I just wanted to get something working for an hour or two before looking at more long-term stability.

I've also probably oversimplified the touch code for now; it doesn't even check the interrupt pin. I will focus on that now that the rest seems to be stable, and let you know what I find. The AXS15231 datasheet mentions a touch sleep mode. So my next goals are to 1) pin down a bit better what the problem is, though that's hard to do when it takes hours to crop up, 2) read the datasheet more closely to figure out what that sleep mode is and whether I need to avoid it, and 3) go back to using the interrupt pin (possibly as an actual interrupt).

Have not tried disabling WiFi and seeing whether that fixes touch, but will put that on my list!

John

nikthefix commented 2 months ago

For touch I'd recommend using the interrupt pin. For a responsive touch UI, polling might not be good enough if WiFi and other hungry processes want to steal the show.
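A minimal sketch of the split this implies, assuming the touch controller signals new data on its INT line: the interrupt handler only sets a flag, and loop() does the I2C read when the flag is set, since I2C transfers shouldn't run inside an ISR. The function names, the pin name, and the falling-edge trigger are assumptions; std::atomic stands in for the shared flag so the logic can be checked off-target. On the board the handler would be marked IRAM_ATTR and registered through the ESP-IDF GPIO driver, as in the comment.

```cpp
#include <atomic>

// Shared between the interrupt handler and loop(). A plain `volatile bool` is also
// common on the ESP32-S3; std::atomic keeps the read-and-clear race-free.
static std::atomic<bool> g_touch_pending{false};

// Hypothetical ISR body. On the board it would be declared IRAM_ATTR and registered with
//   gpio_install_isr_service(0);
//   gpio_set_intr_type(TOUCH_INT_PIN, GPIO_INTR_NEGEDGE);  // pin name and edge are assumptions
//   gpio_isr_handler_add(TOUCH_INT_PIN, touch_isr, nullptr);
// It does no I2C work: it only records that the controller has new data.
void touch_isr(void* /*arg*/) {
  g_touch_pending.store(true, std::memory_order_release);
}

// Called from loop(): returns true (and clears the flag) if at least one interrupt has
// fired since the last call; the caller then reads the touch report over I2C.
bool take_touch_event() {
  return g_touch_pending.exchange(false, std::memory_order_acq_rel);
}
```

Several interrupts arriving between two loop() passes collapse into a single read, which is usually fine because the read returns the controller's current touch state.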

nik

nikthefix commented 1 month ago

Hi again John,

How is the display dimming problem? I had a thought. Which demo sketch are you using to test the Long? The reason I ask is because I found an error in my Odometer LVGL sketch which was causing problems with wifi for another developer:

See here:

https://github.com/nikthefix/Lilygo_Support_T_Display_S3_Long_LVGL_Odometer/issues/1

It had nothing to do with dimming but we know how heap issues can propagate. So just in case you're using any code snippets from this demo, please update the .ino file.

Cheers,

nik

mcmanigle commented 1 month ago

So sorry for the delay; was working on other issues related to the project.

I think things are working now. As mentioned before, switching to ESP-IDF functions for the WiFi connection and HTTP requests seems to have fixed the screen dimming issue (whether because of the power level adjustment or something else, I'm not sure).

As for the "eventual screen blanking" issue, that came down to an occasional glitch in the I2C Wire library, which caused the while loop here to spin forever.

Because I was originally testing without the touch interrupt pin, issuing the I2C query on every single loop iteration would inevitably fall into that trap at some point. I refactored my code to use the ESP GPIO interrupt calls (though I'm sure just checking the pin every loop would work just as well) and put a timeout on the loop mentioned above. Between those two changes it seems to work just fine, at least in overnight testing.
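The timeout fix can be sketched generically: poll a readiness condition, but give up once a time budget is spent instead of spinning forever. The exact loop John patched isn't reproduced here, so the `Wire.available()` wait in the comment is only an assumed example of that shape; wait_until() and the 20 ms budget are hypothetical choices. The clock is passed in (millis() on the board) so the logic can be tested on a PC.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: poll ready() until it returns true, or give up after timeout_ms.
// now_ms stands in for millis(); unsigned subtraction stays correct when the counter wraps.
bool wait_until(const std::function<bool()>& ready,
                const std::function<uint32_t()>& now_ms,
                uint32_t timeout_ms) {
  const uint32_t start = now_ms();
  while (!ready()) {
    if (now_ms() - start >= timeout_ms) {
      return false;  // timed out: let the caller skip this read instead of hanging loop()
    }
  }
  return true;
}

// Roughly how it might replace an unbounded `while (!Wire.available())` on the board
// (the 20 ms budget is an arbitrary example):
//   if (!wait_until([] { return Wire.available() > 0; },
//                   [] { return static_cast<uint32_t>(millis()); }, 20)) {
//     return;  // skip this touch poll and try again on the next interrupt
//   }
```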