lmarzen / esp32-weather-epd

A low-power E-Paper weather display powered by an ESP32 microcontroller. Utilizes the OpenWeatherMap API.
GNU General Public License v3.0

Random API Connection Failures #83

Open FrAllard opened 4 months ago

FrAllard commented 4 months ago

I know GitHub issues aren't meant for this, but I couldn't find a better place for it.

I wanted to share the result of the base I designed for 3D printing for the project you shared.

It's not perfect yet. I have one more station to print and assemble, where I made a small modification to the alignment of the screen with the DESPI module. Once I confirm that the screen lines up perfectly with the DESPI module, I'll upload my 3D files to printables.com.

I did not install a reset button and I wonder if I'll regret it in the future... I can still modify and print another base if I decide to add one. There is a pinhole on the side to reach the onboard reset in case I really need it, but it's not as easy as hitting a big button.

During tests on my bench, with the unit disassembled, I found that the API sometimes fails to connect. I'd rather have the code not tell me, or retry 2 or 3 times before telling me. I found myself pressing reset only when there was an error shown on the screen.

Printed with Overture ROCK PLA Rock White on a BambuLab P1S 0.4mm nozzle at 0.2mm layer height.

[Images: 20240227_230316, 20240227_230252, 20240227_230136, 20240227_230147, X-Ray]

lmarzen commented 4 months ago

@FrAllard,

Thank you for the kind words and constructive feedback.

Secondly, I want to say, wow that is a great-looking base! Probably the most well-thought-out one I have seen and the bottom panel is a nice touch. I would be happy to link to it if/when you share it on printables (feel free to link it as a reply here or open a pull request). Your build looks so clean, great job.

Lastly, I appreciate the constructive feedback. I have begun experiencing the same API connection errors more and more frequently in the last month. The software already has a retry mechanism: the connection is attempted 3 times before the error is displayed. This used to seem nearly 100% effective at preventing these errors; however, it no longer seems to be doing the trick. I suspect that adding additional delay between retries may resolve this issue. Regardless, this is something that I plan to look into and fix (I anticipate sometime mid-March, as I have midterm exams this week and next week). I'll tag this thread when a fix is pushed.

Regards,

Luke

lmarzen commented 4 months ago

I have experimented with adding some delay, which seems to have fixed it, though it is hard to tell due to the random nature of the issue. I'll wait a few more days and if I still don't see the error again then I'll push the fix.

FrAllard commented 4 months ago

Great thank you!

I too don't see the error that much since the thing has been assembled and placed in the living room instead of being partly assembled on the worktop bench where I work all day when I'm working from home!

I'll program a second one soon. I'll test the new modifications you've done!

FrAllard commented 4 months ago

I posted my project featuring yours at this address. Feel free to share it. https://www.printables.com/model/791477-weather-station-using-a-esp32

Marckau commented 4 months ago

Thank you!

Marco

lmarzen commented 4 months ago

Wow, great instructions! I love all the pictures, you clearly put in a lot of effort. I added a link to it in the project readme. Thanks for sharing.

FrAllard commented 4 months ago

I did put in a lot of effort, but for me this is my "painting". I find it relaxing to 3D design this kind of stuff... You also put a lot of effort into your firmware and into following up on issues.

Long live open source and open hardware. United we stand, divided we fall.

lmarzen commented 3 months ago

Update about the occasional -1 Connection Refused HTTPC errors. Due to the unpredictable nature of the issue, it has been challenging to debug. However, I think I am on to a solution that involves adding a slight delay before retrying the API call. I have experimented with delays as short as 50ms, which did not fix the issue, and delays as long as 1s, which seemed to fix it: after two weeks of using this delay I did not observe the Connection Refused error. I am currently experimenting with a 200ms delay, which appears to be sufficient to prevent these errors. I am going to wait another week, and if I don't see the error again I will push this as the fix.
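
For readers following along, the retry-with-delay idea described above can be sketched roughly as follows. This is a minimal, hardware-agnostic illustration and not the project's actual code: `requestWithRetry`, its parameters, and the convention that 200 means success are all assumptions made for the example. On the device the sleep would presumably be an Arduino `delay()` call and `request` would wrap the HTTP client call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not taken from the project source.
// Calls `request` up to `maxAttempts` times and sleeps for `retryDelay`
// between failed attempts. `request` returns an HTTP-style status code:
// 200 is treated as success, anything else (e.g. -1 Connection Refused)
// as a failure worth retrying.
int requestWithRetry(const std::function<int()> &request,
                     int maxAttempts,
                     std::chrono::milliseconds retryDelay)
{
  assert(maxAttempts > 0);
  int status = 0;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt)
  {
    status = request();
    if (status == 200)
    {
      return status; // success, no further attempts needed
    }
    if (attempt < maxAttempts)
    {
      // Wait before the next attempt; skipped after the final failure.
      std::this_thread::sleep_for(retryDelay);
    }
  }
  return status; // last error code after every attempt failed
}
```

With three attempts and a 200ms delay, the call would look like `requestWithRetry(fetchWeather, 3, std::chrono::milliseconds(200))`, where `fetchWeather` is a placeholder for whatever performs the API request.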

lmarzen commented 2 months ago

I have begun observing -1 Connection Refused errors again. Delay doesn't seem to fix it. Progress update though, I managed to capture the error message over the serial monitor. Will continue to investigate.

[  9912][E][WiFiClientSecure.cpp:144] connect(): start_ssl_client: -1
  -1 Connection Refused

LiHuihhh commented 2 months ago

@lmarzen I found this amazing project on GitHub and built it. However, I ran into the same problem and spent hours looking for a solution without success. I found this conversation, but I still couldn't figure it out. It has been troubling me for at least two days, as I want to give it to my best friend as a special birthday present. The error is usually "-1 Connection Refused" or "-11 Read Timeout". I have tried a lot of things; for instance, I set NTP_TIMEOUT to at least 600000 ms, but the errors still occur. They seem to be unpreventable and unpredictable, which worries me a lot. I'm still waiting for your official fix, but I'm worried that I won't be able to give her the present for her birthday, which would be a real pity.

LiHuihhh commented 2 months ago

Looking forward to your response as soon as possible; I only have one day left to improve the device.

LiHuihhh commented 2 months ago

Sometimes the error is also "-258 Deserialization Incomplete Input".

LiHuihhh commented 2 months ago

[Images: Screenshot 2024-05-04 214333, Screenshot 2024-05-04 214404]

LiHuihhh commented 2 months ago

Maybe the WiFi in my house is not good either.

lmarzen commented 2 months ago

Okay, let's try to get this figured out asap.

NTP_TIMEOUT is only for syncing the time, increasing this timeout will not fix your issue.

I think we can increase the timeout for http requests and that should help. I have had a tremendously difficult time debugging these issues in the past since I cannot reproduce them reliably.

If you are able to capture terminal outputs for any of these errors that would be immensely helpful.

There is another workaround that I can implement, which I call 'silent retries'. The idea is that for certain types of errors, like API errors, we shouldn't display them the first time. Instead, we should just wait a minute, start over, and hope the error resolves itself on the second or third try. So we would only display an API error if it happens several times in a row.
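
A rough sketch of how such 'silent retries' bookkeeping might look is given below. It is only an illustration under stated assumptions: `SilentRetryPolicy`, `maxSilentFailures`, and `shouldDisplayError` are hypothetical names, not part of the project. On an ESP32 the failure counter would also need to survive deep sleep, for example by living in RTC memory; here it is an ordinary member so the logic can be run anywhere.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the project's implementation.
// Tracks consecutive API failures across wake cycles and decides whether an
// error should be drawn on the display or silently retried on the next cycle.
struct SilentRetryPolicy
{
  int maxSilentFailures;    // failures tolerated before an error is shown
  int consecutiveFailures;  // would need to persist across deep sleep

  explicit SilentRetryPolicy(int maxSilent)
      : maxSilentFailures(maxSilent), consecutiveFailures(0)
  {
    assert(maxSilent >= 0);
  }

  // Record the outcome of one wake cycle. Returns true when the error
  // should be displayed, false when it should stay silent.
  bool shouldDisplayError(bool apiCallSucceeded)
  {
    if (apiCallSucceeded)
    {
      consecutiveFailures = 0; // any success resets the streak
      return false;
    }
    ++consecutiveFailures;
    return consecutiveFailures > maxSilentFailures;
  }
};
```

With `SilentRetryPolicy policy(2)`, the first two failures in a row stay silent and the third one is displayed; a single successful cycle resets the count.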

Can you estimate how frequently you see these errors? Is it almost every time? Once an hour? Once a day?

LiHuihhh commented 2 months ago

I agree with you. I once came up with a similar idea: if the API call fails, the screen shouldn't be refreshed, and the device should just wait until the next time it wakes up. Maybe this is a coincidence. To be honest, I'm not really good at coding, so I may have made some mistakes. I think increasing the timeout for HTTP requests may solve it, or at least reduce how often the errors happen. You can try it. I hope to receive the update.

LiHuihhh commented 2 months ago

Haha, maybe it will be a complex problem for you.

LiHuihhh commented 2 months ago

[Images: IMG_8329, Screenshot 2024-05-04 222211]

It fails about 7 times out of 10, with the sleep duration set to "3".

LiHuihhh commented 2 months ago

Maybe you can start by making the HTTP request timeout a user-defined setting so it can be increased. Maybe the server is too far from my area for the data to arrive in time.

lmarzen commented 2 months ago

Okay, I think I am finally figuring it out. Your error messages helped me along. I'm working on it now. I'll get back soon.

lmarzen commented 2 months ago

I have managed to reproduce -11: Read Timeout for the first time. I think I have finally figured this one out.

lmarzen commented 2 months ago

I never thought it was a timeout issue since I only ever witnessed the Connection Refused error. By decreasing the timeout to 100ms I was able to reproduce the read timeout error. The default http tcp timeout is 5000ms. I have increased this for this project to 10000ms. You can increase this further if you need in config.cpp. Let me know if this resolves your issue and if you needed to increase the timeout further.

// HTTP
// The following errors are likely the result of insufficient http client tcp
// timeout:
//   -1   Connection Refused
//   -11  Read Timeout
//   -258 Deserialization Incomplete Input
const unsigned HTTP_CLIENT_TCP_TIMEOUT = 10000; // ms

LiHuihhh commented 2 months ago

Thank you so much! I'll test it after I wake up. There is still a "Connection Refused" error, which I don't understand. So if you can, please implement 'silent retries' and provide an extra option that lets users define how many times the system retries. That would make the system more flexible and configurable. It would also mean less feedback for you to handle: you could just ship updates rather than debug a variety of errors. Good luck!

LiHuihhh commented 1 month ago

I'd like to ask about another problem. When the device is set up for a different WiFi network, far from my house, my friend can't get it to run properly by pressing the reset button. It still doesn't work. I think that if the first run doesn't succeed, it won't run again, as if it were stuck in a loop. Since I can't easily get to the display, it is hard for me to inspect it; I rarely go to my friend's home. Could you please give me some advice?

lmarzen commented 1 month ago

Can you please clarify the problem? It is unclear to me what the issue is that you are experiencing. Did the latest updates from e41f6fa fix your issue with the API failures?

RemindZ commented 1 month ago

@lmarzen

First off, thanks for your project; it makes for some stunning home accessories.

Unfortunately, since yesterday I have been running into issues with mine. For a couple of days it ran completely fine, refreshing every ten minutes. Since yesterday, however, it shows -517: Connection Lost. It still refreshes every 10 minutes, but each refresh shows the same error.

I've re-flashed with different settings (namely, I upped the HTTP timeout and WiFi timeout a bit) but also no luck with that.

Nothing about my network setup has changed in the last weeks.

I can actually also see this issue in the API usage statistics for the One Call API:

Date (UTC) | Total calls
May 20, 2024 | 6
May 19, 2024 | 150
May 18, 2024 | 156

[Image: 20240520_193305]

lmarzen commented 1 month ago

The "Connection Lost" error occurs when a WiFi connection is made successfully but is lost before the API requests can be made. Does this error go away if you move closer to your access point?

One potential solution (would require code modifications) could be to attempt a WiFi reconnect if WiFi status is not good when trying to make API requests.

For reference, from the Arduino docs:

WL_CONNECTION_LOST: assigned when the connection is lost
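
The reconnect idea mentioned above could look something like the following sketch. It is hardware-agnostic and purely illustrative: `WifiLink`, `RequestResult`, and `requestWithReconnect` are hypothetical names, and the callbacks stand in for the real driver (on Arduino, `isConnected` would presumably compare `WiFi.status()` against `WL_CONNECTED`). It is not the project's code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the WiFi driver so the logic can run off-device.
struct WifiLink
{
  std::function<bool()> isConnected; // true while the link is up
  std::function<void()> reconnect;   // tries to bring the link back up
};

enum class RequestResult
{
  Ok,
  RequestFailed,
  ConnectionLost
};

// Checks the link right before the API request and attempts up to
// `maxReconnects` reconnects if it has dropped. The request is only made
// once the link reports connected.
RequestResult requestWithReconnect(const WifiLink &wifi,
                                   const std::function<bool()> &apiRequest,
                                   int maxReconnects)
{
  assert(maxReconnects >= 0);
  for (int attempt = 0; !wifi.isConnected(); ++attempt)
  {
    if (attempt >= maxReconnects)
    {
      return RequestResult::ConnectionLost; // give up, report the error
    }
    wifi.reconnect();
  }
  return apiRequest() ? RequestResult::Ok : RequestResult::RequestFailed;
}
```

The design choice here is to report Connection Lost only after the reconnect budget is used up, so a brief drop between connecting and making the API call does not immediately surface as an error.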