khoih-prog / AsyncWebServer_RP2040W

Asynchronous WebServer Library for RASPBERRY_PI_PICO_W using CYW43439 WiFi with arduino-pico core. This library, which relies on AsyncTCP_RP2040W, is part of a series of advanced Async libraries for RP2040W, such as AsyncTCP_RP2040W, AsyncUDP_RP2040W, AsyncWebServer_RP2040W, AsyncHTTPRequest_RP2040W, AsyncHTTPSRequest_RP2040W, etc. It can now display the programmed WiFi country-code and supports using CString to save heap when sending very large data.
GNU Lesser General Public License v3.0

Target stops responding after variable time when using Firefox on Windows 10 #3

Closed revell1 closed 1 year ago

revell1 commented 1 year ago

Describe the bug

While testing the Async_AdvancedWebServer.ino example with Firefox 104.0.2 (64-bit) on a Windows 10 laptop, the target responds with an update every 5 seconds. But after some random time, the last request fails to get a reply.

The serial debug output shows that the heartbeat dots are still occurring.

Refreshing the browser, connecting to the target from a different browser (Edge), or connecting with Firefox from an Android device all fail.

If I first connect to the target using Edge rather than Firefox, I have not yet seen the target stop responding. I have also had three connections (Firefox and Edge on the PC, and Firefox on Android) open at the same time for over 20 minutes without issue.

The problem only appears to occur when Firefox is the first/only connection after power-up. It is unclear whether this is a Firefox-induced issue, a target program issue, or a network/router issue.

Steps to Reproduce

As above, connect to the target address (just the IP, no HTTP:// or HTTPS://) using Firefox, and watch the regular updates, wait till the browser reports

" The connection has timed out The server at 192.168.1.xxx is taking too long to respond. "

Expected behavior

Would expect the target device to continue returning replies without issue.

Actual behavior

After random time, target stops responding, but heartbeat debug output shows still alive.

Debug and AT-command log (if applicable)

N/a

Screenshots

N/a

Information

Arduino IDE version 1.8.19
RP2040 Core Version RP2040 core v2.5.2
RP2040 Board type RASPBERRY_PI_PICO_W
Contextual information - Just trying out the library to see if it fits with my needs.
Simplest possible steps to reproduce - build and run Async_AdvancedWebServer example program.
Anything that might be relevant in your opinion, such as:
    Operating system code built on Windows 10 laptop
    Network configuration - Firefox 104.0.2 (64-bit) browsing to Pico-W IP address.

khoih-prog commented 1 year ago

Hi @revell1

Thanks for the detailed investigation.

If first connect to target using EDGE rather than Firefox, then have not yet seen target stop responding. ... Unclear if this is a Firefox induced issue, a target program issue, or a network/router issue.

As you've already found out from your testing, there must be something odd about the combination of Firefox 104.0.2 (64-bit), your Windows 10 laptop, and your network.

Try different browsers, such as Chrome or Vivaldi, or even different Firefox versions or a different router, to see if they work OK.

If the issue happens only when using Firefox, you can post it on the Firefox Forum. The issue is possibly so complex that I have no interest in pursuing it (I never use Firefox) and would not know where or how to start.

I'm closing the issue anyway, and won't reopen it until a bug in this library is proven.

Good Luck with your investigation,

revell1 commented 1 year ago

I have been repeating tests using Microsoft Edge and the Async_AdvancedWebServer.ino example code, with: _RP2040W_AWS_LOGLEVEL_ 1

I have encountered similar issues as before.

Below is the serial/USB log of the heartbeat output, captured while Edge was left with a single tab open on the target address 192.168.1.105.

The page was updating for about 37 minutes before finally stalling and becoming unrecoverable.

What was seen in the browser was the full webpage with the uptime (37:02 was the final time) and the random-data SVG graph.

But at times while the page was updating, the graph was replaced with a small icon. This appeared to line up with the error output in the serial log where setCloseError() is displayed.

A text comment in the code (in the check_status() function) suggests that the "." logging is updated every 60 seconds, but I think that is an error and it should be 10 seconds, in which case each dot in the log below is a 10 second interval. From the log, the error does not look like a regular event, but it is quite frequent.
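To make the timing in the log easier to reason about, here is a minimal sketch of an interval check of this kind. It is not the example's actual check_status() code; the function names are hypothetical, and the 10 second interval is the reporter's reading of the source rather than a confirmed value.

```cpp
#include <cassert>
#include <cstdint>

// Returns true once at least intervalMs have passed since lastMs.
// Unsigned subtraction keeps the result correct across a millis()-style
// 32-bit counter rollover.
bool heartbeatDue(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs)
{
  assert(intervalMs > 0);
  return (uint32_t)(nowMs - lastMs) >= intervalMs;
}

// Converts a number of dots counted in the serial log into elapsed seconds,
// assuming one dot is printed per interval.
uint32_t dotsToSeconds(uint32_t dots, uint32_t intervalMs)
{
  assert(intervalMs > 0);
  return dots * (intervalMs / 1000UL);
}
```

Under the 10 second assumption, a run of about 37 minutes (2220 seconds) would correspond to roughly 222 dots in the log.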

I am guessing that this effect is what I was seeing with Firefox, but with Firefox it appeared to be fatal: once the error occurred, Firefox was unable to recover or even reconnect. Edge, however, appears to recover on most occasions, though the end of this log trace was when even Edge was unable to reconnect.

I am not sure where the "[ATCP] setCloseError()" output is coming from, but I did find a similar log report from one of your other projects in the "readme" file for:

github.com/khoih-prog/AsyncMQTT_Generic#debug-terminal-output-samples

I am still unclear where the issue is, but it appears that both Firefox and Edge are being affected in different ways. This suggests that the problem is either in the WiFi driver or program code, or is a wireless router issue of the kind observed by other users with other libraries.

The Serial (USB) Log Output

Local IP Address: 192.168.1.105 HTTP EthernetWebServer is @ IP : 192.168.1.105 ....[ATCP] setCloseError() to: Connection closed => -15 ...... [ATCP] setCloseError() to: Connection closed => -15 ..[ATCP] setCloseError() to: Connection closed => -15 .[ATCP] setCloseError() to: Connection closed => -15 ....... [ATCP] setCloseError() to: Connection closed => -15 [ATCP] setCloseError() to: Connection closed => -15 ..[ATCP] setCloseError() to: Connection closed => -15 ........ [ATCP] setCloseError() to: Connection closed => -15 .......... .........[ATCP] setCloseError() to: Connection closed => -15 . ....[ATCP] setCloseError() to: Connection closed => -15 ...... [ATCP] setCloseError() to: Connection closed => -15 .....[ATCP] setCloseError() to: Connection closed => -15 ..... .......... .[ATCP] setCloseError() to: Connection closed => -15 ....[ATCP] setCloseError() to: Connection closed => -15 ..... .[ATCP] setCloseError() to: Connection closed => -15 ......... ....[ATCP] setCloseError() to: Connection closed => -15 ...... .....[ATCP] setCloseError() to: Connection closed => -15 ..... ...[ATCP] setCloseError() to: Connection closed => -15 ...[ATCP] setCloseError() to: Connection closed => -15 .... ..[ATCP] setCloseError() to: Connection closed => -15 ......[ATCP] setCloseError() to: Connection closed => -15 .. ..[ATCP] setCloseError() to: Connection closed => -15 .....[ATCP] setCloseError() to: Connection closed => -15 ... ..[ATCP] setCloseError() to: Connection closed => -15 ..[ATCP] setCloseError() to: Connection closed => -15 ...... .[ATCP] setCloseError() to: Connection closed => -15 ....[ATCP] setCloseError() to: Connection closed => -15 ..... ....[ATCP] setCloseError() to: Connection closed => -15 ...... .........[ATCP] setCloseError() to: Connection closed => -15 . .........[ATCP] setCloseError() to: Connection closed => -15 . .....[ATCP] setCloseError() to: Connection closed => -15 .[ATCP] setCloseError() to: Connection closed => -15 .... 
..[ATCP] setCloseError() to: Connection closed => -15 ........ .......... .......... ....

khoih-prog commented 1 year ago

HI @revell1

There may be some problem with your board, network, etc., as I have been running the same example for more than 4 and 1/2 hours and it is still running. My browser is the latest Chrome running on Ubuntu 20.04 LTS.

Selection_047


7++ hrs and still running

Selection_048

khoih-prog commented 1 year ago

14++ hrs and still running

Selection_049

khoih-prog commented 1 year ago

Even on Vivaldi browser

Selection_050

khoih-prog commented 1 year ago

Both Chrome and Vivaldi running simultaneously

Selection_051

revell1 commented 1 year ago

Thanks for your further testing.

I have been trying to do further testing, but still get the frequent/random failures with no clue why.

As a further experiment, I took a side step to the WebServer AdvancedWebServer example from the "earlephilhower" core.

The Sketch code looks similar to yours, just using the NON Async server, so I was assuming that much of the underlying network driver code would be the same.

This also is having the same stability issues. Have also tried a NEW Pico-W with no improvements.

I also tried various other test code, with no real conclusions.

I have a number of Pi Zero-W and PI 3B's running Volumio, all connecting to 2.4GHz network, and have been fine for several years, can always connect to them from Laptop or Android on 5GHz, with router providing the link between 2.4 and 5GHz networks, but they use different hardware etc, so not really a useful comparison.

So I am still at a loss on how to track down what is causing the random issues. Could be Wifi hardware issue with Pico-W on my 2.4GHz network that causes problems. But even when Firefox is unable to connect to the server, I can still PING the target Pico-W, so the Wifi link is still there, but something has blocked any HTTP type access from either my PC or any other devices (Android).

Powercycle of the Pico always clears the block, but still does not help to track the source.

=====

As a side note, I read that using META REFRESH for periodic updates is considered bad web design, because it takes control away from the user. So I have been looking at other possible scripting methods. The terms AJAX and XMLHttp get mentioned at times, but that is all new to me, and I have not yet found any obvious example code for this in the libraries.

I was thinking about writing a simple test sketch that uses a scripted timer event to request specific data items. That is closer to my original intentions when I started out with Python, where I rapidly hit too many walls from the lack of usable libraries and CircuitPython not yet supporting the Pico-W.

I think the way the graph image is obtained is close to what I want to achieve with scripting: the page issues a GET requesting the data (the image file), and a script would then use the various item tags/names to modify the content. I guess that is what the current test code does when the SVG file is received, which would explain why I sometimes see a small icon in place of the graph when things fail: the SVG file was either lost or corrupted during the transfer.

I guess you do not have any test code examples of such "GET" post requests from a script perspective?
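As a rough illustration of the scripted approach asked about here, the sketch below builds a page whose embedded script polls a small text endpoint on a timer instead of reloading the whole document. It is not taken from the library's examples: the function name, the "/uptime" endpoint, and the "uptime" element id are all hypothetical, and the page would still need to be served by handlers registered in the sketch.

```cpp
#include <cassert>
#include <string>

// Builds a page that polls a small text endpoint instead of using a
// META REFRESH. "/uptime" and the element id "uptime" are hypothetical.
std::string buildPollingPage(unsigned pollMs)
{
  assert(pollMs > 0);

  std::string page;
  page += "<!DOCTYPE html><html><body>\n";
  page += "<p>Uptime: <span id=\"uptime\">-</span></p>\n";
  page += "<script>\n";
  page += "async function poll() {\n";
  page += "  try {\n";
  page += "    const r = await fetch('/uptime', {cache: 'no-store'});\n";
  page += "    document.getElementById('uptime').textContent = await r.text();\n";
  page += "  } catch (e) { /* keep the old value and retry on the next tick */ }\n";
  page += "}\n";
  page += "setInterval(poll, " + std::to_string(pollMs) + ");\n";
  page += "poll();\n";
  page += "</script></body></html>\n";
  return page;
}
```

With this layout the browser fetches the HTML once, and each timer tick transfers only the short text body, so a single lost response leaves the previous value on screen rather than a broken page.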

khoih-prog commented 1 year ago

Both Chrome and Vivaldi running simultaneously, 20+hrs

Selection_054

Sorry I have no more time to spend on this. You're on your own, unless you can prove this is a bug of the library.

Good Luck,

khoih-prog commented 1 year ago

Both Chrome and Vivaldi running simultaneously, 38+hrs . Final test

Selection_056

khoih-prog commented 1 year ago

Saw your post at Pi PICO-W WifiServer example code failure #860

Check the RPi Forum thread "PICO W and tp-link wifi harware issues" to see if anything there is useful to you.

Also test whether the shared 2.4GHz WiFi band is so crowded there in the UK that it creates channel contention, periodic (37 minutes) interference, etc.

Also try other routers (such as Cisco, Linksys, Netgear, etc., not the notorious TP-Link) and see if things improve. I bought a TP-Link to test quite some time ago, had a very bad experience, and will never buy or use TP-Link again.

khoih-prog commented 1 year ago

HI @revell1

I don't know if this will work for you in the UK, but try modifying the example as follows to test:

  1. Before

https://github.com/khoih-prog/AsyncWebServer_RP2040W/blob/6b4f24ee2b2e64a3049162232a19451085178301/examples/Async_AdvancedWebServer/Async_AdvancedWebServer.ino#L47

add this line

#include <pico/cyw43_arch.h>

  2. Before

https://github.com/khoih-prog/AsyncWebServer_RP2040W/blob/6b4f24ee2b2e64a3049162232a19451085178301/examples/Async_AdvancedWebServer/Async_AdvancedWebServer.ino#L185

add these lines

if (cyw43_arch_init_with_country(CYW43_COUNTRY_UK))
{
  Serial.println("Error setting country code!");
  // don't continue
  while (true);
}

For example

#if !( defined(ARDUINO_RASPBERRY_PI_PICO_W) )
  #error For RASPBERRY_PI_PICO_W only
#endif

#define _RP2040W_AWS_LOGLEVEL_     1

#include <pico/cyw43_arch.h>

#include <AsyncWebServer_RP2040W.h>

...

void setup()
{
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, LED_OFF);

  Serial.begin(115200);
  while (!Serial);

  delay(200);

  Serial.print("\nStart Async_AdvancedWebServer on "); Serial.print(BOARD_NAME);
  Serial.print(" with "); Serial.println(SHIELD_TYPE);
  Serial.println(ASYNCTCP_RP2040W_VERSION);
  Serial.println(ASYNC_WEBSERVER_RP2040W_VERSION);

  ///////////////////////////////////

  // check for the WiFi module:
  if (WiFi.status() == WL_NO_MODULE)
  {
    Serial.println("Communication with WiFi module failed!");
    // don't continue
    while (true);
  }

  Serial.print(F("Connecting to SSID: "));
  Serial.println(ssid);

  if (cyw43_arch_init_with_country(CYW43_COUNTRY_UK))
  {
    Serial.println("Error setting country code!");
    // don't continue
    while (true);
  }

  status = WiFi.begin(ssid, pass);

  delay(1000);
  ...

If this works OK for you, I'll open a PR for the arduino-pico core, or at least for all RP2040W-related libraries, to permit country configuration.


List of countries you can use is in cyw43_country.h of pico-sdk

#define CYW43_COUNTRY_WORLDWIDE         CYW43_COUNTRY('X', 'X', 0)

#define CYW43_COUNTRY_AUSTRALIA         CYW43_COUNTRY('A', 'U', 0)
#define CYW43_COUNTRY_AUSTRIA           CYW43_COUNTRY('A', 'T', 0)
#define CYW43_COUNTRY_BELGIUM           CYW43_COUNTRY('B', 'E', 0)
#define CYW43_COUNTRY_BRAZIL            CYW43_COUNTRY('B', 'R', 0)
#define CYW43_COUNTRY_CANADA            CYW43_COUNTRY('C', 'A', 0)
#define CYW43_COUNTRY_CHILE             CYW43_COUNTRY('C', 'L', 0)
#define CYW43_COUNTRY_CHINA             CYW43_COUNTRY('C', 'N', 0)
#define CYW43_COUNTRY_COLOMBIA          CYW43_COUNTRY('C', 'O', 0)
#define CYW43_COUNTRY_CZECH_REPUBLIC    CYW43_COUNTRY('C', 'Z', 0)
#define CYW43_COUNTRY_DENMARK           CYW43_COUNTRY('D', 'K', 0)
#define CYW43_COUNTRY_ESTONIA           CYW43_COUNTRY('E', 'E', 0)
#define CYW43_COUNTRY_FINLAND           CYW43_COUNTRY('F', 'I', 0)
#define CYW43_COUNTRY_FRANCE            CYW43_COUNTRY('F', 'R', 0)
#define CYW43_COUNTRY_GERMANY           CYW43_COUNTRY('D', 'E', 0)
#define CYW43_COUNTRY_GREECE            CYW43_COUNTRY('G', 'R', 0)
#define CYW43_COUNTRY_HONG_KONG         CYW43_COUNTRY('H', 'K', 0)
#define CYW43_COUNTRY_HUNGARY           CYW43_COUNTRY('H', 'U', 0)
#define CYW43_COUNTRY_ICELAND           CYW43_COUNTRY('I', 'S', 0)
#define CYW43_COUNTRY_INDIA             CYW43_COUNTRY('I', 'N', 0)
#define CYW43_COUNTRY_ISRAEL            CYW43_COUNTRY('I', 'L', 0)
#define CYW43_COUNTRY_ITALY             CYW43_COUNTRY('I', 'T', 0)
#define CYW43_COUNTRY_JAPAN             CYW43_COUNTRY('J', 'P', 0)
#define CYW43_COUNTRY_KENYA             CYW43_COUNTRY('K', 'E', 0)
#define CYW43_COUNTRY_LATVIA            CYW43_COUNTRY('L', 'V', 0)
#define CYW43_COUNTRY_LIECHTENSTEIN     CYW43_COUNTRY('L', 'I', 0)
#define CYW43_COUNTRY_LITHUANIA         CYW43_COUNTRY('L', 'T', 0)
#define CYW43_COUNTRY_LUXEMBOURG        CYW43_COUNTRY('L', 'U', 0)
#define CYW43_COUNTRY_MALAYSIA          CYW43_COUNTRY('M', 'Y', 0)
#define CYW43_COUNTRY_MALTA             CYW43_COUNTRY('M', 'T', 0)
#define CYW43_COUNTRY_MEXICO            CYW43_COUNTRY('M', 'X', 0)
#define CYW43_COUNTRY_NETHERLANDS       CYW43_COUNTRY('N', 'L', 0)
#define CYW43_COUNTRY_NEW_ZEALAND       CYW43_COUNTRY('N', 'Z', 0)
#define CYW43_COUNTRY_NIGERIA           CYW43_COUNTRY('N', 'G', 0)
#define CYW43_COUNTRY_NORWAY            CYW43_COUNTRY('N', 'O', 0)
#define CYW43_COUNTRY_PERU              CYW43_COUNTRY('P', 'E', 0)
#define CYW43_COUNTRY_PHILIPPINES       CYW43_COUNTRY('P', 'H', 0)
#define CYW43_COUNTRY_POLAND            CYW43_COUNTRY('P', 'L', 0)
#define CYW43_COUNTRY_PORTUGAL          CYW43_COUNTRY('P', 'T', 0)
#define CYW43_COUNTRY_SINGAPORE         CYW43_COUNTRY('S', 'G', 0)
#define CYW43_COUNTRY_SLOVAKIA          CYW43_COUNTRY('S', 'K', 0)
#define CYW43_COUNTRY_SLOVENIA          CYW43_COUNTRY('S', 'I', 0)
#define CYW43_COUNTRY_SOUTH_AFRICA      CYW43_COUNTRY('Z', 'A', 0)
#define CYW43_COUNTRY_SOUTH_KOREA       CYW43_COUNTRY('K', 'R', 0)
#define CYW43_COUNTRY_SPAIN             CYW43_COUNTRY('E', 'S', 0)
#define CYW43_COUNTRY_SWEDEN            CYW43_COUNTRY('S', 'E', 0)
#define CYW43_COUNTRY_SWITZERLAND       CYW43_COUNTRY('C', 'H', 0)
#define CYW43_COUNTRY_TAIWAN            CYW43_COUNTRY('T', 'W', 0)
#define CYW43_COUNTRY_THAILAND          CYW43_COUNTRY('T', 'H', 0)
#define CYW43_COUNTRY_TURKEY            CYW43_COUNTRY('T', 'R', 0)
#define CYW43_COUNTRY_UK                CYW43_COUNTRY('G', 'B', 0)
#define CYW43_COUNTRY_USA               CYW43_COUNTRY('U', 'S', 0)
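
The list above shows that CYW43_COUNTRY_UK is built from the letters 'G' and 'B', which is why a correctly configured board reports "GB". The sketch below illustrates how such a packed value can be turned back into its two letters. The bit layout (first letter in the low byte, second letter in the next byte, revision above that) is an assumption about how CYW43_COUNTRY() packs its arguments, and both function names are hypothetical; check cyw43_country.h before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed layout: first letter in bits 0-7, second letter in bits 8-15,
// revision in the bits above. Not copied from the pico-sdk header.
constexpr uint32_t packCountry(char a, char b, uint32_t rev)
{
  return (uint32_t)(unsigned char)a | ((uint32_t)(unsigned char)b << 8) | (rev << 16);
}

// Recovers the two-letter code from a packed value, for display purposes.
std::string countryLetters(uint32_t packed)
{
  assert((packed & 0xFFFFu) != 0);  // both letter bytes empty means no code set
  std::string s;
  s += (char)(packed & 0xFF);
  s += (char)((packed >> 8) & 0xFF);
  return s;
}
```

Under that assumption, packCountry('G', 'B', 0) is 0x4247 and decodes back to "GB".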

khoih-prog commented 1 year ago

I've also added a new example, Async_AdvancedWebServer_Country, for you to try.

Just change the country code to the correct one here

https://github.com/khoih-prog/AsyncWebServer_RP2040W/blob/a647b541a4f0cd2494dc8949f16c49c5ea4467b1/examples/Async_AdvancedWebServer_Country/Async_AdvancedWebServer_Country.ino#L193

khoih-prog commented 1 year ago

Sorry, don't use it yet. It still hangs and is not working.

khoih-prog commented 1 year ago

The correct place to modify is picow_init.cpp of RP2040W

To be changed to

#include <pico/cyw43_arch.h>

extern "C" void initVariant() {
  //cyw43_arch_init();
  // Select country code, if necessary
  // For example: CYW43_COUNTRY_AUSTRALIA, CYW43_COUNTRY_CANADA, CYW43_COUNTRY_CHINA, CYW43_COUNTRY_FRANCE, CYW43_COUNTRY_GERMANY, 
  // CYW43_COUNTRY_INDIA, CYW43_COUNTRY_ITALY, CYW43_COUNTRY_JAPAN, CYW43_COUNTRY_NETHERLANDS, CYW43_COUNTRY_SOUTH_KOREA, 
  // CYW43_COUNTRY_SWEDEN, CYW43_COUNTRY_UK, CYW43_COUNTRY_USA, etc.
  cyw43_arch_init_with_country(CYW43_COUNTRY_UK);
}

from

#include <pico/cyw43_arch.h>

extern "C" void initVariant() {
    cyw43_arch_init();
}

You have to make this change manually for now, until we can find a way to change it automatically in the variants.

khoih-prog commented 1 year ago

Test the new Async_AdvancedWebServer_Country example, which displays the country code. Just be sure to modify picow_init.cpp as above https://github.com/khoih-prog/AsyncWebServer_RP2040W/issues/3#issuecomment-1255676644


Start Async_AdvancedWebServer_Country on RASPBERRY_PI_PICO_W with RP2040W CYW43439 WiFi
AsyncTCP_RP2040W v1.0.0
AsyncWebServer_RP2040W v1.0.2
Connecting to SSID: HueNet1
SSID: HueNet1
Local IP Address: 192.168.2.180
Country code: GB          <================ Country code GB for CYW43_COUNTRY_UK
HTTP EthernetWebServer is @ IP : 192.168.2.180
.......... .......... .......... ...

Selection_063


Start Async_AdvancedWebServer_Country on RASPBERRY_PI_PICO_W with RP2040W CYW43439 WiFi
AsyncTCP_RP2040W v1.0.0
AsyncWebServer_RP2040W v1.0.2
Connecting to SSID: HueNet1
SSID: HueNet1
Local IP Address: 192.168.2.180
Country code: CA         <================ Country code CA for CYW43_COUNTRY_CANADA
HTTP EthernetWebServer is @ IP : 192.168.2.180
....

Selection_062

khoih-prog commented 1 year ago

Try the new AsyncWebServer_RP2040W v1.0.3 with country-code display in all examples.


Release v1.0.3

  1. Modify examples to display country-code
  2. Add temporary method to modify arduino-pico core to change country-code
  3. Add example Async_AdvancedWebServer_Country

revell1 commented 1 year ago

Hi, @khoih-prog (and @earlephilhower).

Thanks for the fast response. Here is a log of my testing with your new code. I am still trying to work out the results, as you will see from my notes. See 5], 6] and 7]: I can't prove it yet, but I think a device that is not set to GB may be able to knock out other devices regardless of their country code.

I do not know what could be occurring, but what if a bad country code causes the network to switch to a different channel? Will the board's WiFi hardware shift to the new channel or stay blindly on the original channel?

What happens if I change my MICROPYTHON code to select the wrong country code? Will this break the code in the same way I am seeing for the CPP code? [Another test to add to my investigation].

I certainly think that the lack of country code selection in the 2.5.4 code is something that needs fixing. I would also suggest that if @earlephilhower does a fix, it should possibly FORCE the user to specify a country code, so that a default is not used. That at least makes the program author make a decision, and hopefully provide support for user configuration. For example, it appears to be common practice to use a secrets.h or secrets.py file to hold the SSID and PASSWORD; that file could also hold the country code (and anything else that may be user critical), so that things are not hidden in code files.

=====

Steps so far:

1]: Updated the rp2040 core for the Arduino IDE from 2.5.2 to 2.5.4.

2]: Built and loaded Async_AdvancedWebServer.ino onto one PI PICO-W board and set running. Open COM port for target board, observe IP address.

3]: Opened copy of Async_AdvancedWebServer_Country.ino as new project. Modified SSID and PASSWORD, but made no other changes, i.e. DID NOT set a country code. Built and loaded Async_AdvancedWebServer_Country.ino onto one PI PICO-W board and set running. Open COM port for target board, observe IP address.

4]: Let both run. Open Firefox and open TWO browser tabs, one to each target board's IP. Both finally failed with "The server at 192.168.1.xxx is taking too long to respond." This was the expected/desired result, confirming the issue is still present in the 2.5.4 library set.

5]: Modify Async_AdvancedWebServer_Country.ino and set : char countryCode[3] = { 0, 0, 0 }; to char countryCode[3] = { 'G', 'B', 0 };

[DID I ACTUALLY NEED TO DO THAT OR DOES THE NEXT CHANGE DO THE WORK?]

Modified picow_init.cpp

=====

#if 0

// ORIGINAL
extern "C" void initVariant() {
    cyw43_arch_init();
}

#else

// UK mod/test
extern "C" void initVariant() {
  //cyw43_arch_init();
  // Select country code, if necessary
  // For example: CYW43_COUNTRY_AUSTRALIA, CYW43_COUNTRY_CANADA, CYW43_COUNTRY_CHINA, CYW43_COUNTRY_FRANCE, CYW43_COUNTRY_GERMANY,
  // CYW43_COUNTRY_INDIA, CYW43_COUNTRY_ITALY, CYW43_COUNTRY_JAPAN, CYW43_COUNTRY_NETHERLANDS, CYW43_COUNTRY_SOUTH_KOREA,
  // CYW43_COUNTRY_SWEDEN, CYW43_COUNTRY_UK, CYW43_COUNTRY_USA, etc.
  cyw43_arch_init_with_country(CYW43_COUNTRY_UK);
}

#endif

=====

Built and loaded the new code onto the second board. Closed both serial ports. Power cycled both boards. Opened BOTH serial ports and checked the IP addresses. Refreshed BOTH browser tabs. Waited.

Frustratingly, both targets eventually stopped responding again, so adding the country code does not appear to have solved the issue. Are we sure that the changes to picow_init.cpp are actually making it to the hardware, or is something else needed? See 7] below: is the presence of a device with the wrong country code able to bring down the other device?

6]: OK, so let's replace the cpp code (original non-country-code version) on the first board with MICROPYTHON code. Restart both targets and COM ports. Refresh browser pages. Let's see if both targets fail to respond after a period.

7]: Both boards appear to run without issues. So was the country-code version impacted by the original code still running at the same time on the other board? Did the old-code board briefly crash the 2.4GHz network, taking both boards out?

8]: Watch this space. Further testing is needed to see if a device with the wrong country code can briefly knock out other devices with the correct country code.

As I write this, I had both MICROPYTHON and Async_AdvancedWebServer_Country running (about 25 minutes), and Async_AdvancedWebServer_Country has just died, but MICROPYTHON is still running. That's another thought shot down!!!!

Further testing is needed, as it is still unclear what is actually happening, or why MICROPYTHON is not having an issue.

khoih-prog commented 1 year ago

[DID I ACTUALLY NEED TO DO THAT OR DOES THE NEXT CHANGE DO THE WORK?]

You don't need to do this. Did you check the country-code is correct ?

Country code: GB          <================ Country code GB for CYW43_COUNTRY_UK

and (9+ hrs and still running)

Selection_064

If correct, try to

  1. switch to Chrome or Vivaldi
  2. try on Linux / Ubuntu machine
  3. Change the router
  4. Test using the simpler code, such as Async_HelloServer to have less burden on the network (shorter response, not too long svg)

It's possible the MICROPYTHON code sends a very simple and short web response.

revell1 commented 1 year ago

Hi, yes, the displayed web page was showing GB.

The Micropython code I created generates a web page split into a number of GETs: HTML, CSS, SCRIPT and a clone of the SVG image file. That is probably more data in total, but split into multiple requests. The AJAX request version gets a small text block of probably 30 characters, while the SVG file is the same size as in your code, as I duplicated the creation method. It does not re-get the HTML, CSS or SCRIPT.

The META 5 second refresh version does a GET of the root HTML document, which then gets all the other documents, as I tried to prevent any caching of files.

The last death of the CPP code left the serial COM port heartbeat ticking away, so the code was still running. I could also still PING the board and get a reply (15 - 100ms delay reported). But when trying to browse to the IP address, the browser gets a timeout. Wireshark showed the web request go out, but the LED on the PICO-W stayed OFF, so the code did not appear to receive the GET request; otherwise I should have got a blink. I cannot tell if the data actually got to the PICO, but a PING clearly does. Also, the Micropython code was still happily responding.

An Android tablet (again Firefox) could not get to the stalled board, but was fine with the other.

I will try installing some different browsers on Android. I will also look into trying Async_HelloServer, and a seriously cut-down response, to see if that makes any difference.

Again, thanks for your help. I just wish the cause of the difference between the two boards would show itself so a fix could be worked out.

earlephilhower commented 1 year ago

FWIW, I am very doubtful the country setting has anything to do with this.

All the country setting should do is a) adjust the allowed bands and possibly b) adjust the transmit power. I've never seen an AP which changes bands dynamically, so if you can see and connect to the AP then the country bands and setting aren't an issue.

khoih-prog commented 1 year ago

Good to know. I'm out now and nothing I can do here.

You're on your own now to isolate the issue, if possible and necessary.

revell1 commented 1 year ago

OK, thanks for your help.

All my evidence so far still points to differences between the CYW43 driver-specific files in the Micropython and RP2040 GitHub collections, or to code running on top of the CYW43 driver.

So far there appear to be many changes between the two file sets: ...\Arduino15\packages\rp2040\hardware\rp2040\2.5.4\pico-sdk\lib\cyw43-driver\src and ...\MICROPYTHON_SOURCE\micropython-master\drivers\cyw43

There is a lot of added documentation/comments, but there are also functional changes. Explanations for the various changes are proving cryptic or lacking.

Yes, I think there is some environmental effect occurring with my router. But as Micropython happily runs without fault, either a bug was fixed at some point in the Micropython fork of the library code, or changes in the RP2040 branch have introduced an issue. That issue would cause the driver to die in such a way that pings still get responses but requests do not make their way into the application code; they may be getting into the CYW43439 chip and its firmware but then blocking at the SPI interface. That is still all a guess based on my observations to date. If the Micropython code died too, that would point more to the router, but as it does not, the cause has to be the Arduino application code or drivers; I cannot see that it is anything else.

Are there any code hooks that could be added to the USB serial port, either to periodically report the inner status of the CYW43 driver, or to be used as some form of console/terminal debug menu?

The search for inspiration continues.

khoih-prog commented 1 year ago

Out of curiosity, I installed Firefox, and have been running Chrome, Vivaldi and Firefox continuously for 19+ hours without issue.

The system is Ubuntu 20.04LTS ( using kernel 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux)

Selection_066

Is it possible you're running on Windows machine, with some kind of virus, anti-virus, ransomware, spyware, etc. that can create havoc to your network ?

khoih-prog commented 1 year ago

Wow, I just got one issue now, coming from the notorious Firefox, which destroyed the operation of all 3 browsers. It seems that some very bad behaviour of Firefox sometimes surfaces to wreak havoc on the network / system.

The situation is unrecoverable after this. I suggest that, if interested, you file a bug report with Firefox, Edge, etc., or avoid Firefox and Edge until they find and fix the bug.

It seems that the RP2040W has now crashed, with no more heartbeat. Weird.

Selection_067

revell1 commented 1 year ago

Hi, well, at least I am not going mad.

I guess you did not try pinging the PICO after the failure, just to see if it was still on the network.

As I have found, when the web server is no longer responding it can generally still be pinged. I think that suggests the WiFi chip and its firmware are still doing something, but the link over SPI to the PICO processor code and drivers has stalled. I am guessing that ARP would still be working, but I have not really looked at that with Wireshark.

You did say, though, that the heartbeat had also stopped, which suggests the loop() code had also stalled or crashed in some way. So I am not sure if ping would still respond; it probably should, unless stalled application code can also halt the WiFi chip code.

khoih-prog commented 1 year ago

At least we can now isolate that something bad in Firefox creates the issue.

I haven't had the courage to use Firefox for quite a long time, especially as the stable Chrome and Vivaldi have many better features. This experience also suggests that my feelings were somewhat reasonable.

khoih-prog commented 1 year ago

No more ping after the Firefox havoc ;=}}

Seems you're in better shape there !!!

khoih-prog commented 1 year ago

OK, I think I have found out what's wrong with Firefox, Edge, etc., and have tried to adjust both

  1. AsyncTCP_RP2040W
  2. This AsyncWebServer_RP2040W

All 3 browsers (Chrome, Vivaldi and Firefox) have been running without issue so far, continuously for 3+ hrs now.

It seems Firefox is much slower to respond than other browsers, such as Chrome, Vivaldi, etc., and I had to tweak some timeouts to cope with it. Hopefully this is final. I will leave them running for one more day to be sure.

Selection_070

khoih-prog commented 1 year ago

HI @revell1

I've just released AsyncWebServer_RP2040W v1.1.0. Be sure to use it with the latest AsyncTCP_RP2040W v1.1.0+.

Your contribution is again noted in Contributions and Thanks.

I've been testing using Firefox (in Ubuntu, not Windows 10) for many hours and it is still OK.

Async_AdvancedWebServer_Country_Firefox

Please test there to see if there are still more issues.


Release v1.1.0

  1. Fix issue with slow browsers or network. Check Target stops responding after variable time when using Firefox on Windows 10 #3

khoih-prog commented 1 year ago

Selection_074

khoih-prog commented 1 year ago

HI @revell1

Please check the new AsyncWebServer_RP2040W v1.2.0 to see if it works better for you with Firefox on Windows, using the new and efficient Async_AdvancedWebServer_MemoryIssues_Send_CString example.

khoih-prog commented 1 year ago

HI @revell1

You can try the new and very promising arduino-pico core branch nogoboomnow. So far so good for me; I am still testing now.

Use the new version (in master) of Async_AdvancedWebServer to display heap data.

Selection_161


More info can be found in Earle's PR Rewrite PicoW LWIP interface, major stability increase #916

Random crashes, infinite loops, and other lockups were happening to the PicoW while under high load from multiple clients.

This seems to have been due to two issues:

  1. The periodic sys_check_timeouts() call from an alarm/IRQ was happening while the core was in LWIP code. LWIP is not re-entrant or multi-core/thread safe so this is a bad thing. Some calls may not have been locked with a manual addition of the LWIPMutex object to hit this.
  2. The WiFi driver supplies packet data during an interrupt. PBUF work is legal in an interrupt, but actually calling netif->input() from an IRQ to queue up the Ethernet packet for processing is illegal if LWIP is already in progress.

Rearchitect the LWIP interface to fix these problems:

  1. Disable interrupts during malloc/etc. to avoid the possibility of the periodic LWIP timeout check interrupting and potentially calling user code which did a memory operation
  2. Wrap all used LWIP calls to note LWIP code will be executing, instead of manually eyeballing and adding in protection in user code.
  3. Remove all user code LWIPMutex blocking, the wrapping takes care of it.
  4. When an Ethernet packet is received by interrupt and we're in LWIP code, just throw the packet away for now. The upper layers can handle retransmit. (A possible optimization would be to set the packet aside and add a housekeeping portion to the LWIP wrappers to netif->input() them when safe).
  5. Ignore callbacks during TCP connection teardown when the ClientContext passed from LWIP == nullptr
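
The wrapping idea described in the PR text can be pictured with a small host-side sketch. This is not the arduino-pico implementation: every name below is hypothetical, the interrupt is simulated by an ordinary function call, and the only point is to show a marker that records "LWIP code is executing" and a packet handler that drops input while the marker is set.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names throughout; this is not the arduino-pico code.
static std::atomic<int> g_lwipDepth{0};   // > 0 while wrapped LWIP code runs
static int g_delivered = 0;               // packets handed to the stack
static int g_dropped = 0;                 // packets discarded as unsafe

// RAII marker placed around every call into the non-re-entrant stack.
class LwipGuard
{
  public:
    LwipGuard() { g_lwipDepth.fetch_add(1); }
    ~LwipGuard()
    {
      int prev = g_lwipDepth.fetch_sub(1);
      assert(prev > 0);   // every exit must match an entry
      (void)prev;
    }
};

// Stand-in for the packet-received interrupt handler.
// Returns true if the packet was passed on, false if it was dropped.
bool onPacketInterrupt()
{
  if (g_lwipDepth.load() > 0)
  {
    ++g_dropped;     // stack is busy: drop and rely on retransmission
    return false;
  }
  ++g_delivered;     // safe: this is where netif->input() would be called
  return true;
}

// Stand-in for a wrapped LWIP call during which an interrupt fires.
bool interruptDuringLwipCall()
{
  LwipGuard guard;
  return onPacketInterrupt();
}
```

Dropping a packet in the busy case trades a little throughput for safety, which matches the PR's remark that the upper layers can handle the retransmit.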

khoih-prog commented 1 year ago

3 different browsers (Chrome, Vivaldi and Firefox) running simultaneously. Free heap is very stable now.

Selection_165

earlephilhower commented 1 year ago

FWIW, this is with the built-in webserver and a wget job that's been running at full tilt for the last day at several requests per second:

Pico-W Demo 

Hello from the Pico W!

Uptime: 23:04:17
Free Memory: 192376
Page Count: 1619444