rsta2 / circle

A C++ bare metal environment for Raspberry Pi with USB (32 and 64 bit)
https://circle-rpi.readthedocs.io
GNU General Public License v3.0

Circle WLAN w/Raspberry Pi AP / hostapd #469

Open davefilip opened 1 week ago

davefilip commented 1 week ago

Rene,

I am having some troubles with the WiFi support in addon/wlan, using Circle as a WiFi client. I can connect reliably to some Access Points (AP), but not others.

The complicating factor is that the immediate problem I am trying to solve is connecting to a Raspberry Pi running the full OS (e.g., Raspberry Pi OS 11 / Debian Bullseye) running hostapd, which connects exactly 50% of the time. By that I mean that every other connection works, and every other connection fails.

So why don't I talk to the hostapd folks? I have, but after trying a few things, they have been unable to help.

The frustrating thing is that using Circle WLAN, I am able to connect every time to a NETGEAR Orbi AP. What I can't figure out is what the incompatibility is between Circle WLAN and hostapd.

When it does connect, it is solid and reliable. I've left it running overnight, and even see console log messages about renewing the IP address (I'm using DHCP).

When it does NOT connect, I get a lot of (more than 300) "CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys" messages on the console log (see first image), and on the hostapd side (the RPi running Bullseye) I just see a lot of (more than 300) 'disassociated' errors with the MAC address of the RPi running Circle in the system log. So hostapd thinks that Circle WLAN is disconnected and unable to connect.

So is this a Circle problem or a hostapd problem? Not sure, but I was not able to make progress in the hostapd forum, so I'm wondering if there is any known issue, perhaps a similar one with a different AP, with Circle WLAN connecting to certain APs?

On the Circle WLAN side, I have tested and seen the exact same problem with Circle running on a RPi 3, 4 and Zero 2W. On the hostapd side, I have tested and seen the exact same problem on both RPi 3 and 4.

Unfortunately this is a bit of a problem for me because my use case was to have a number of Circle-based RPi Zero 2Ws monitoring various things in real time, and reporting back wirelessly to a RPi running Raspberry Pi OS (with an Apache web server, Python, Java, etc.) that I can connect to with my phone. The gist is that I want to have multiple Zero 2Ws running a Circle-based OS for real time data collection, all reporting back to one full Linux / Raspberry Pi OS.

Since it connects every other time, the reason that is a challenge is that all of these will all be headless RPis on a floating barge in the middle of a lake with no Internet connection. So the start-up sequence has to be carefully planned and coordinated, including any reboots.

So what does work is:

Desktop -> NETGEAR AP
Circle WLAN -> NETGEAR AP
Desktop -> hostapd

And what works every other time is:

Circle WLAN -> hostapd

I'm writing for any thoughts on figuring out what the incompatibility is, since it only fails every other time, to determine what Circle WLAN is not providing to hostapd to keep it happy.

However, since this is wireless -- and I can't run a wireless network sniffer (can I?) -- I am not sure how to debug this further. I have already posted all of this on the hostapd forum, and nothing that was suggested has helped.

Any thoughts or ideas?

Thanks,

Dave.

[Attached images: IMG_6252 copy, IMG_6250 copy]

rsta2 commented 1 week ago

Hi Dave, I successfully tested Circle WLAN with some APs (FRITZ!Box, D-Link and Hama), that I had here. I never tested with Raspberry Pi OS in AP mode. I will do this, but this will take some time. Unfortunately I have currently no idea, what happens there. I ported WPA-Supplicant to Circle some time ago, but did not use Hostap. I must get familiar with the field again.

Rene

davefilip commented 1 week ago

Thanks Rene! I’ve successfully used hostapd for a few years on a few different RPis, connecting from a desktop (macOS), iPhone, iPad (OK, all Apple!), and from other RPis (also running Raspberry Pi OS). Circle was the first time I’ve had any problems, but it is reasonable that it was not in your test suite!

One additional note: the latest supported version of Raspberry Pi OS (Bookworm / Debian 12) replaces dhcpcd (used since the earliest days) with NetworkManager. This is significant only in that NetworkManager can be configured as a RPi AP without installing hostapd. However, at least from my admittedly limited testing, Circle WLAN does not seem to see or connect at all to a RPi running Bookworm as an AP (although I can still connect to it from my Apple devices).

So although this might muddy the waters, I mention it because if I could get Circle WLAN to connect reliably with Raspberry Pi Bookworm / Debian 12 / NetworkManager, that would give me a work-around. I have been focused mostly on hostapd simply because I have used it extensively in the past, and quite frankly because I am still running my RPi OS images on Buster (Debian 10) and Bullseye (Debian 11), but I could upgrade those images to Bookworm (Debian 12) if it provided a solution.

For instructions on how I have set up hostapd on a Raspberry Pi: https://howto.colornetlabs.com/howto_wap/

And for Circle, to confirm, I am building using Step 47. And, it should go without saying, I have not modified any of the addon/wlan code myself.

Thanks in advance for any testing / debugging / advice you can offer regarding all of this.

rsta2 commented 1 week ago

I've already tested it with NetworkManager on Raspberry Pi OS 11. There were two problems: First, Circle WLAN recognizes the AP as WPA, not WPA2. After changing the configuration for Circle WLAN to WPA, it tries to connect to the AP. An association is established, but in the 4-way WPA handshake the WLAN driver always reports an error in the fourth step. I don't know how to continue here and will test with hostapd now. Thanks for the link to the instructions!

davefilip commented 1 week ago

Rene,

Thanks for the testing and feedback! Actually, my preference is for hostapd only because I’ve used it longer and had more experience with it … but most importantly I can install it on different computers, not just RPis … but the good news is that it does successfully connect 50% of the time, so the basic handshake protocol is working. Just something failing on reconnects.

The every-other-time fail I suspect is because Circle is not doing something (or doing something) that makes hostapd think it is a ‘bad’ connection, and then blacklists the MAC address.

One more hint: if I disconnect Circle WLAN (e.g., unplug the connected Circle-booted RPi) and then reset hostapd (systemctl restart hostapd) before a new Circle WLAN connection, then the new Circle connection seems to always succeed.

And one final clarification (minor point, but just to be clear): If I reset hostapd (systemctl restart hostapd) and am running my Circle-based OS on multiple RPis, each one can connect successfully in succession, but when any one tries to re-connect without a restart of hostapd, it will fail.

[Although you probably got this before, just wanted to be clear that resetting hostapd is not necessary before ANY new Circle WLAN connecting, just one that had previously connected. The problem is specifically linked to the MAC address of a prior connection.]

Thanks again for any help you can provide in understanding why reconnections are failing. Although I have not seen this problem with hostapd and any other WIFI client, it may be possible that the bug is in hostapd, but if we can understand what that is, then I can go back to the hostapd forum to get more assistance from them (hopefully either a patch or a configuration work-around).

Cheers,

Dave.

rsta2 commented 1 week ago

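Dave, I got hostapd running and I can reproduce the problem. It does not happen that often, but from time to time. You will find a log file attached. I'm using Raspberry Pi OS 11 on a RPi 4B to run hostapd 2.9. The Circle test program is hello_wlan in the most recent version from the develop branch. I enabled the debug messages from WPA-Supplicant here: https://github.com/rsta2/hostap/blob/3848d62a32d544c8601c105032f5e19f48d6f8bd/wpa_supplicant/main_circle.cpp#L44 (set it to MSG_DEBUG) and the event reporting from the WLAN driver here: https://github.com/rsta2/circle/blob/6fe2064f22dbb5ed843f82502aabc3d5c2d8a38e/addon/wlan/ether4330.c#L29 (set to 1).

Rene

wlan.log https://github.com/user-attachments/files/16931364/wlan.log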

davefilip commented 1 week ago

Rene,

Thanks, glad you were able to reproduce it. Do you have any thoughts as to why it only happens sometimes and not others? Any difference in the handshake?

I am a bit surprised that it does not happen “that often”, since I can easily reproduce it every other time.

So for testing, did you do disconnects / reconnects in software? Or did you unplug / re-plug the RPi running Circle, which is what I did? If the former, that might provide a hint. I always did the unplug / re-plug, since in my targeted environment, power might not always be reliable.

I was testing with both:

Raspberry Pi OS / Debian 10 / Buster

$ hostapd -v
hostapd v2.8-devel
User space daemon for IEEE 802.11 AP management,
IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
Copyright (c) 2002-2019, Jouni Malinen @.***> and contributors

Raspberry Pi OS / Debian 11 / Bullseye

$ hostapd -v
hostapd v2.9
User space daemon for IEEE 802.11 AP management,
IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
Copyright (c) 2002-2019, Jouni Malinen @.***> and contributors

The latter matches your configuration.

My eventual use case is to have several RPi’s running my Circle-based OS and the RPi running full Raspberry Pi OS running headless on the barge on the water, and connecting to the full Raspberry Pi OS from my phone to view results. Therefore, not a lot of diagnostics when it doesn’t connect. Although I guess I could just turn the activity LED on when it does connect and keep power-cycling until it turns on (which feels a bit messy?).
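The idea of signalling the connect result can be combined with a bounded wait, so that a failed attempt is visible without a console. The following is a minimal sketch in plain C++, not Circle code: `WaitForLink` and its injected `isUp` and `sleepMs` callbacks are hypothetical names, standing in for whatever link check and sleep call the real kernel uses.

```cpp
#include <cstdint>
#include <functional>

// Polls `isUp` every `pollMs` milliseconds until it returns true or at
// least `timeoutMs` milliseconds have been spent waiting.
// `sleepMs` is injected so the sketch runs anywhere; on real hardware it
// would be the scheduler's or timer's sleep call (an assumption here).
bool WaitForLink(const std::function<bool()>& isUp,
                 const std::function<void(std::uint32_t)>& sleepMs,
                 std::uint32_t timeoutMs,
                 std::uint32_t pollMs = 100) {
    std::uint32_t waited = 0;
    while (!isUp()) {
        if (waited >= timeoutMs) {
            return false;   // give up; the caller can signal the failure
        }
        sleepMs(pollMs);
        waited += pollMs;
    }
    return true;            // link came up within the timeout
}
```

The returned flag could then switch the activity LED on, or trigger a retry after a longer pause instead of a manual power cycle.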

Any other thoughts?

Regards,

Dave.

rsta2 commented 1 week ago

Dave, I was removing the power from the client, running the Circle sample, and put it on to restart again. Unfortunately I have currently no idea, what is causing this problem. I will do more tests in the evening. I will be able to tell more exactly then, how often it happens.

Rene

davefilip commented 1 week ago

Rene,

Thanks for the update. I’ve just added the following to the XXXKernel::Initialize() for WiFi:

// Bring up the WLAN device driver first.
if (!m_WLAN.Initialize()) {
    s_pThis->m_Screen.Write(ERR_INIT_WIFI, strlen(ERR_INIT_WIFI));
    s_pThis->m_Screen.Write(ERR_KERNEL_PANIC, strlen(ERR_KERNEL_PANIC));
    return false;
} else if (debug) {
    s_pThis->m_Screen.Write(INF_INIT_WIFI, strlen(INF_INIT_WIFI));
}

// Then the network subsystem.
if (!m_Net.Initialize(FALSE)) {
    s_pThis->m_Screen.Write(ERR_INIT_NETWORK, strlen(ERR_INIT_NETWORK));
    s_pThis->m_Screen.Write(ERR_KERNEL_PANIC, strlen(ERR_KERNEL_PANIC));
    return false;
} else if (debug) {
    s_pThis->m_Screen.Write(INF_INIT_NETWORK, strlen(INF_INIT_NETWORK));
}

// Finally WPA supplicant, which handles authentication with the AP.
if (!m_WPASupplicant.Initialize()) {
    s_pThis->m_Screen.Write(ERR_INIT_WPA, strlen(ERR_INIT_WPA));
    s_pThis->m_Screen.Write(ERR_KERNEL_PANIC, strlen(ERR_KERNEL_PANIC));
    return false;
} else if (debug) {
    s_pThis->m_Screen.Write(INF_INIT_WPA, strlen(INF_INIT_WPA));
}

And then after each connect attempt I pull the power, count to 5, and then plug the power back in (at the USB connector). So this will stall and not pass to XXXKernel::Run until it gets a WiFi connection (it will keep trying for a bit, but after it fails, it never gets connected).

Following this approach, my experience is alternate pass/fail, as in:

First Attempt - Connect
Second Attempt - Not Connect
Third Attempt - Connect
Fourth Attempt - Not Connect
Fifth Attempt - Connect
… etc …

Let me know if you find out anything more, after more testing.

Cheers,

Dave.

rsta2 commented 1 week ago

Dave,

here are my test results. I did 10 tries. The RPi 4B, running hostapd, was active the whole time. For each try I attached the power, waited for the result, removed the power and started again for the next try. The results:

1   OK
2   ERR Connect after 17s, IP after 32s
3   ERR Connect after 17s, IP after 32s
4   ERR Twice ERR, Connect after 28s, IP after 32s
5   OK
6   OK
7   ERR Connect after 17s, IP after 32s
8   ERR Connect after 17s, IP after 32s
9   ERR Multiple ERR, Connect after 48s, IP after 123s
10  OK

4 of 10 tries were immediately successful. All tries were successful after up to 2 minutes.

I did also some tries, where I restarted hostapd between each try. All tries were immediately successful here.

I guess the problem is that when the WLAN client is restarted within a relatively short time, the AP does not know that it was restarted, because it thinks the client is still connected. That's why the re-connect fails, and this triggers these "DISCONNECTED - remove keys" messages. That's not nice, but actually it should not be a problem.

Rene

davefilip commented 1 week ago

Rene,

OK, thanks for testing and sharing.

However, are you saying that when it did not connect, it would eventually connect if you let it keep trying?

If so, I did not see that, and it would just die with a no key offered error.

However, it sounds like you are saying that if I just wait long enough between disconnecting and reconnecting, it should be fine? So maybe “counting to 5” between restarts is just too short? And maybe I should just go and get a beer before restarting?

Regards,

Dave.

rsta2 commented 1 week ago

Dave,

yes, this was my test result for the 10 tries. When I let it keep trying, the connection was established after some seconds.

And yes, the connection timeout on the AP is longer than a few seconds. Because the WLAN client is powered off, the AP does not notice that the client is gone, and when the client tries to re-connect, it will get an error, because for the AP it is still connected.

Rene

rsta2 commented 1 week ago

I think, the client timeout in hostapd is about 90 seconds. So if you wait 2 minutes, it should be enough.
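The behaviour described above can be pictured with a toy model. The sketch below is not hostapd code: `ToyAccessPoint`, its reject-while-stale rule and the 90-second value used in the test are assumptions made only to illustrate why a fast reboot fails, a slow one succeeds, and a clean disconnect avoids the wait.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy model only: NOT hostapd's real logic. It assumes the AP keeps a
// per-MAC entry until it sees a disconnect or an inactivity timeout expires.
class ToyAccessPoint {
public:
    explicit ToyAccessPoint(std::uint32_t timeoutSeconds)
        : m_timeout(timeoutSeconds) {}

    // Returns true if the station is accepted at time `now` (in seconds).
    bool Connect(const std::string& mac, std::uint32_t now) {
        auto it = m_lastSeen.find(mac);
        if (it != m_lastSeen.end() && now - it->second < m_timeout) {
            return false;   // stale entry: AP still believes it is connected
        }
        m_lastSeen[mac] = now;   // for brevity, only refreshed on connect
        return true;
    }

    // Explicit disconnect, as sent by a client that shuts down cleanly.
    void Disconnect(const std::string& mac) { m_lastSeen.erase(mac); }

private:
    std::uint32_t m_timeout;
    std::map<std::string, std::uint32_t> m_lastSeen;
};
```

In this model a second station with a different MAC address is unaffected, which matches the observation earlier in the thread that only a previously connected device fails.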

davefilip commented 1 week ago

Rene,

Thanks - yes, that seems to work, so sorry to bother you with all of this! I probably should have thought about delayed reboots.

So the gist is that because a Circle-based OS boots so quickly, it confuses hostapd, because it thinks that another device with the same MAC address is trying to connect. It all makes sense now.

Thanks for finding this work-around, which is to just chill and wait about 2 minutes before attempting to reboot. A bit messy, but I can make it work!

N.B.: From my Mac to hostapd I can connect and disconnect and re-connect, etc., in a matter of seconds, which I tried when I first discovered this with Circle, and I have probably done many times previously, and that does not confuse hostapd. But admittedly that is not exactly an apples-to-apples comparison, and it does take more than 90 seconds to re-boot a Mac, so I was performing a potentially invalid test.

Going forward, of course, the key will be getting Raspberry Pi OS 12 with NetworkManager, since that is the way Raspberry Pi — and in general Debian — is going.

One final quick question, only if you have a quick answer, if not, I won’t bother you further:

When using a Logitech keyboard w/Circle, I frequently get:

dwhci: Transaction failed (status 0x202)

messages on the console. However, it does not appear to ‘break’ anything, and I understand it is a USB timing issue? I have tried setting usbspeed=full in cmdline.txt, which just caused my keyboard not to work. Any advice on how to get rid of these annoying messages? I’ve seen references in some of your other posts online, particularly related to MIDI, but I haven’t found anything that works. But again, it doesn’t appear to actually prevent anything from working.

Thanks again for all your time and help for finding the (simple if perhaps not just obvious) work around to my WiFi connection problems with hostapd.

Cheers,

Dave.

rsta2 commented 1 week ago

Dave,

the problem is that when the Circle app is simply powered off, it does not disconnect from the AP. Circle can disconnect from the AP with this call (https://github.com/rsta2/circle/blob/749fe99301846217439a40ac7bda547a986db0f7/addon/wlan/bcm4343.cpp#L45), but this happens only when the instance of CBcm4343Device is destroyed, which is normally not done in the hello_wlan sample.

This "Transaction failed" message is caused by inaccurate USB timing and can be ignored. If you want better USB timing you can enable the USE_USB_FIQ system option, but then you cannot use the FIQ for other purposes.

You are welcome.

Cheers,

Rene

davefilip commented 1 week ago

Rene,

> the problem is, that when the Circle app is simply powered off, it does not disconnect from the AP. Circle can disconnect from the AP with this call https://github.com/rsta2/circle/blob/749fe99301846217439a40ac7bda547a986db0f7/addon/wlan/bcm4343.cpp#L45, but this happens only, when the instance of CBcm4343Device is destroyed, which is normally not done in the hello_wlan sample.

Understood, but because the problem I initially described also happens with a reboot or shutdown, for which I am explicitly calling the reboot() and halt() functions (as defined in startup.h), I presume those functions are too low level to call the destructors (::~) of each instantiated object. Actually, if I remember correctly, I think I found those in the assembly code?

I know that I described simply pulling the power plug when I first reached out, as I provided that as the simplest example to replicate the problem, but I initially found the problem after calling reboot() [as I normally don’t go around indiscriminately unplugging USB power cables].

So to prevent the problem on a reboot() (otherwise I must always do a shutdown, wait 2 mins, and then do a power cycle), do you recommend I find the address of the CBcm4343Device device and explicitly delete it before calling reboot()? If so, these are the devices I am seeing:

00 00606EC0 ready main
01 00618680 ready wifireader
02 0062C700 ready wifitimer
03 00640CC0 ready net
04 00651140 sleep wpa_supplicant
05 006AF040 block @6af040
06 00678400 sleep dhcp
07 006B0140 block @6b0140
08 006C0A00 block httpd
09 006D4A80 sleep ntpd
10 0081C100 run @81c100

So is ’net’ an instantiation of CBcm4343Device? Or is that abstracted, and is there another way that I need to find the address of the CBcm4343Device device?

Apologies if this should be obvious and in the documentation, which I have spent a lot of time reading, and I have previously looked for but not found a way to cleanly shut down the network. Feel free to reply with a ‘RTFM’ and a link to where I should start reading.

I'm willing to believe that macOS and other devices disconnect more cleanly from the network, which is perhaps why I don’t see the problem there. So if I can do a clean disconnect before I call reboot() or halt(), should that prevent this problem directly after a reboot/shutdown?

[And it is not lost on me that a “bare metal” OS means that I have to take care of all the little details that are hidden in a full blown OS like macOS, Windows, or Linux.]

> This "Transaction failed" message is caused by an inaccurate USB timing and can be ignored. If you want to have a better USB timing you can enable the USE_USB_FIQ system option, but you cannot use the FIQ for other purpose then.

Understood, but one of the main reasons I am using the Circle environment is because using FIQ allows me to detect the change in a GPIO pin very quickly. For example, with full Raspbian, using a signal generator to send a PWM signal to a GPIO pin, I could only detect frequencies below 1 kHz, and with Circle and FIQ, I have successfully tested up to 20 kHz (and assume I can go a lot higher than that if I ever need to). So I am not going to give up on using FIQ simply to eliminate this annoying message.

Nonetheless, once I’m done writing and debugging, I can also just set the log level to 1, and the messages disappear (along with lots of other log messages I’m generating while writing and debugging a bare metal OS). So I will simply live with the annoyance until I am done writing new code, and then change the log level.

Thanks again,

Dave.

davefilip commented 1 week ago

Apologies: halt(), reboot() and poweroff() are all in sysinit.cpp, although still peppered with some assembly instructions (they are not in pure assembly code). I misspoke earlier.

Also, I listed tasks and not devices, I copied the wrong bit of the output (I have a test page that displays both devices and processes); these are the only registered devices I am seeing:

Character Device : string
Block Device : emmc1
Block Device : emmc1-2
Block Device : emmc1-1
Character Device : tty1

So please disregard my last message, as I was having a bit of a brain freeze, but I am feeling much better now … ;-)

But I am interested in a way of cleanly shutting down the WiFi connection, if possible, so that reboot() is clean, which I understand requires getting the address of the CBcm4343Device device and deleting it?

And apologies again for my last message, which contained bits of gibberish …

rsta2 commented 1 week ago

Dave,

if you use this version of main.cpp https://github.com/rsta2/circle/blob/master/sample/38-bootloader/main.cpp, which does not contain explicit calls to halt() and reboot(), you should be able to return from CKernel::Run() with ShutdownReboot or ShutdownHalt, and main() will in turn return with EXIT_REBOOT or EXIT_HALT. Because CKernel is instantiated locally in main(), and CBcm4343Device is a member of CKernel, the destructor of CBcm4343Device will be called. This should shut down the WLAN connection.
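The C++ mechanism described here can be sketched without any Circle code. The classes below (MockWlanDevice, MockKernel) and the function MockMain are hypothetical stand-ins, not Circle's CBcm4343Device, CKernel or main(); they only show that a member's destructor runs when a locally instantiated owner goes out of scope on return.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT Circle's classes. They only illustrate
// destructor ordering for a locally instantiated object and its members.
enum class ShutdownMode { Halt, Reboot };

class MockWlanDevice {
public:
    explicit MockWlanDevice(std::vector<std::string>& log) : m_log(log) {
        m_log.push_back("wlan: associated");
    }
    // Stands in for the disconnect call made in the device destructor.
    ~MockWlanDevice() {
        m_log.push_back("wlan: disassociated");
    }
private:
    std::vector<std::string>& m_log;
};

class MockKernel {
public:
    explicit MockKernel(std::vector<std::string>& log)
        : m_log(log), m_wlan(log) {}

    // Returns a shutdown mode instead of calling halt() or reboot() itself.
    ShutdownMode Run() {
        m_log.push_back("kernel: run finished");
        return ShutdownMode::Reboot;
    }
private:
    std::vector<std::string>& m_log;
    MockWlanDevice m_wlan;   // member, so it is destroyed with the kernel
};

// Plays the role of main(): the kernel is a local object, so returning
// from this function destroys it and, with it, the WLAN device member.
int MockMain(std::vector<std::string>& log) {
    MockKernel kernel(log);
    ShutdownMode mode = kernel.Run();
    return mode == ShutdownMode::Reboot ? 1 : 0;
}
```

If Run() (or code it calls) halted or rebooted directly instead of returning, control would never reach the end of MockMain and the "disassociated" step would be skipped, which matches the behaviour seen on the AP side.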

I know that in most versions of main.cpp there is a comment saying that one cannot return from main(), because some destructors are not implemented. In the meantime a number of destructors have been implemented, and there is a good chance that it will work now. It depends on the classes that are members of CKernel in your application. Let me know if it does not work (e.g. because you get failed assertions). We will find another solution then. I tested it with the hello_wlan sample and it works there.

Rene

davefilip commented 1 week ago

Thanks, got it!

I actually moved all of my business logic out of the kernel and call reboot() and halt() from a user command shell I’ve written … to keep the kernel relatively small and basic, a slight deviation from the Circle model / samples … but I now understand what needs to be done.

Thanks again for all your help, and I think I understand everything that I need to right now.

Cheers for writing and making available such a great stable, functional and well documented environment for writing a “bare metal” OS! I tried a few earlier attempts that failed mostly because I spent way too much time dealing with differences in RPi hardware between models, and you’ve abstracted away all of those differences. What you’ve really provided is what most other OSs would call the HAL, or Hardware Abstraction Layer. Along with some cool functions for managing memory and tasks. Thank you!

rsta2 commented 1 week ago

Dave, you are welcome. And thanks for appreciating Circle!

rsta2 commented 1 day ago

Only serious USB "Transaction failed" errors are logged on the develop branch now. So these messages should not occur any more in your setup.

davefilip commented 9 hours ago

Rene,

Thanks for making that change! Since USB timing can be a bit flaky, it makes sense that you only want serious errors (a.k.a., errors that could result in the loss of data) logged.

I’ve changed the subject because I might have something somewhere between an enhancement request and a Circle programming question … ;-)

As far as I can tell, and I have been looking through the code … albeit with find/grep to hopefully get to the right place, so I may be missing something … Circle (as of 47) does not appear to have an ICMP / PING “client”, e.g., to determine (with a quick time-out) whether or not another node is accessible on the network.

Of course, you do provide an ICMP / PING “server” that responds to ping requests from another computer. But I can’t find anything to SEND an ICMP / PING request?

Can you confirm if that is the case, or am I just not looking in the right place? (Given how much functionality is included in Circle, it seems like an obvious omission, which is not a criticism in any way, just unexpected given how much is already included.)

If not, before I spend a lot of time trying to build it, can you confirm whether … as far as you know … CNetworkLayer::Send() should be able to send an ICMP / PING packet? If so, I admit that I need to spend some more time researching the ICMP packet format, but just wondering if you think it should be possible.

Of course I’ll need to get responses coming back, but don’t want to go too far down the rabbit hole unless I know that the Circle networking layer should be able to handle sending non-socket packets like ICMP.

And apologies if this is included somewhere, and I just haven’t found it (either in the doc or the source code).

Cheers,

Dave.

rsta2 commented 5 hours ago

Dave, you are right, there is no ping client in Circle yet, and yes, you can use CNetworkLayer::Send() to send ICMP packets. But as you wrote, you need a method to pass through the replies. ICMP is specified in RFC 792 https://www.rfc-editor.org/info/rfc792.
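For orientation, an ICMP echo request as laid out in RFC 792 could be assembled roughly as follows. This is a minimal sketch, not Circle code: the function names InternetChecksum and BuildEchoRequest are made up for illustration, and how the resulting buffer is handed to CNetworkLayer::Send() is deliberately not shown. The only fixed facts used are the RFC 792 layout (type 8, code 0, 16-bit one's-complement checksum, identifier, sequence number, data, all in network byte order).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One's-complement checksum over a byte buffer, treating the bytes as
// big-endian 16-bit words; an odd trailing byte is padded with zero.
std::uint16_t InternetChecksum(const std::uint8_t* data, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;
    }
    while (sum >> 16) {                       // fold carries back in
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}

// Builds the ICMP message only (no IP header). Hypothetical helper.
std::vector<std::uint8_t> BuildEchoRequest(std::uint16_t id,
                                           std::uint16_t seq,
                                           const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> pkt(8 + payload.size());
    pkt[0] = 8;                               // type 8 = echo request
    pkt[1] = 0;                               // code 0
    pkt[2] = 0;                               // checksum is zero while summing
    pkt[3] = 0;
    pkt[4] = static_cast<std::uint8_t>(id >> 8);
    pkt[5] = static_cast<std::uint8_t>(id & 0xFF);
    pkt[6] = static_cast<std::uint8_t>(seq >> 8);
    pkt[7] = static_cast<std::uint8_t>(seq & 0xFF);
    std::copy(payload.begin(), payload.end(), pkt.begin() + 8);

    const std::uint16_t csum = InternetChecksum(pkt.data(), pkt.size());
    pkt[2] = static_cast<std::uint8_t>(csum >> 8);
    pkt[3] = static_cast<std::uint8_t>(csum & 0xFF);
    return pkt;
}
```

The buffer returned by BuildEchoRequest would travel as the payload of an IP packet with protocol number 1 (ICMP). Running the checksum again over a finished message yields 0, which is a convenient self-check.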

Rene

davefilip commented 2 hours ago

Rene,

Thanks for confirming both points.

Although I should have no problem sending ICMP / PING packets, I suspect — although have not yet gone deep into the code — that I will need to patch some code within the Circle network layer to handle the responses, which is what I have been trying to avoid since the start (due to problems upgrading later on). Which is why I would suggest / encourage — assuming that this would be of value to others, which I think it would? — adding a ping client to core Circle at some point in the future.
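Whatever hook ends up delivering received ICMP messages, the check on the reply itself is small. The following is a sketch under the assumption that the raw ICMP message (without the IP header) is available as a byte buffer; ChecksumIsValid and IsMatchingEchoReply are hypothetical names, not part of Circle.

```cpp
#include <cstddef>
#include <cstdint>

// A received ICMP message is intact when the one's-complement sum over
// the whole message, checksum field included, folds to 0xFFFF.
bool ChecksumIsValid(const std::uint8_t* msg, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // Even offsets are the high byte of a 16-bit word, odd the low byte.
        sum += (i & 1) ? msg[i] : (static_cast<std::uint32_t>(msg[i]) << 8);
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return sum == 0xFFFF;
}

// True if msg is an echo reply (type 0, code 0) answering the request
// that was sent with the given identifier and sequence number.
bool IsMatchingEchoReply(const std::uint8_t* msg, std::size_t len,
                         std::uint16_t expectedId, std::uint16_t expectedSeq) {
    if (len < 8) {
        return false;                         // shorter than the ICMP echo header
    }
    if (msg[0] != 0 || msg[1] != 0) {
        return false;                         // not an echo reply
    }
    if (!ChecksumIsValid(msg, len)) {
        return false;                         // corrupted in transit
    }
    const std::uint16_t id  = static_cast<std::uint16_t>((msg[4] << 8) | msg[5]);
    const std::uint16_t seq = static_cast<std::uint16_t>((msg[6] << 8) | msg[7]);
    return id == expectedId && seq == expectedSeq;
}
```

Matching on identifier and sequence number is what lets a sender with several outstanding pings tell the replies apart.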

My use case for wanting ping is to have a “cluster” of RPis running my custom OS “discover” each other (by broadcasting MQTT messages to a shared topic), so that I can then run commands and/or draw graphics on any node in the cluster from any other node. The challenge is that when a node goes off-line, socket based things (like HTTP / HTTPS, even just a simple ’telnet’-like socket) tend to hang. Because ICMP is so lightweight, it can tell me quickly if another node is still active.

I’ve been using MQTT long before Circle, and yes, I could write a simple MQTT ‘ping’ (call/response) as well, but even MQTT will hang uncomfortably long when nodes come online and go offline.
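The bookkeeping side of this use case, deciding that a node is gone once no reply has arrived within a time-out, can be sketched independently of how the pings are sent. The class NodeLiveness, the node names and the millisecond tick are all assumptions made for illustration; nothing here is Circle API.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: remembers when each node last answered and
// treats a node as offline once that answer is older than the time-out.
class NodeLiveness {
public:
    explicit NodeLiveness(std::uint32_t timeoutMs) : m_timeoutMs(timeoutMs) {}

    // Call whenever a reply from the node arrives.
    void OnReply(const std::string& node, std::uint32_t nowMs) {
        m_lastSeen[node] = nowMs;
    }

    // A node that was never seen counts as not alive.
    bool IsAlive(const std::string& node, std::uint32_t nowMs) const {
        const auto it = m_lastSeen.find(node);
        if (it == m_lastSeen.end()) {
            return false;
        }
        // Unsigned subtraction stays correct across tick counter wrap-around.
        return nowMs - it->second <= m_timeoutMs;
    }

    // Names of all known nodes whose last reply is older than the time-out.
    std::vector<std::string> Offline(std::uint32_t nowMs) const {
        std::vector<std::string> result;
        for (const auto& entry : m_lastSeen) {
            if (nowMs - entry.second > m_timeoutMs) {
                result.push_back(entry.first);
            }
        }
        return result;
    }

private:
    std::uint32_t m_timeoutMs;
    std::map<std::string, std::uint32_t> m_lastSeen;
};
```

With a short time-out and a periodic ping, a node that drops off the network shows up in Offline() within roughly one time-out interval, without any socket having to time out first.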

I honestly am very thankful for all you’ve included in Circle thus far, and I have no idea how much time you can spend on it going forward, but I hope you don’t mind me sending suggestions from time to time, along with potential use cases, to give you ideas on what you might think about adding in the future.

Cheers,

Dave.
