ralph-irving / squeezeos-squeezeplay

Squeezeplay in Logitech Controller, Radio and Touch Squeezebox players community firmware

Wi-fi scan on Radio fails under the updated wpa_supplicant #1

Closed mw9 closed 3 years ago

mw9 commented 3 years ago

Unfortunately I hadn't noticed this in the last few months of testing.

There is a change in the way the new wpa_supplicant (v2.9) and its wext driver (as used in the Radio) report signal strength when scanning for networks. It is now reported in dBm, with a range from -192 to +63. In the old version (v0.6.9 on the Radio) we just had a number, 0 to 255.

The wi-fi browsing code, in net/Networking.lua, is unable to handle the negative numbers, and simply will not see any networks.

The change can be easily seen by comparing the result of wpa_cli scan_results on the old and new versions:

Old:

# wpa_cli scan_results
Selected interface 'eth1'
bssid / frequency / signal level / flags / ssid
xx:xx:xx:xx:xx:xx   2412    198 [WPA2-PSK-CCMP][WPS]    XXXXX-XXXXX
xx:xx:xx:xx:xx:xx   2412    197 [WPA2-PSK-CCMP][WPS]    XXXXX-XXXXX
xx:xx:xx:xx:xx:xx   2437    194 [WPA2-PSK-CCMP] XXXXX-XXXXX
<snipped>

New:

# wpa_cli scan_results
Selected interface 'eth1'
bssid / frequency / signal level / flags / ssid
xx:xx:xx:xx:xx:xx   2437    -45 [WPA2-PSK-CCMP][ESS]    XXXXX-XXXXX
xx:xx:xx:xx:xx:xx   2412    -62 [WPA2-PSK-CCMP][ESS]    XXXXX-XXXXX
xx:xx:xx:xx:xx:xx   2412    -62 [WPA2-PSK-CCMP][WPS][ESS]   XXXXX-XXXXX
<snipped>
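
To make the difference between the two listings concrete, here is a minimal sketch of parsing one scan_results line and putting both signal formats on a common dBm scale. It is written in Python purely for illustration; it is not the change proposed for net/Networking.lua, and the names parse_scan_line and to_dbm are hypothetical. The mapping of the old 0 to 255 values (subtract 256 from anything of 64 or more) is an assumption, chosen only because it reproduces the -192 to +63 range quoted above.

```python
def parse_scan_line(line):
    """Parse one data line of `wpa_cli scan_results`; return None for other lines."""
    # Columns: bssid, frequency, signal level, flags, ssid (ssid may contain spaces).
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None
    bssid, freq, level, flags = parts[:4]
    ssid = parts[4] if len(parts) > 4 else ""
    try:
        # int() accepts a leading minus sign, so "-45" and "198" both parse.
        return {
            "bssid": bssid,
            "frequency": int(freq),
            "level": int(level),
            "flags": flags,
            "ssid": ssid,
        }
    except ValueError:
        # Header or banner line: the numeric columns are not numbers.
        return None


def to_dbm(level):
    """Map a signal level onto dBm (assumed mapping, see lead-in)."""
    if level < 0:
        return level          # already reported in dBm (v2.9 style)
    if level >= 64:
        return level - 256    # assumed: old unsigned byte holding a negative dBm
    return level


old = parse_scan_line("xx:xx:xx:xx:xx:xx   2412    198 [WPA2-PSK-CCMP][WPS]    XXXXX-XXXXX")
new = parse_scan_line("xx:xx:xx:xx:xx:xx   2437    -45 [WPA2-PSK-CCMP][ESS]    XXXXX-XXXXX")
print(to_dbm(old["level"]), to_dbm(new["level"]))
```

The point of the sketch is only that a parser which accepts an optional minus sign sees networks under both versions; how the value is then turned into a signal-strength display is a separate choice.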

The Touch and Controller are not affected at this time, because they continue to use an older version of wpa_supplicant (v0.5.7) in combination with a closed marvell driver.

I can see three possible approaches to a fix. There may be others.

  1. Suitably modify net/Networking.lua, to account for the change
  2. Back out the relevant change in wpa_supplicant
  3. Revert to wpa_supplicant v0.6.9 on the Radio

Re (1) - I have drafted up a change to net/Networking.lua that I think might fit the bill. I shall raise a PR which will give you the opportunity to examine it. It is quite simple, it certainly fixes the issue on my Radios, and should be a 'no-op' on the Controller and Touch. I can verify that on my Controller in due course, I don't have a Touch. A second set of eyes is always helpful.

Re (2) - Backing out the change to wpa_supplicant should be simple, but I haven't actually tested/explored it.

Re (3) - Reverting to wpa_supplicant v0.6.9 should also be simple :smile:. But I recall your mentioning that the later version solves/eases some wi-fi issues that you had experienced, so perhaps this is not a preferred approach.

As a final remark, I notice that the newer wpa_supplicant seems to cache the results of a wi-fi scan for a much shorter period of time. I haven't noticed that it adversely impacts scanning on the Radio, but it could conceivably call for an insertion of a small delay in the process, to allow time for a list of networks to be assembled. No action at this time, I think, and perhaps none will be required.

ralph-irving commented 3 years ago

Thank you for taking the time to investigate. As long as the original marvell wifi source is unavailable, I don't see the marvell wpa_supplicant version being upgraded since those binaries have special code to handle the marvell chipsets.

Re (3), Yes, I have two APs that the old wpa_supplicant v0.6.9 does not associate. The radio connects to both with v2.9. So it's the least favourable.

Re (2), I agree. This will likely be the fallback approach if the changes to Networking.lua impact the controller and touch.

Re (1), I like the simple code change. Will have a play and test it on my touch and radio in the coming days.

mw9 commented 3 years ago

> Re (1) - I have drafted up a change to net/Networking.lua that I think might fit the bill. I shall raise a PR which will give you the opportunity to examine it. It is quite simple, it certainly fixes the issue on my Radios, and should be a 'no-op' on the Controller and Touch. I can verify that on my Controller in due course, I don't have a Touch.

I have now verified on my Controller, it is indeed a 'no-op'.

> As a final remark, I notice that the newer wpa_supplicant seems to cache the results of a wi-fi scan for a much shorter period of time. I haven't noticed that it adversely impacts scanning on the Radio, but it could conceivably call for an insertion of a small delay in the process, to allow time for a list of networks to be assembled. No action at this time, I think, and perhaps none will be required.

Well, scanning on the Controller does suffer from this problem, but not because of any changes made in the firmware, I think. I scan so rarely on the Controller that I had not noticed it before.

I'm finding that I often need to give it "two go's" to get a list of networks. I shall check under the "official" firmware, though, just to be sure that it is not a new problem.

One other thing I have noticed on the Radio. The new wpa_supplicant formats non-ASCII characters in an SSID string as hex escapes, by and large. So I'm seeing the occasional name like \x00\x00\x00\x00\x00\x00\x00\x00\x00 showing up for some reason from my AP. This would have shown up as blank before. I suspect it is intended to be a 'hidden' AP, although why my AP is doing it defeats me !

I don't know that this will have any practical significance, other than looking odd. I shall take a few moments and check how SqueezePlay may handle this. But probably after the forthcoming (somewhat subdued) celebrations.

ralph-irving commented 3 years ago

I've confirmed that the Networking.lua change has no impact on the touch as well.

I'll wait to hear your findings on the scan results on the controller from the official firmware and hex SSIDs before moving ahead.

mw9 commented 3 years ago

Findings on scan results

The basic problem is this: the SetupNetworkingApplet.lua/Networking.lua combination does not wait long enough before retrieving scan results from wpa_supplicant. In consequence, scan results can be very limited.

This was mitigated under SQP 7.7 by an automatic rescan that took place every five seconds while the initial scan listing was in view.

SQP 7.8 introduced some changes to the scanning menu, all quite useful, I suspect. But it removed that automatic rescan.

I have never noticed the effects of this on the Controller until now, because I have rarely had cause to set up networking. But the effects under both vanilla SQP 7.8 and the "community" 8.0 are the same: I am finding that "two go's", or more, are needed.

On the Radio, I find that scan results under the new wpa_supplicant (v2.9) take longer than they do under v0.6.9. I have tested this by firing up wpa_cli in an interactive session, and issuing the scan command. Under either version a wpa event message will be issued when scan results are available. I find that, typically, this takes about 5/6 seconds under v0.6.9, but 10/11 seconds under v2.9. Again, this is requiring "two go's" or more to deal with.

The event message is not issued by v0.5.7 as used in the Touch and Controller.

I would be interested in knowing if you find the same as I do.

Based on my findings:

In the short term, I would think that reinstituting the automatic rescan is essential. It does prevent frustration. :)

For the medium/longer term, I think it is worth investigating how one might build in a sufficient delay before collecting scan results. Ideally using an identical approach on both platforms. It might, for example, be an easy matter to patch v0.5.7 to issue an event message as well, and then use that as a trigger.
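
As a rough illustration of using the event message as a trigger, the sketch below waits for a scan-results event with an upper time limit before the results are fetched. It is Python for illustration only, not SqueezePlay code. The event name CTRL-EVENT-SCAN-RESULTS is the one wpa_supplicant sends on its control interface; the function name wait_for_scan_results, the next_event callback, and the 12 second default (picked to sit just above the 10/11 seconds seen under v2.9) are all assumptions.

```python
import time

SCAN_RESULTS_EVENT = "CTRL-EVENT-SCAN-RESULTS"


def wait_for_scan_results(next_event, timeout_s=12.0, clock=time.monotonic):
    """Return True once a scan-results event is seen, False if timeout_s elapses.

    next_event(remaining_s) is a hypothetical callback that blocks for up to
    remaining_s seconds and returns the next event string, or None if nothing
    arrived in that time.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False      # give up; the caller may fall back to a rescan
        message = next_event(remaining)
        if message is not None and SCAN_RESULTS_EVENT in message:
            return True       # results are ready to be collected


# Usage with a canned event stream standing in for the control interface.
events = iter(["<3>CTRL-EVENT-BSS-ADDED 0 xx:xx:xx:xx:xx:xx",
               "<3>CTRL-EVENT-SCAN-RESULTS "])
print(wait_for_scan_results(lambda remaining: next(events), timeout_s=5.0))
```

A timeout is kept in the sketch because v0.5.7 does not issue the event at all, so a pure wait-for-event approach would never return there.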

I'll offer up an additional commit to the PR that restores the automatic rescan for you to look at, in due course. It's a very simple change.

That said, the Radio, and probably the Touch, could do with a bit of delay in handling a "Finding networks" screen title change, because it flashes by so quickly. The Controller is fine - possibly the frame rate difference makes it less "flashy". I'll see if that can be easily mitigated before offering up my proposal.

Hex SSIDs

I have an idea or two about that, and will follow up when formulated.

ralph-irving commented 3 years ago

I get similar results on the Radio: about 10/11 sec. with wpa_supplicant v2.9 and 4/5 sec. with v0.6.9. However, the list of wifi networks scanned always has 2-3 more entries with v2.9, and not always the same ones. Perhaps that is because v2.9 performs the longer scan?

Unfortunately, we don't have the source for v0.5.7 as it's part of the private sources. The touch and controller use the same wpa_cli and wpa_supplicant binaries in the community firmware. Although wpa_cli and wpa_supplicant have the same version strings in the jive_7.8.0_r16739 and fab4_7.8.0_r16754 logitech firmware binaries, they have different checksums, which I'd expect as they were likely built from the private sources. From my investigations, Marvell provide their own wpa_supplicant driver file, which I can likely find if we want to try adding the event message. The touch runs wpa_supplicant with the -Dmarvell option whereas the radio uses the -Dwext driver. We have the source for the marvell wifi chip on the touch; perhaps by leveraging that I can rebuild v0.5.7 for both. I'll look into that and whether building v2.9 with the marvell driver might be a better option.

mw9 commented 3 years ago

I've added a second suggested commit for the wi-fi autorefresh, as indicated above. It appears to work here on a Radio and Controller as intended, but I don't have a Touch.

I guess what's missing is the marvell "glue driver" source for hostap. I'd forgotten that when I previously posted. When you say you have "the source for the marvell wifi chip on the touch", what source is that ? The kernel driver ?

Given that all this is only required for a relatively rare wi-fi scan, perhaps it's not a priority. It might become relevant when "Hex SSIDs" begin to become more prevalent.

I don't know how "limited" your scan results have been. Without this patch, or "many go's", I frequently get no results at all on either the Radio (with 2.9) or the Controller. (Except, of course, for networks that have already been configured and are present in the wpa_supplicant.conf file). The autorefresh does "many go's" for me, and it has been necessary.

I do get varying results on the two Radios I have. I hadn't particularly noticed consistently more or less results with 2.9 vs 0.6.9, but I haven't really logged it. I might give it a go at some point, and see if there is a pattern.

As a matter of interest, do you know what it is about your access point that requires 2.9 on the Radio ?

ralph-irving commented 3 years ago

Yes, the kernel wifi driver on the touch is GPL so the sources were already in the squeezeos repository.

The wifi autorefresh change works okay on the touch. The text at the top changes between "Choose Network" and "Finding Networks..." about every 3 seconds.

No, I never isolated the wpa_supplicant change(s) that fixed the issue with the AP. There are pages and pages of changes between v0.6.9 and v2.9. See https://w1.fi/cgit/hostap/plain/wpa_supplicant/ChangeLog The problem was that the dhcp client on the Radio would never obtain a lease/ip address with v0.6.9. I don't run a dhcp server on the AP; I have a separate DLink NetDefender firewall that handles dhcp services.

I'd like to commit your pull request for both changes and build a new firmware to try ourselves before a general release. I have a couple squeezeos fixes to include as well.

mw9 commented 3 years ago

Hex SSIDs

I've added a further change to suppress the display of 'hidden' APs. Around me I see a number of names like \x00\x00\x00\x00\x00\x00\x00\x00\x00, and I have found references indicating that these are, indeed, observed by others and are intended as hidden APs.

The change simply identifies an SSID with a leading '\x00', and suppresses it from the wi-fi listings. This gives SqueezePlay the same behaviour that it would adopt under earlier versions of wpa_supplicant.
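
The test described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Lua that was added to SqueezePlay: the names is_hidden_ssid and visible_ssids are hypothetical, and the SSID is assumed to arrive as text in which a NUL byte has already been rendered as the four literal characters backslash, x, 0, 0.

```python
HIDDEN_PREFIX = "\\x00"  # literal text: backslash, 'x', '0', '0'


def is_hidden_ssid(ssid):
    """True if the SSID text starts with an escaped NUL, taken to mean a hidden AP."""
    return ssid.startswith(HIDDEN_PREFIX)


def visible_ssids(ssids):
    """Drop hidden entries from a list of SSID strings, keeping order."""
    return [s for s in ssids if not is_hidden_ssid(s)]


scanned = ["XXXXX-XXXXX", "\\x00\\x00\\x00\\x00", "Cafe \\xc3\\xa9"]
print(visible_ssids(scanned))
```

Only a leading escaped NUL is matched, so an SSID that merely contains other hex escapes, such as the last entry in the example, is still listed.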

I have made no other attempt to sanitize "Hex SSIDs". We may never see any in practice, and they are perhaps best dealt with as and when they arise.

One could do more, but I am not inclined to do more at this time. Any thoughts welcome.

Scan results

I trialled adding in a five second delay between requesting a scan and obtaining the scan results. Although that improves the result of the initial scan on the Radio, it made no difference to the Controller.

The Controller seems to have a sweet spot. Requesting scan results within a period of, say, 3 to 5 seconds of requesting the scan gives a reasonable initial scan result. But request them earlier or later than that, and one only gets the currently connected SSID. So I don't know what is going on here, really.

I rolled the Controller firmware back to 7.7 (it's still on MySqueezebox.com), with identical results. But 7.7 carries out the autorefresh, and the problem is resolved that way.

I find this note in the wpa_supplicant changelog:

v1.0 wpa_supplicant * wext: Increase scan timeout from 5 to 10 seconds.

That confirms our finding on the Radio, and may underlie the poor initial scan result that I see on the Radio. I'll investigate further in due course, it would be good to get a better result. But autorefresh does resolve it for me.

New firmware

At this stage I don't think I have anything further to add, and would be happy to assist in testing the new firmware.

ralph-irving commented 3 years ago

I've uploaded the new firmware files to https://sourceforge.net/projects/lmsclients/files/squeezeos/

If you have a local webserver, an easy way to serve the updated firmware is to point COMMUNITY_FIRMWARE_REPOSITORY, defined in CommunityFirmware/Plugin.pm, at it. Create a folder under the root of your webserver files matching your local LMS version (e.g. 8.1.1), unzip the files into it, and restart LMS.

My touch and radio have been updated to r16824 and so far testing has been positive.

Thank you for your continued help tracking down and proposing fixes to issues. I would like to add you to the list of people in the About applet that have contributed to the community firmware, would you be okay with that?

mw9 commented 3 years ago

OK, I've updated to r16824 on a Radio and a Controller, today. (I used the 'custom.baby/jive.bin' method).

So far, positive. I've verified that the revised wi-fi setup behaves as expected, and there is a nice 'blue' icon if I turn off LMS. I have yet to check that the new default SSH configuration behaves more politely.

I'm still 'peeved' about the initial wi-fi scan results on the Radio, and will pursue that over time, as well as "special" characters in a SSID.

Regarding contributors in the About applet, that's a very gracious offer, which I would happily accept.

May I make a suggestion ? It is this: to interpose something along the lines of "Community Firmware release:" between the main list of contributors and yourself. A sort of sub-heading, that will distinguish the Community Firmware effort from the main story.

I'll revert if anything comes to light, but I don't suppose it will.

As a matter of interest - the 'procps Unknown Hz value' patch. I recall seeing that issue some years ago on my Sheeva plug (also an ARM 5), but didn't notice anything on the Radio, or Controller. Has it "bitten" you ? Or is it just waiting to bite ?

ralph-irving commented 3 years ago

That's a great idea to identify the Community Firmware contributors separately. I'll do that.

My ssh connection to the radio has been connected for 2 days so far. But I'm using an old ssh 6.0p1 client; hopefully you have similar success.

I've seen the 'Unknown Hz value' message on the radio previously and during our last round of testing these changes the message started appearing again. Which was great as I was able to replace libproc-3.2.7.so with the patched build and confirm immediately that the message no longer displayed, put back the original lib and the message returned. So it should be fixed.

mw9 commented 3 years ago

> My ssh connection to the radio has been connected for 2 days so far. But I'm using an old ssh 6.0p1 client; hopefully you have similar success.

Well, my SSH connection has been up for 36 hours non-stop. Doing nothing, but it hasn't been thrown off. Logged in under a screen session.

So, I'd say the change looks to have been effective.

ralph-irving commented 3 years ago

Thanks for the confirmation. If no issues arise in the next few days, I'll release the r16824 firmware.

ralph-irving commented 3 years ago

I've released 8.0.1r16824 for LMS 8.1.1 and 8.2.0.

mw9 commented 3 years ago

Excellent !

SSH connections have lasted for circa three days (by accident !).

I shall close this issue now, but will revert in due course with any improvements to wi-fi scanning. I need a UTF-8 capable access point, or some-such, to play with. A Raspberry Pi running hostap will probably suffice.