volumio / Build

Buildscripts for Volumio System
GNU General Public License v2.0

Improve resilience of wireless service #143

Closed earlchew closed 7 years ago

earlchew commented 7 years ago

As a user, I want to have access to the volumio wifi hotspot when I do not have use of any wifi networks, so that I will always have some means to connect to the volumio service.

Related Issues:

Use Cases

earlchew commented 7 years ago

An initial version implementing the above can be reviewed here: https://github.com/volumio/Build/compare/master...earlchew:issue-143

macmpi commented 7 years ago

Thanks @earlchew, this is really helpful. I also did some research a few months ago on this troublesome matter, and came across a well-regarded (and supported) solution for the hotspot function that is mentioned in the Arch Linux wiki: create_ap. This script (currently in bash, with a rewrite in ruby in progress) performs many of the necessary HW capability checks and fail-safe measures. It also operates without modifying system settings (it does its setup in temporary directories), and manages special cases like Edimax.

I think it can probably bring a very mature, tested, and supported way to implement hotspot function within Volumio, without too much custom rework, and bug crawling through many tricky issues. Interested in your views as you are looking deeply into that.

volumio commented 7 years ago

Interesting and well done, really. I would like to give it a go. One thing I see is that we need to have a setting for dnsmasq to not forward to the UI; sometimes it is needed, but sometimes not. Do you think we can handle that?

earlchew commented 7 years ago

A couple of other notes and observations:

earlchew commented 7 years ago

@volumio wrote:

we need to have a setting for dnsmasq to not forward to the UI; sometimes it is needed, but sometimes not.

I'm not sure I understand the issue you are referring to. I think you are describing the use of dnsmasq in the context of the hotspot. Would you provide a more detailed description?

earlchew commented 7 years ago

@macmpi wrote:

I think it can probably bring a very mature, tested, and supported way to implement hotspot function within Volumio

I suppose it could, but I think it's also worth asking what is expected from the wifi hotspot. I see that create_ap brings a lot of functionality, but to do so it makes the implementation more complicated. The current hotspot implementation depends on hostapd and dnsmasq; create_ap requires these plus additional dependencies, includes a reasonably sized daemon, and perhaps will also add ruby.

My understanding is that the intent of the hotspot is to function as a backup means to connect to the volumio application in the absence of any other viable network (Ethernet, or wifi network). If that remains the case, maybe it is worth keeping the configuration of the hotspot as straightforward as possible. Providing a lot of functionality and complexity here might make it less reliable or harder to use as a means of connection of last resort.

Should there be a desire to use create_ap (or something similar) to provide additional functionality, I notice that there is systemd support. This can work quite well with the reworked implementation of wireless.js because wireless.js focuses on starting and stopping wireless-hotspot.service, and the bulk of the integration with create_ap can probably be focused there:

# wireless-hotspot.service
[Unit]
PartOf=wireless.service
Requires=create_ap.service
Before=create_ap.service
...

macmpi commented 7 years ago

@earlchew wrote:

This can work quite well [...]

Yes, I was thinking along these lines indeed: the activation logic is most likely Volumio-specific, but the actual operation may be performed by such a specialized "unit" under systemd. The benefit I saw was all the capability checks and the experience with some driver limitations (e.g. Raspi3), which may help debug nasty issues: we often need to understand why it does not work, given the number of possible dongles and drivers out there, and create_ap can provide good hints at what's going wrong. I did not really experience dependency or footprint issues with it, and it seemed flexible enough to run with a predetermined, limited set of options (--no-virt in particular; probably no need for full NAT, to avoid routing issues; bridged + Zeroconf is probably enough, etc.): just choose a reasonable mix of options and issue a single command within a systemd service.

Anyway, your call obviously: glad this overall feature can be revisited and improved.

PS: To avoid explicit dependencies on, and issues linked to, the existence of eth0 and the like, it's important to keep in mind that some devices (like piZero) have no built-in network interfaces by default (example here).

earlchew commented 7 years ago

@macmpi I appreciate the feedback. You wrote:

It's important to keep in mind some devices (like piZero) have no default built-in network interfaces to avoid explicit dependencies & issues linked to eth0 existence, etc

For the present, I don't believe I have made that situation any worse, but would appreciate any comments you might have regarding the reworked implementation in this regard.

The netplug modifications include supporting the probe call for eth0, and presumably if that fails, netplug will abandon the interface.

The wireless modifications attempt to manage wlan0. The wireless.js implementation should loop endlessly on its normal operational cycle on a hopeless quest if indeed wlan0 is not present. I did not verify this yet, but I'll take the time to do so in my next round of testing.

If as you say create_ap does provide substantial benefit when it comes to support of disparate hardware, that might be reason enough to adopt it. For the present, I would like to avoid making this change set any larger, and see these changes reviewed, tested, and adopted to improve the wireless scenarios described above.

If it makes sense, the situation regarding create_ap can be revisited after that, I think. Tracking this in https://github.com/volumio/Build/issues/147

macmpi commented 7 years ago

Appreciate your systematic approach. Looking forward to testing it.

earlchew commented 7 years ago

@macmpi I saw your reference to https://github.com/volumio/Volumio2/issues/791. In that issue the hotspot was started even though it was set to OFF. I believe these changes would address that first part:

| Setting | Hotspot  | Client   |
|---------|----------|----------|
| false   | disabled | enabled  |
| 'off'   | disabled | enabled  |
| 'on'    | enabled  | disabled |
| 'auto'  | enabled  | enabled  |
| true    | enabled  | enabled  |
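
As an illustration only, the mapping in the table above could be expressed as a small lookup. This is not the code from the issue-143 branch: the name `resolveWirelessMode` and the shape of the returned object are hypothetical, and both the case-insensitive matching and the rejection of unknown values are assumptions made for the sketch.

```javascript
// Hypothetical helper: maps the configured hotspot setting to which
// services may run, following the settings table. Legacy boolean values
// (true/false) are accepted alongside the newer 'on'/'off'/'auto' strings.
const MODE_TABLE = new Map([
  [false,  { hotspot: false, client: true  }],
  ["off",  { hotspot: false, client: true  }],
  ["on",   { hotspot: true,  client: false }],
  ["auto", { hotspot: true,  client: true  }],
  [true,   { hotspot: true,  client: true  }],
]);

function resolveWirelessMode(setting) {
  // Assumption: string settings are compared case-insensitively.
  const key = typeof setting === "string" ? setting.toLowerCase() : setting;
  const mode = MODE_TABLE.get(key);
  if (mode === undefined) {
    // Assumption: unknown values are rejected rather than silently defaulted.
    throw new TypeError(`Unsupported hotspot setting: ${String(setting)}`);
  }
  return { ...mode }; // return a copy so callers cannot mutate the table
}
```

Keeping the legacy booleans as extra rows in the same table means they can later be dropped by deleting two entries, without touching the lookup itself.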

The other question pertains to DNS. The only time a DNS server is run is when the hotspot is started. The only reason I can think of to run the DNS server is to make it easier for clients to connect to the Volumio device by name, rather than by IP address.

Apart from that, I can't think of any other reason to run the DNS server. Is there any expectation that the DNS server actually provide a fully-featured hotspot service to allow clients to connect to the internet via the volumio hotspot?

macmpi commented 7 years ago

@earlchew Thanks for your note. Actually my reference was more general, through a discussion on another commit, in which I also linked that older issue as an example. The investigation of that older issue ended up as a mixed bag: an unwanted hotspot startup, as you highlight (and will probably address), and a DNS name resolution issue that popped up under specific ISP conditions (an ISP blocking the Google DNS that used to be forced as default in Volumio).

Anyway, back to your question: I concur that one would not expect to run a DNS server on Volumio in general (or at all). To me (but others may think differently?) the hotspot is merely an access point serving primarily as a basic bridge: it should not interfere with any DHCP or DNS server, or router, that may exist on the local network. Only when no DHCP and DNS service is available (i.e. a router-less point-to-point network) might it provide Zeroconf-type addressing and discovery to ease initial setup from the most common platforms.

Typical case illustrating this:

PS: On your proposed settings table, how about merging false and "OFF" on one hand, and then true and "Auto" on the other hand. We really only need 3 settings not 5 I guess.

macmpi commented 7 years ago

FYI, there is a seemingly RPi3-specific wifi issue about how Hotspot management (combined with a likely driver issue) may adversely affect how the RPi3 connects to an AP in client mode. Possibly yet another special case to handle.

earlchew commented 7 years ago

@macmpi wrote:

On your proposed settings table, how about merging false and "OFF" on one hand, and then true and "Auto" on the other hand. We really only need 3 settings not 5 I guess.

The only reason I have five possibilities, rather than three, is for the code to support existing configurations (True vs False) as well as new configurations (on, off, auto). If after merging there is no need to support legacy configurations, then as you point out we can go ahead and drop that part of the code.

volumio commented 7 years ago

I would say on/off/auto is the ideal configuration

earlchew commented 7 years ago

I wrote:

The wireless modifications attempt to manage wlan0. The wireless.js implementation should loop endlessly on its normal operational cycle on a hopeless quest if indeed wlan0 is not present. I did not verify this yet, but I'll take the time to do so in my next round of testing.

Today I had the chance to remove my Wifi dongle, and boot with Ethernet available. I confirmed that the revised wireless service loops at intervals trying to enable wlan0 for the hotspot, and then for the client. Neither of course succeeds because wlan0 is not available. I used top(1) to confirm that the cpu is idle (i.e. none of the wireless services are causing the cpu to loop hard).
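
A minimal sketch of the kind of interval-based retry described above, which waits between attempts instead of spinning. It is not taken from the reworked wireless.js: `retryUntilEnabled`, `tryEnable`, `intervalMs` and `maxAttempts` are hypothetical names, and `tryEnable` stands in for whatever actually brings up wlan0.

```javascript
// Resolves after `ms` milliseconds; the process is idle while waiting.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: call `tryEnable` repeatedly until it reports success.
// With the default maxAttempts (Infinity) the loop never gives up, which
// mirrors a service that keeps retrying while wlan0 is absent.
async function retryUntilEnabled(tryEnable, { intervalMs = 5000, maxAttempts = Infinity } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let ok = false;
    try {
      ok = await tryEnable(attempt);
    } catch (err) {
      ok = false; // treat an error (e.g. interface missing) as a failed attempt
    }
    if (ok) return attempt; // number of attempts that were needed
    // Sleeping between attempts yields the CPU instead of looping hard.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return null; // gave up after maxAttempts
}
```

Because each failed attempt is followed by a timer-based wait, a missing interface costs one wake-up per interval rather than a busy loop, which is consistent with the idle CPU seen in top(1).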

macmpi commented 7 years ago

Hi, nice to hear we are probably getting closer to experimenting with your PR.

There's a use case you may be interested in testing/reviewing, particularly if you have a Pi3 or PiZeroW (please check this Forum thread).

In the current implementation, it seems the handover between Hotspot mode and client mode causes issues with those devices. Such handovers happen often in the current implementation, as the Hotspot is mostly in AUTO mode (hotspot starts at boot, automatic hotspot connection if the home AP fails).

With some chipsets (like the one in the Pi3 and PiZeroW), AP and client mode can run simultaneously, but only on the same wifi channel (HW limitation). Therefore, once Volumio sets Hotspot (AP) mode, typically on channel 4 by default, if the handover to client mode on the Home Wifi is not properly handled (turning AP mode OFF first, or restarting the wifi chipset), the client can only join the Home Wifi on channel 4! This is not obvious to users of course, particularly as many Home Wifi networks use automatic channel assignment and are unlikely to be on the same channel as the Volumio Hotspot by chance.

I'm still trying to better characterize the issue and possible workaround in the mentioned Forum thread, but as I do not own those devices, it's quite difficult as I rely on impacted users availability for tests. Hopefully you may be able to carefully test such cases if you own those devices, and particularly check if your new implementation does not get into such troublesome issue.

earlchew commented 7 years ago

@macmpi It would be fairly straightforward to apply some kind of reset strategy prior to bringing up the wifi client, or even prior to bringing up the hotspot. Unfortunately, right now I do not have access to either of the newer RPi models mentioned, so I'm unable to either reproduce or test this failure scenario.

A fix will have to wait until better information is available.

biva commented 7 years ago

Great, I feel we're having a fix soon? If you need to, I'm able to perform tests on RPI3

macmpi commented 7 years ago

What's really critical to avoid the former issue, or similar ones, is to make sure Client and AP modes are mutually exclusive, and that one is never launched before the previous one is properly shut down. This is particularly true in AUTO mode (where the two modes should only ever follow each other).

Indeed, should Volumio really intend to run Client and AP modes simultaneously, it should be done by setting up one virtual network interface for each, which is a bit more complex to handle and not necessarily properly supported by many wifi chipsets (create_ap to the rescue!). Very few users would benefit from such a rare use case anyway, and many could complain about it not working.

Therefore, I do not think such simultaneous use is at all important for Volumio (we are not making a full-blown AP), and we just need one standard physical interface. BUT then we must also make sure the two modes never run simultaneously by "mistake", or we may end up with complex bugs to figure out, linked to the limitations of particular chipsets and drivers (like, for instance, the "same channel limitation" on the Pi3/PiZeroW chipset).

I guess your new code does keep those two modes mutually exclusive? It seems the original code had some mix, probably around v2.041 (or a bogus driver did not clean up some context properly), but I can't tell exactly what has fixed it since.

biva commented 7 years ago

@macmpi wrote:

Client & AP modes are mutually exclusive, and one is never launched before previous is properly shut-down. [...] Therefore, I do not think such simultaneous use is at-all important for Volumio

Totally agree, stability is way more important than this feature, which wouldn't be very useful.

But I think that LAN should work at all times, and should take priority over Wifi as soon as it is connected. If I can't connect to my Volumio (2.118 / RPI3) over WiFi, I try to connect over LAN; but sometimes it doesn't work (I don't see it on my LAN, so I have no choice but to restart my RPI). Unfortunately, I wasn't able to reproduce it reliably.

earlchew commented 7 years ago

@macmpi My proposed reimplementation of wireless.js will only run either the hotspot or the wireless client, but not both at the same time.
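
One way to picture "either the hotspot or the client, never both" is a small switch that always stops the running mode before starting the next one, and serialises overlapping requests. This is only a sketch under assumptions, not the proposed wireless.js: the class name `WirelessModeSwitch` and the `{ start, stop }` callback shape are hypothetical (the callbacks could, for example, wrap starting and stopping the corresponding systemd units).

```javascript
// Hypothetical sketch: `services` maps a mode name to { start, stop }
// async callbacks. At most one mode is active at any time.
class WirelessModeSwitch {
  constructor(services) {
    this.services = services;
    this.active = null;               // name of the mode currently running
    this.pending = Promise.resolve(); // serialises overlapping requests
  }

  // Switch to `mode` ("hotspot", "client", ...) or to null to stop everything.
  switchTo(mode) {
    const run = async () => {
      if (this.active === mode) return;
      if (mode !== null && !Object.prototype.hasOwnProperty.call(this.services, mode)) {
        throw new Error(`Unknown mode: ${mode}`);
      }
      if (this.active !== null) {
        // Fully stop the running mode before anything else is started.
        await this.services[this.active].stop();
        this.active = null;
      }
      if (mode !== null) {
        await this.services[mode].start();
        this.active = mode;
      }
    };
    // Chain onto the previous request so two switches never interleave.
    const result = this.pending.then(run);
    this.pending = result.catch(() => {});
    return result;
  }
}
```

Chaining every request onto the previous one is what prevents a second `switchTo` from starting a mode while the first is still shutting the other one down.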

macmpi commented 7 years ago

Great, thanks. Hope you'll manage to factor the changes in more incremental ways, if at all possible.

biva commented 7 years ago

Hello, I'm on 2.163 and I didn't see any change regarding wifi stability in the changelog. Do you still plan to integrate your improvements? I'm available to test: good luck!

earlchew commented 7 years ago

Thanks for the reminder and for your interest. With other matters to take care of, and lack of HW, I haven't put any more time into this recently. I'll find time to make some progress shortly.

biva commented 7 years ago

Great, thanks a lot! (this is quite annoying on RPI 3...)

biva commented 7 years ago

Hello @volumio, is the issue solved? I'm still having issues with wifi on RPI3 with 2.201, and the "wifireconnect" plugin doesn't work on my config (see https://github.com/balbuze/volumio-plugins/issues/64). @earlchew: any news? Thank you!

malcolmjlear commented 7 years ago

This was a big issue with me due to my need for hotspot only (car media player). However the boot time on 2.201 has now significantly improved to 55 seconds on a 2B which is quite acceptable. This issue still shows itself on an older B+ which boots slower at 2 and a half minutes whilst it tries connecting every which way but hotspot. Hopefully this will move forward as earlchew's solution is very neat.

biva commented 7 years ago

Hello @volumio, I'm still having stability issues with wifi: version 2.246 on RPI3. Are you planning any improvement? Or to include an improved version of the wifireconnect plugin? Thank you!

biva commented 7 years ago

@volumio For the record, I dug into this, and I think the problem is that RPI3 (maybe other systems?) needs to see the full path for cron jobs (for example /sbin/ip instead of ip). See https://github.com/balbuze/volumio-plugins/issues/64#issuecomment-327517272
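
Cron typically runs jobs with a much shorter PATH than an interactive shell, which would explain why a bare `ip` is not found. A sketch of one workaround, assuming the job is driven from Node: resolve the command to an absolute path from a fixed directory list before running it. The helper name `resolveBinary` and the directory list are hypothetical and are not taken from the wifireconnect plugin.

```javascript
const fs = require("fs");
const path = require("path");

// Assumption: these are the directories worth searching on the target system.
const DEFAULT_DIRS = ["/usr/sbin", "/usr/bin", "/sbin", "/bin"];

// Hypothetical helper: find the absolute path of a command by searching a
// fixed list of directories, instead of relying on the PATH cron provides.
function resolveBinary(name, dirs = DEFAULT_DIRS) {
  for (const dir of dirs) {
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // throws if missing or not executable
      return candidate;
    } catch (err) {
      // not in this directory; try the next one
    }
  }
  return null; // not found in any of the given directories
}
```

The result can then be passed to something like `child_process.execFile`, so the job no longer depends on the environment it was launched from; a `null` return is worth logging, since it means the command is genuinely missing rather than merely off the PATH.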

volumio commented 7 years ago

We have planned a rework: if wifi is dropped there will be a setting for it to reconnect automatically.