txn2 / txwifi

Raspberry Pi (arm) wifi configuration container. Configure and control wifi connectivity with a JSON based REST api.
https://imti.co/iot-wifi/
MIT License

Rewrite Conversation. #4

Closed (LelandSindt closed 5 years ago)

LelandSindt commented 5 years ago

I am planning a rewrite of txwifi and would like to make sure that I maintain a reasonable amount of backwards compatibility.

I plan to keep/maintain

I plan to drop (though I am open to making it available via an environment variable flag.)

I plan to add

also considering

Specific to behaviour: I plan to attempt to bring wlan0 up. If wlan0 can connect, uap0 will be left offline. When wlan0 cannot connect or loses its connection (for a period of time), uap0 will be brought online. If wlan0 is taken/goes back offline, uap0 will come back online.

Via the static file handler I plan to implement a web UI that will allow the user to select from a list of available wifi networks and supply a PSK...

If you have any thoughts or insights, comments are welcome.

LelandSindt commented 5 years ago

After much reading and experimentation... I have found that running hostapd and wpa_supplicant at the same time "works" but is less than stable...

Proposed behaviour...

start wpa_supplicant
monitor wlan0
-- if wlan0 state == COMPLETED
---- do nothing
-- if wlan0 state != COMPLETED (for a period of time)
---- stop wpa_supplicant, start hostapd and dnsmasq
/connect
-- accept SSID and PSK, respond "will attempt to connect"
-- stop hostapd and dnsmasq
-- start wpa_supplicant, attempt to connect

note: sudo iwlist uap0 scan will return a list of available APs while uap0/hostapd are running (and wpa_supplicant is not running)

... consider doing away with uap0 -- ?would require changing the interface type on wlan0 between AP and station..?
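
The monitoring step proposed above could be sketched roughly as follows. This is only an illustration, not txwifi code: the helper names and the 30 second grace period are assumptions. It relies on `wpa_cli status` printing a `wpa_state=` line.

```shell
#!/bin/bash
# Hypothetical helper: read `wpa_cli status` output on stdin and print
# the value of its wpa_state line (e.g. COMPLETED, SCANNING).
wpa_state_from_status() {
  sed -n 's/^wpa_state=//p' | head -n 1
}

# Hypothetical helper: decide what to do next.
#   $1 = current wpa_state
#   $2 = seconds wlan0 has spent in a non-COMPLETED state
#   $3 = grace period in seconds (assumed default: 30)
next_action() {
  local state="$1" seconds_down="$2" grace="${3:-30}"
  if [ "$state" = "COMPLETED" ]; then
    echo "none"        # connected, leave everything alone
  elif [ "$seconds_down" -ge "$grace" ]; then
    echo "start_ap"    # stop wpa_supplicant, start hostapd and dnsmasq
  else
    echo "wait"        # not connected yet, but still within the grace period
  fi
}
```

On a device this might be driven by something like `state=$(wpa_cli -i wlan0 status | wpa_state_from_status)` inside a polling loop.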

LelandSindt commented 5 years ago

POC bash script.

#!/bin/bash
# POC: toggle between station mode (wpa_supplicant) and AP mode (hostapd + dnsmasq).
set -x
while true
do
  # consider connecting to hostapd, getting all associated clients, and kicking them off..
  killall hostapd
  killall dnsmasq
  [ -e /sys/class/net/uap0 ] && iw uap0 del
  wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant.conf -iwlan0 -Dnl80211,wext &
  set +x
  # wait for the wpa_supplicant control socket to appear
  echo -n "wait."
  until [ -e /var/run/wpa_supplicant/wlan0 ]
  do
    echo -n "."
    sleep 1  # avoid a busy loop while waiting
  done
  echo "."
  set -x
  # assumes add_network returns network id 0
  wpa_cli -i wlan0 add_network
  wpa_cli -i wlan0 set_network 0 ssid \"MY_SSID\"
  wpa_cli -i wlan0 set_network 0 psk \"MY_PSK\"
  wpa_cli -i wlan0 enable_network 0
  read -n 1 -s -r -p "press any key to go to host mode"
  killall wpa_supplicant
  [ -e /sys/class/net/uap0 ] && iw uap0 del
  iw phy phy0 interface add uap0 type __ap
  ip link set uap0 address "$(cat /sys/class/net/wlan0/address)"
  hostapd /etc/hostapd/hostapd.conf &
  ifconfig uap0 192.168.27.1 netmask 255.255.255.0
  # note: --port=0 disables dnsmasq's DNS server, leaving only DHCP
  dnsmasq --no-hosts --keep-in-foreground --log-queries --no-resolv --address=/#/192.168.27.1 --dhcp-range=192.168.27.100,192.168.27.150,1h --dhcp-vendorclass=set:device,IoT --dhcp-authoritative --log-facility=- --interface=uap0 --port=0 &
  read -n 1 -s -r -p "press any key to go to station mode"
done

LelandSindt commented 5 years ago

In a situation where the RPi has an active wireless connection (state == COMPLETED) and loses its connection (state != COMPLETED) because the wireless network became unavailable, wpa_supplicant would be killed and hostapd/dnsmasq started.

At this point if the wireless network became available again wpa_supplicant is not running/able to attempt a connection... this is not ideal/acceptable.

Potential solution, monitor available networks regularly... if/when the previously connected network returns/becomes available... stop hostapd/dnsmasq, start wpa_supplicant and attempt connection...
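
A rough sketch of the "has the previous network come back?" check, assuming the scan comes from `iwlist uap0 scan` (which lists each network on a line like `ESSID:"name"`). The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `iwlist <iface> scan` output on stdin and
# succeed (exit 0) only if the exact SSID given in $1 appears in it.
ssid_in_scan() {
  local ssid="$1"
  # -F: treat the pattern as a fixed string; the surrounding quotes in the
  # pattern ensure that "Home" does not match "HomeNet".
  grep -Fq "ESSID:\"$ssid\""
}
```

In AP mode this might be used as `iwlist uap0 scan | ssid_in_scan "MY_SSID" && echo "network is back"`, where MY_SSID stands for the previously configured network.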

Drawback: in a case where the wireless network's PSK is changed, wpa_supplicant <--> hostapd/dnsmasq would oscillate. This could be frustrating if you are mid connect/configure when hostapd/dnsmasq goes offline.

Potential mitigation: reset the timeout/countdown to shut down hostapd/dnsmasq every time a REST/web endpoint is hit -- or -- if a client connection to hostapd is detected.
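
The countdown reset could look something like the sketch below. The variable AP_IDLE_TIMEOUT, its 120 second default, and both function names are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical setting: seconds of inactivity before AP mode is shut down.
AP_IDLE_TIMEOUT="${AP_IDLE_TIMEOUT:-120}"

# Print a fresh shutdown deadline (epoch seconds) measured from $1 ("now").
# Call this whenever a REST/web endpoint is hit or a client associates.
reset_ap_deadline() {
  local now="$1"
  echo $(( now + AP_IDLE_TIMEOUT ))
}

# Succeed (exit 0) only when "now" ($1) has reached the deadline ($2).
ap_deadline_passed() {
  local now="$1" deadline="$2"
  [ "$now" -ge "$deadline" ]
}
```

A monitor loop would keep the deadline in a variable, for example `deadline=$(reset_ap_deadline "$(date +%s)")`, and only stop hostapd/dnsmasq once `ap_deadline_passed "$(date +%s)" "$deadline"` succeeds.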

LelandSindt commented 5 years ago

"In my testing, it looks like the RPi’s AP will dynamically change channels to match whatever channel the wlan0 interface is currently using. " -- https://blog.thewalr.us/2017/09/26/raspberry-pi-zero-w-simultaneous-ap-and-managed-mode-wifi/

I saw this "dynamic" channel changing as well, and it would occasionally cause the response from /connect to fail to reach the client when the AP channel changed to match the channel of the network that was being connected to. This is what I was considering "less than stable".

I think that by making ConnectNetwork return nothing, wait (10 seconds?) before applying the new network settings, and run asynchronously, /connect can send back a reasonable "will attempt to connect" response.

If the connection is successful hostapd, dnsmasq and uap0 are taken offline.

This will keep wpa_supplicant online and solve the issues brought up here: https://github.com/txn2/txwifi/issues/4#issuecomment-461991208
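
The "respond first, apply later" idea can be sketched in shell, to match the POC script rather than txwifi's actual Go code. The function names, the JSON response body, and the use of network id 0 are all assumptions.

```shell
#!/bin/bash
# Hypothetical: apply new credentials to network 0 (assumed to exist already,
# as in the POC script) and enable it.
apply_network() {
  local ssid="$1" psk="$2"
  wpa_cli -i wlan0 set_network 0 ssid "\"$ssid\""
  wpa_cli -i wlan0 set_network 0 psk "\"$psk\""
  wpa_cli -i wlan0 enable_network 0
}

# Hypothetical /connect handler: start the delayed apply in the background,
# then reply right away so the response can reach the client before the
# AP channel changes.
#   $1 = SSID, $2 = PSK, $3 = delay in seconds (assumed default: 10)
handle_connect() {
  local ssid="$1" psk="$2" delay="${3:-10}"
  (
    sleep "$delay"
    apply_network "$ssid" "$psk"
  ) >/dev/null 2>&1 &
  echo '{"status":"will attempt to connect"}'
}
```

The background subshell's output is discarded so that the caller only ever sees the immediate response.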

soufian044 commented 5 years ago

@LelandSindt I have a question: why would you turn off the AP when State == COMPLETED?

LelandSindt commented 5 years ago

@soufian044 -- Thank you for asking.

To over-clarify and make sure we are on the same page: State == COMPLETED means that wpa_supplicant has successfully connected to an AP, and wlan0 is configured/connected/online.

When State == COMPLETED, hostapd and dnsmasq have done their job, are no longer needed, and can be shut down.

Best case scenario, having iot-wifi-cfg-3 broadcasting/available is an annoyance.

Worst case scenario, having iot-wifi-cfg-3 broadcasting/available is a security hole just waiting to be brute forced or, worse, exploited because the PSK was not changed from the default.

(where iot-wifi-cfg-3 is the SSID of the AP/device in question)

LelandSindt commented 5 years ago

As of last night and this commit, I thought I had everything stable running hostapd, wpa_supplicant and dnsmasq at the same time. However, under further review/testing... everything is not working as expected.

When txwifi shifts over to AP mode here the hostapd brings the AP online, but once wpa_supplicant loads the AP goes offline.

At this point I am going to have to take a step back, think some more, review...

My initial thought is to shift to running either hostapd/dnsmasq or wpa_supplicant, as suggested here

Comments/feedback welcome.

LelandSindt commented 5 years ago

As of https://github.com/txn2/txwifi/commit/3a0f2f541fe70d86e681c82802bbdaf6981737e4 I think that the code is viable.

The biggest to-do that needs to be addressed is documentation.

hostapd and wpa_supplicant are mutually exclusive. With no network configured, or with a configured network that is unavailable, txwifi will oscillate between hostapd (AP) and wpa_supplicant (CL) until the configured network brings wpa_supplicant to "COMPLETED" status or a newly configured network allows wpa_supplicant to reach "COMPLETED" status.

Note: txwifi expects only one network config in wpa_supplicant.conf and will delete it if/when /connect is called. (Read: txwifi only supports 1 wpa_supplicant.conf network config.)
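
Since only one network config is expected, a quick pre-flight check on wpa_supplicant.conf might be useful. The sketch below is not part of txwifi; the function names are hypothetical, and it simply counts `network={` blocks.

```shell
#!/bin/bash
# Hypothetical helper: print how many network={ ... } blocks the given
# wpa_supplicant.conf file ($1) contains.
count_networks() {
  grep -c '^[[:space:]]*network[[:space:]]*=[[:space:]]*{' "$1"
}

# Hypothetical helper: succeed (exit 0) only if the file has at most one
# network block.
check_single_network() {
  local n
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  n="$(count_networks "$1" || true)"
  [ "$n" -le 1 ]
}
```

For example: `check_single_network /etc/wpa_supplicant/wpa_supplicant.conf || echo "more than one network configured"`.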

topeysoft commented 5 years ago

Hi @LelandSindt, thanks for doing this. I have been using this commit version 3a0f2f5 on my rpi 3 for more than a week now and I can confirm that it's been working as you outlined above. I did notice that the config was not persisted through reboot at first, then I started the docker container with sudo and it works. It connects automatically on boot now and starts the AP mode if unable to connect after a while.

LelandSindt commented 5 years ago

@topeysoft thank you for the feedback and confirmation.

I would expect the following command to persist the wpa_supplicant config and restart the container on reboot (where the wpa supplicant host and container path are most likely /etc/wpa_supplicant/ )

docker run -d --restart always --privileged --net host --name txwifi \
      -v $(pwd)/wificfg.json:/cfg/wificfg.json \
      -v <WPA_SUPPLICANT_HOST_PATH>:<WPA_SUPPLICANT_CONTAINER_PATH> \
      txwifi

Can you elaborate on what you mean by having had to start the container with sudo?

LelandSindt commented 5 years ago

General update: I have not had and will not have the time to update the documentation, push a new image, and merge to master... It is likely going to be another week or two before everything is complete..

However, if you feel so inclined... please pull/checkout the develop branch, build the image locally and give it a spin. (the installation/preparation documentation is still valid... https://github.com/txn2/txwifi/tree/develop#getting-started )

topeysoft commented 5 years ago

Yes. I started it with something similar to that (docker run -d --restart always --net host --name txwifi -v $(pwd)/wificfg.json:/cfg/wificfg.json ...), but in a bash file. I ran the bash file using sudo the second time around. I might have omitted the --privileged flag in the docker command, and it's also possible that there was something else happening with my rpi setup.

adsb-related-code commented 5 years ago

Cool project. I've been looking at building something like this for http://adsbexchange.com/ for when users are configuring feeder units.

An AP with a web interface where they can set up the device, WiFi, and other settings.

Then have it connect to their WiFi. If WiFi fails, then fall back to AP mode and periodically check the configured WiFi.

beres commented 5 years ago

Hi @LelandSindt, it works, but don't you think that switching to AP mode when the device cannot connect for a while could also be a security problem? Perhaps you could use this behavior only if a sort of reset mode is disabled; otherwise a specific URL must be called to reset the state. What do you think?

LelandSindt commented 5 years ago

I will think about a dontFallBackToApMode flag.

(Forgive the brevity I am working from my phone)

cjimti commented 5 years ago

This is something I will likely need as well. Unfortunately, there is no easy answer to this since there are negatives on both sides. Either you lose network connectivity or move the device to a new location and have no way of reconnecting it, or you end up accidentally exposing a security problem. Most consumer IoT devices get around this by using a physical reset button.

LelandSindt commented 5 years ago

@beres I see your point. @cjimti well said.

For perspective: My experience with the Google home/chrome devices is that they will drop back to AP when they lose network connection.

The environment variable mode flag will be reasonably easy to implement; as for its default, I am open to suggestions/debate.

It would also be fairly easy to implement a gpio based trigger/switch.

That said... I am signing off for the next week or so... I will check in when I return. 🛳️

LelandSindt commented 5 years ago

One last thought (start of the debate?)

When in doubt, default towards secure. Read: don't fall back to AP mode by default.
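
A secure-by-default flag could be read along these lines. TXWIFI_AP_FALLBACK is a made-up variable name used only for illustration; nothing in this thread fixes the real name or accepted values.

```shell
#!/bin/bash
# Hypothetical flag: AP fallback is off unless TXWIFI_AP_FALLBACK is
# explicitly set to a truthy value.
ap_fallback_enabled() {
  case "${TXWIFI_AP_FALLBACK:-false}" in
    1|true|TRUE|yes|on) return 0 ;;  # explicitly opted in
    *) return 1 ;;                   # unset or anything else: stay secure
  esac
}
```

The monitor loop would then only start hostapd/dnsmasq on a lost connection when `ap_fallback_enabled` succeeds.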

kadaan commented 5 years ago

I think it should be configurable. While I prefer the button, it might be a barrier to entry.

kadaan commented 5 years ago

BTW, I tried the develop branch last night after first using the published docker container (which worked great). This was giving me issues. The configuration WiFi network was correctly started, but any time I connected the WiFi would recycle.

kadaan commented 5 years ago

Here are some of the logs from the container:

{"hostname":"cambium","level":30,"msg":"Hostapd ENABLED","name":"txwifi","pid":0,"time":"2019-03-11T15:31:24.703Z","v":0}
{"hostname":"cambium","level":30,"msg":"HOSTAPD GOT: ctrl_iface not configured!","name":"txwifi","pid":0,"time":"2019-03-11T15:31:24.703Z","v":0}
{"hostname":"cambium","level":20,"msg":"ProcessCmd got wpa_supplicant","name":"txwifi","pid":0,"time":"2019-03-11T15:31:34.704Z","v":0}
panic: interface conversion: interface {} is *exec.ExitError, not string

goroutine 19 [running]:
github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan.(*bunyanLog).sprintf(0x10b34d2c, 0x10b0cfb0, 0x1, 0x1, 0x10b73540, 0x0)
    /go/src/github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan/log.go:28 +0x98
github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan.(*bunyanLog).write(0x10b34d2c, 0x2a1bc0, 0x6, 0x2a184f, 0x5, 0x2a1bde, 0x6, 0x2db470, 0x10b660e8, 0x0, ...)
    /go/src/github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan/log.go:80 +0xc94
github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan.(*Logger).Fatal(0x10b9d6e0, 0x10b0cfb0, 0x1, 0x1)
    /go/src/github.com/txn2/txwifi/vendor/github.com/bhoriuchi/go-bunyan/bunyan/logger.go:126 +0x114
github.com/txn2/txwifi/iotwifi.(*WpaCfg).ScanNetworks(0x10b9d6e0, 0x1, 0x2a184f, 0x5)
    /go/src/github.com/txn2/txwifi/iotwifi/wpacfg.go:258 +0xe4
github.com/txn2/txwifi/iotwifi.RunWifi(0x2a1bde, 0x6, 0x2a184f, 0x5, 0x2db470, 0x10b660e8, 0x0, 0x0, 0x0, 0x0, ...)
    /go/src/github.com/txn2/txwifi/iotwifi/iotwifi.go:117 +0x250
created by main.main
    /go/src/github.com/txn2/txwifi/main.go:45 +0x230
{"hostname":"cambium","level":30,"msg":"Starting IoT Wifi...","name":"txwifi","pid":0,"time":"2019-03-11T15:31:41.721Z","v":0}

LelandSindt commented 5 years ago

@kadaan the logs you posted here look to me like they came from the master branch...

If you are still having trouble/interested in troubleshooting, please double check that you are working/building from the develop branch, capture the logs, and open a new issue we can work from.

--Thanks.

kadaan commented 5 years ago

Got it working. Thanks!

BTW, it would be great to set up the config so that the txwifi config page is shown as a captive portal. That way you would automatically be directed to the configuration page.

arsoba commented 5 years ago

Hi @LelandSindt . I'm really interested in your project. If you have any plans for improvement, I can help you with it.

LelandSindt commented 5 years ago

I am going to close this "issue"... I have completed the bulk of the work, but a review and tweaking of the readme (and the new docker image published) would be required to push the develop branch to master...

These are things that I simply don't have time for this summer. Perhaps things will change in the fall.

In the meantime, I am confident in the code contained in the develop branch.. building the image locally should work...

Reach out if you have questions; I will do what I can to answer.

cjimti commented 5 years ago

@LelandSindt you have done a lot of really great work on this. I have quite a few open source projects I have had to put on hold myself.