soypat / cyw43439

Driver for the Wifi+bluetooth integrated circuit on the pico.
MIT License
119 stars 6 forks source link

NTP example always times out while attempting to resolve DNS queries #37

Closed 0b1-k closed 9 months ago

0b1-k commented 9 months ago

As I'm testing the driver using the provided examples, I'm always running into timeout issues with the NTP sample. ` $ tinygo flash -target=pico -stack-size=8kb -monitor ./examples/ntp go: downloading github.com/soypat/seqs v0.0.0-20240109050728-0ac18cb38dd9 go: downloading github.com/tinygo-org/pio v0.0.0-20231216154340-cd888eb58899 go: downloading golang.org/x/exp v0.0.0-20230728194245-b0cb94b80691 Connected to /dev/ttyACM0. Press Ctrl-C to exit. time=1970-01-01T00:00:01.632Z level=INFO msg="joining WPA secure network" ssid=Shroom passlen=23 time=1970-01-01T00:00:05.078Z level=INFO msg="wifi join success!" mac=28:cd:c1:0e:48:e7 time=1970-01-01T00:00:05.080Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:05.081Z level=INFO msg=DHCP:tx msg=Discover time=1970-01-01T00:00:05.082Z level=DEBUG msg=UDP:send plen=324 time=1970-01-01T00:00:05.132Z level=DEBUG msg=UDP:recv plen=301 time=1970-01-01T00:00:05.132Z level=DEBUG msg=DHCP:rx opt=MessageType data=2 time=1970-01-01T00:00:05.133Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:05.134Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,9,58,128 time=1970-01-01T00:00:05.134Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,4,157,64 time=1970-01-01T00:00:05.134Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,8,19,48 time=1970-01-01T00:00:05.135Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:05.135Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:05.136Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:05.136Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:05.137Z level=DEBUG msg=DHCP:rx opt=DomainName data=104,116,46,104,111,109,101 time=1970-01-01T00:00:05.137Z level=INFO msg=DHCP:rx msg=Offer time=1970-01-01T00:00:05.138Z level=INFO msg=DHCP:tx msg=Request time=1970-01-01T00:00:05.139Z level=DEBUG msg=UDP:send plen=308 time=1970-01-01T00:00:05.183Z level=DEBUG msg=UDP:recv plen=311 time=1970-01-01T00:00:05.183Z level=DEBUG msg=DHCP:rx opt=MessageType data=5 time=1970-01-01T00:00:05.184Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:05.184Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,9,58,128 time=1970-01-01T00:00:05.185Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,4,157,64 time=1970-01-01T00:00:05.185Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,8,19,48 time=1970-01-01T00:00:05.186Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:05.186Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:05.187Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:05.187Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:05.188Z level=DEBUG msg=DHCP:rx opt=DomainName data=104,116,46,104,111,109,101 time=1970-01-01T00:00:05.188Z level=DEBUG msg=DHCP:rx opt=HostName data=110,116,112,45,112,105,99,111 time=1970-01-01T00:00:05.189Z level=INFO msg=DHCP:rx msg=Ack time=1970-01-01T00:00:05.581Z level=INFO msg="DHCP complete" cidrbits=24 ourIP=192.168.0.78 dns=192.168.0.1 broadcast=192.168.0.255 gateway="invalid IP" router=192.168.0.1 dhcp=192.168.0.1 hostname=ntp-pico lease=168h0m0s renewal=84h0m0s rebinding=147h0m0s time=1970-01-01T00:00:05.591Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:05.641Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:05.692Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:05.743Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:05.795Z level=DEBUG msg=UDP:send plen=72 panic: DNS lookup failed:dns lookup timed out

`

soypat commented 9 months ago

Hey @fabienroyer! Looks like the DNS server is not responding to your request. That's odd since it should at least reply with an error message. A few questions:

FYI: this is what a succesful NTP should look like, click to show: ``` time=1970-01-01T00:00:01.632Z level=INFO msg="joining WPA secure network" ssid="WHITTINGSLOW 2.4" passlen=10 time=1970-01-01T00:00:05.077Z level=INFO msg="wifi join success!" mac=28:cd:c1:05:4d:bb time=1970-01-01T00:00:05.079Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:05.081Z level=INFO msg=DHCP:tx msg=Discover time=1970-01-01T00:00:05.082Z level=DEBUG msg=UDP:send plen=324 time=1970-01-01T00:00:05.580Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:06.080Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:06.580Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:07.080Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:07.580Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:08.080Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:08.395Z level=DEBUG msg=UDP:recv plen=309 time=1970-01-01T00:00:08.396Z level=DEBUG msg=DHCP:rx opt=MessageType data=2 time=1970-01-01T00:00:08.396Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:08.397Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,0,14,16 time=1970-01-01T00:00:08.397Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,0,7,8 time=1970-01-01T00:00:08.398Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,0,12,78 time=1970-01-01T00:00:08.398Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:08.399Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:08.399Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:08.400Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:08.400Z level=DEBUG msg=DHCP:rx opt=DomainName data=102,105,98,101,114,116,101,108,46,99,111,109,46,97,114 time=1970-01-01T00:00:08.401Z level=INFO msg=DHCP:rx msg=Offer time=1970-01-01T00:00:08.402Z level=INFO msg=DHCP:tx msg=Request time=1970-01-01T00:00:08.402Z level=DEBUG msg=UDP:send plen=308 time=1970-01-01T00:00:08.403Z level=ERROR msg=Stack.RecvEth err="unknown IP protocol" poll error: unknown IP protocol time=1970-01-01T00:00:08.446Z level=DEBUG msg=UDP:recv plen=319 time=1970-01-01T00:00:08.447Z level=DEBUG msg=DHCP:rx opt=MessageType data=5 time=1970-01-01T00:00:08.447Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:08.448Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,0,14,16 time=1970-01-01T00:00:08.448Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,0,7,8 time=1970-01-01T00:00:08.449Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,0,12,78 time=1970-01-01T00:00:08.449Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:08.450Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:08.450Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:08.451Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:08.451Z level=DEBUG msg=DHCP:rx opt=DomainName data=102,105,98,101,114,116,101,108,46,99,111,109,46,97,114 time=1970-01-01T00:00:08.452Z level=DEBUG msg=DHCP:rx opt=HostName data=110,116,112,45,112,105,99,111 time=1970-01-01T00:00:08.452Z level=INFO msg=DHCP:rx msg=Ack time=1970-01-01T00:00:08.580Z level=INFO msg="DHCP complete" cidrbits=24 ourIP=192.168.0.68 dns=192.168.0.1 broadcast=192.168.0.255 gateway="invalid IP" router=192.168.0.1 dhcp=192.168.0.1 hostname=ntp-pico lease=1h0m0s renewal=30m0s rebinding=52m30s time=1970-01-01T00:00:08.599Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:08.650Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:08.701Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:08.752Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:08.803Z level=DEBUG msg=UDP:send plen=72 time=1970-01-01T00:00:08.854Z level=DEBUG msg=UDP:recv plen=94 time=1970-01-01T00:00:08.855Z level=INFO msg=dns:recv op=Query rcode=Success time=1970-01-01T00:00:08.855Z level=INFO msg=dns:incomplete err="too many Answers" time=1970-01-01T00:00:08.906Z level=INFO msg=ntp:send xmt=1900-01-01T00:00:00.000Z time=1970-01-01T00:00:08.907Z level=DEBUG msg=UDP:send plen=90 time=1970-01-01T00:00:08.956Z level=DEBUG msg=UDP:recv plen=48 time=1970-01-01T00:00:08.957Z level=INFO msg=ntp:recv origin=1900-01-01T00:00:00.000Z recv=2024-01-13T16:12:20.210Z transmit=2024-01-13T16:12:20.210Z t4=1900-01-01T00:00:00.051Z still ntping ntp done newtime=2024-01-13 16:12:20.184524131 +0000 UTC ```
0b1-k commented 9 months ago

Are you able to capture wireshark packets on your network and share them here?

Not really easily at the moment. I'm on a switched network and I only see broadcast / multicast packets from other machines.

What access point model+ network architecture are you running?

I'm using Ubiquiti Networks access points (UAP-AC-Pro and Lite). The access points are all connected to a 1Gbps switched Ethernet network, which is itself connected to the Internet via a consumer router.

Are you able to try with other routers/modems/access points and see if the problem persists?

Potentially, using random public networks.

Does your network have access to internet?

Yes.

cmol commented 9 months ago

If your DHCP is not working, any old AP on the network should fail. The best solution here for a packet dump would be to place something in between the AP and your DHCP server (perhaps your router) for packet dumping both a known good client and the Pico that is failing.

If you have an old raspberry Pi and a USB Ethernet adapter, that can do the trick. I have a guide to it somewhere, I'll find it later.

Getting those data points to us really helps debugging this scenario!

soypat commented 9 months ago

I think it may be an issue with DNS rather than DHCP- the logs you've shared show DHCP completed correctly.

It sounds like it may be a routing issue- I've heard DNS is quite tricky. If you have access to the ubiquiti admin panel I imagine there should be some trace of DNS traffic to be able to debug further.

Another way to debug this would be to capture packets like Claus says. If we have a capture of the DNS traffic (even if it is only the outgoing packet) we could show it to people who know much more about networking than me that may be able to tell if something is off.

cmol commented 9 months ago

I'm just gonna dump the packet dumping guide I wrote on slack here, but generally, if you can get me packet dumps of DHCP and DNS, I can look at it and see what might be wrong. I have done a fair bit of DNS work so if it's DNS it shouldn't be too bad.

Guide below

Just a quick one on how I did my network sniffing. I found an old PI 3 B+. I set it up with the Raspbian light version, and got a USB ethernet adapter. Enabled systemd-networkd with: sudo systemctl enable systemd-networkd Created this file with the content to create the bridge: sudo nano /etc/systemd/network/bridge-br0.netdev

[NetDev]
Name=br0
Kind=bridge

Created member files for the bridge members: sudo nano /etc/systemd/network/br0-member-eth0.network

[Match]
Name=eth0

[Network]
Bridge=br0

Same file as above for eth1 (remember the change in the file itself, not just the file name)

Exclude the two interfaces from dhcpcd sudo nano /etc/dhcpcd.conf

denyinterfaces eth0 eth1
interface br0

Reboot, and then you can find your pi on the network. Then you: Install tcpdump: sudo apt install tcpdump Dump DHCP with: sudo tcpdump -i eth0 port 67 or port 68 -s 65535 -w dhcp_good.cap Copy over file to your own machine and open with wireshark Hope this is useful. You might already know all this, but just in case, it's here now


Dump instructions above are for DHCP, for DNS you use something like sudo tcpdump -i eth0 port53 -s 65535 -w dhcp_good.cap

0b1-k commented 9 months ago

@cmol @soypat

Thanks for the follow up and write-up. It is similar to what I was planning on doing, which is building an "Ethernet bridge device" that I can splice into the network right before the router going to the Internet to get the trace you need. DHCP is working fine. It's only DNS requests that appear to time out.

0b1-k commented 9 months ago

@cmol @soypat

Here's a packet capture showing a couple DNS requests originating from the Pico W while running the ./examples/ntp sample.

dns.zip

Please let me know if you need any additional information.

cmol commented 9 months ago

@cmol @soypat

Here's a packet capture showing a couple DNS requests originating from the Pico W while running the ./examples/ntp sample.

dns.zip

Please let me know if you need any additional information.

Ok, two things stands out to me in the header flags:

  1. The RD bit is not set (recursion). I don't think you always need this one, but that could be a source of errors.
  2. The RA bit is set. I am pretty sure we need to have this as 0, as it's only a response bit. My guess is that the DNS server rejects this as it's technically not valid in a request (and the DNS server can be picky).

My guess is that this is what's causing the issue. The rest looks good, though we could add the eDNS additional query for this stuff.

I don't have a ton of time this week, but that's a good start.

@soypat what are we using for DNS? If we need a library, I happened to have made one https://github.com/cmol/dns

I created it to make custom DNS servers but I think I can make some simple changes to make it work nicely as a client library as well.

cmol commented 9 months ago

Again, don't have time to test tonight, but I'm pretty sure it's just this in the seqs repo:

diff --git a/eth/dns/dns.go b/eth/dns/dns.go
index 9461a6c..56e3804 100644
--- a/eth/dns/dns.go
+++ b/eth/dns/dns.go
@@ -89,7 +89,7 @@ func DecodeHeader(b []byte) (dhdr Header) {

 // NewClientHeaderFlags creates the header flags for a client request.
 func NewClientHeaderFlags(op OpCode, enableRecursion bool) HeaderFlags {
-       return HeaderFlags(op&0b1111)<<11 | HeaderFlags(b2u8(enableRecursion))<<7
+       return HeaderFlags(op&0b1111)<<11 | HeaderFlags(b2u8(enableRecursion))<<8
 }
soypat commented 9 months ago

@cmol Yup, that looks like one of/the the issue(s). Let me know if you want to submit a PR for it, if not I'll get around to it sometime tommorrow

cmol commented 9 months ago

@cmol Yup, that looks like one of the issue, if not the issue. Let me know if you want to submit a PR for it, if not I'll get around to it sometime tommorrow

@soypat I realistically won't have time until perhaps Wednesday or Friday, so you're welcome to have a go at it before then! If not, I'll look at it during the week.

soypat commented 9 months ago

no worries- will push it now and update CYW43439 asap

0b1-k commented 9 months ago

@soypat

DNS queries are now succeeding with your fix :)

tinygo flash -target=pico -stack-size=8kb -monitor ./examples/ntp go: downloading github.com/soypat/seqs v0.0.0-20240116042257-a699b4ea0e64 Connected to /dev/ttyACM0. Press Ctrl-C to exit. time=1970-01-01T00:00:01.631Z level=INFO msg="joining WPA secure network" ssid=Shroom passlen=23 time=1970-01-01T00:00:05.078Z level=INFO msg="wifi join success!" mac=28:cd:c1:0e:48:e7 time=1970-01-01T00:00:05.080Z level=INFO msg="DHCP ongoing..." time=1970-01-01T00:00:05.081Z level=INFO msg=DHCP:tx msg=Discover time=1970-01-01T00:00:05.082Z level=DEBUG msg=UDP:send plen=324 time=1970-01-01T00:00:05.132Z level=DEBUG msg=UDP:recv plen=301 time=1970-01-01T00:00:05.132Z level=DEBUG msg=DHCP:rx opt=MessageType data=2 time=1970-01-01T00:00:05.133Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:05.133Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,9,58,128 time=1970-01-01T00:00:05.134Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,4,157,64 time=1970-01-01T00:00:05.134Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,8,19,48 time=1970-01-01T00:00:05.135Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:05.135Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:05.136Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:05.136Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:05.137Z level=DEBUG msg=DHCP:rx opt=DomainName data=104,116,46,104,111,109,101 time=1970-01-01T00:00:05.137Z level=INFO msg=DHCP:rx msg=Offer time=1970-01-01T00:00:05.138Z level=INFO msg=DHCP:tx msg=Request time=1970-01-01T00:00:05.139Z level=DEBUG msg=UDP:send plen=308 time=1970-01-01T00:00:05.183Z level=DEBUG msg=UDP:recv plen=311 time=1970-01-01T00:00:05.183Z level=DEBUG msg=DHCP:rx opt=MessageType data=5 time=1970-01-01T00:00:05.184Z level=DEBUG msg=DHCP:rx opt=ServerIdentification data=192,168,0,1 time=1970-01-01T00:00:05.184Z level=DEBUG msg=DHCP:rx opt=IPAddressLeaseTime data=0,9,58,128 time=1970-01-01T00:00:05.185Z level=DEBUG msg=DHCP:rx opt=RenewTimeValue data=0,4,157,64 time=1970-01-01T00:00:05.185Z level=DEBUG msg=DHCP:rx opt=RebindingTimeValue data=0,8,19,48 time=1970-01-01T00:00:05.186Z level=DEBUG msg=DHCP:rx opt=SubnetMask data=255,255,255,0 time=1970-01-01T00:00:05.186Z level=DEBUG msg=DHCP:rx opt=BroadcastAddress data=192,168,0,255 time=1970-01-01T00:00:05.187Z level=DEBUG msg=DHCP:rx opt=Router data=192,168,0,1 time=1970-01-01T00:00:05.187Z level=DEBUG msg=DHCP:rx opt=DNSServers data=192,168,0,1 time=1970-01-01T00:00:05.188Z level=DEBUG msg=DHCP:rx opt=DomainName data=104,116,46,104,111,109,101 time=1970-01-01T00:00:05.188Z level=DEBUG msg=DHCP:rx opt=HostName data=110,116,112,45,112,105,99,111 time=1970-01-01T00:00:05.189Z level=INFO msg=DHCP:rx msg=Ack time=1970-01-01T00:00:05.581Z level=INFO msg="DHCP complete" cidrbits=24 ourIP=192.168.0.78 dns=192.168.0.1 broadcast=192.168.0.255 gateway="invalid IP" router=192.168.0.1 dhcp=192.168.0.1 hostname=ntp-pico lease=168h0m0s renewal=84h0m0s rebinding=147h0m0s time=1970-01-01T00:00:05.590Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:05.641Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:05.692Z level=DEBUG msg=ARP:send isReply=false time=1970-01-01T00:00:05.743Z level=DEBUG msg=ARP:recv op=2 time=1970-01-01T00:00:05.795Z level=DEBUG msg=UDP:send plen=72 time=1970-01-01T00:00:05.845Z level=DEBUG msg=UDP:recv plen=94 time=1970-01-01T00:00:05.846Z level=INFO msg=dns:recv op=Query rcode=Success time=1970-01-01T00:00:05.847Z level=INFO msg=dns:incomplete err="too many Answers" time=1970-01-01T00:00:05.897Z level=INFO msg=ntp:send xmt=1900-01-01T00:00:00.000Z time=1970-01-01T00:00:05.898Z level=DEBUG msg=UDP:send plen=90 time=1970-01-01T00:00:05.998Z level=DEBUG msg=UDP:recv plen=48 time=1970-01-01T00:00:05.999Z level=INFO msg=ntp:recv origin=1900-01-01T00:00:00.000Z recv=2024-01-16T18:56:09.286Z transmit=2024-01-16T18:56:09.286Z t4=1900-01-01T00:00:00.101Z still ntping ntp done newtime=2024-01-16 18:56:09.235123337 +0000 UTC

soypat commented 9 months ago

Awesome! Thank you for all your efforts @cmol and @fabienroyer for filing the issue and aiding with the information! I'm closing this now as it is fixed!

cmol commented 9 months ago

Awesome! Thank you for all your efforts @cmol and @fabienroyer for filing the issue and aiding with the information! I'm closing this now as it is fixed!

Any time :)

0b1-k commented 9 months ago

@cmol @soypat

You are welcome and thank you both again for your awesome work :)