martin-ger / esp_wifi_repeater

A full functional WiFi Repeater (correctly: a WiFi NAT Router)
MIT License

Problem loading google on phone #79

Open gthb96 opened 7 years ago

gthb96 commented 7 years ago

Hello, when I try to run a Google search, the page sometimes gets stuck right at the start of loading. Sometimes Chrome also shows the error "err_quic_protocol_error".

Has anyone experienced something similar?

Edit: I just tried disabling "quic" in Chrome's developer settings and that solved the problem. But I still don't know why I have this problem with the repeater in the first place. I don't have it when I connect directly to the root AP instead of the ESP. It must have something to do with QUIC, the Quick UDP Internet Connections protocol.

martin-ger commented 7 years ago

Interesting - I have no experience with QUIC. Android Firefox thinks it is a good idea to flood the AP with hundreds of small empty packets before starting TCP. This sometimes leads to an overflow, but TCP/HTTP recovers from that. Perhaps we have a similar problem here?

Maybe you could try to enable monitoring and get a trace of a successful and a failed conversation. Later this week I might find some time to look into that.

gthb96 commented 7 years ago

I'll try to enable monitoring and capture this with Wireshark. I'd like to get you traces of a successful and a failed attempt; maybe we can see what's going on. Thanks for the tip.

gthb96 commented 7 years ago

I have some dumps for you. There are two where it hangs forever and never loads, two successful ones where it loads normally, and one where it hangs for 15 seconds and then finishes loading.

The phone where I tried to load the google page is 192.168.4.3

File link: http://rgho.st/8SJHfzWzL

martin-ger commented 7 years ago

Thank you for sending me the traces. After looking into them and reading a little about QUIC, I can see that some QUIC (UDP) connections work even in the unsuccessful traces, while others seem to fail. I don't see an obvious reason, but that is hard to find anyway, as nearly everything except the CID and the sequence number is encrypted. So my guess is that it has something to do with UDP NAPTing. UDP NAPTing is tricky because UDP has no defined end of the protocol interaction. So when should a NAPT entry for UDP be deleted? The only viable option is a timeout. The current value is 2 seconds: after 2 seconds of inactivity the NAPT mapping is closed, i.e. the "connection" is interrupted. This is fine for a quick DNS lookup, but quite short if you build TCP-style connections on top of UDP. For comparison: open TCP connections have a timeout of 30 min, and closed TCP connections still get 20 secs.

Of course this is a trade-off: the table of NAPT-entries is limited and long timeouts would occupy a lot of space.
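To make the trade-off concrete, here is a minimal sketch of an idle-timeout UDP mapping table. It is not the actual lwIP NAPT code: the struct layout, the table size of 4, the starting port 49152 and all function names are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAPT_TABLE_SIZE 4  /* hypothetical; kept tiny to show table pressure */

/* One UDP mapping: internal (addr, port) <-> external port. */
struct napt_entry {
    bool     in_use;
    uint32_t int_addr;
    uint16_t int_port;
    uint16_t ext_port;
    uint32_t last_seen_ms;  /* time of the last packet in either direction */
};

struct napt_table {
    struct napt_entry e[NAPT_TABLE_SIZE];
    uint32_t timeout_ms;     /* idle time after which a mapping is dropped */
    uint16_t next_ext_port;
};

static void napt_init(struct napt_table *t, uint32_t timeout_ms)
{
    memset(t, 0, sizeof *t);
    t->timeout_ms = timeout_ms;
    t->next_ext_port = 49152;
}

/* UDP has no "close", so idle time is the only signal for freeing a slot. */
static void napt_expire(struct napt_table *t, uint32_t now_ms)
{
    for (int i = 0; i < NAPT_TABLE_SIZE; i++) {
        if (t->e[i].in_use &&
            (uint32_t)(now_ms - t->e[i].last_seen_ms) > t->timeout_ms)
            t->e[i].in_use = false;
    }
}

/* Outbound packet: refresh the existing mapping or create a new one.
 * Returns the external port, or 0 if the table is full. */
static uint16_t napt_outbound(struct napt_table *t, uint32_t addr,
                              uint16_t port, uint32_t now_ms)
{
    struct napt_entry *free_slot = NULL;

    napt_expire(t, now_ms);
    for (int i = 0; i < NAPT_TABLE_SIZE; i++) {
        struct napt_entry *e = &t->e[i];
        if (e->in_use && e->int_addr == addr && e->int_port == port) {
            e->last_seen_ms = now_ms;
            return e->ext_port;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return 0;  /* long timeouts make this case more likely */

    free_slot->in_use = true;
    free_slot->int_addr = addr;
    free_slot->int_port = port;
    free_slot->ext_port = t->next_ext_port++;
    free_slot->last_seen_ms = now_ms;
    if (t->next_ext_port == 0)   /* wrapped around; restart the range */
        t->next_ext_port = 49152;
    return free_slot->ext_port;
}

/* Inbound reply: forwarded only if the mapping has not yet expired. */
static bool napt_inbound(struct napt_table *t, uint16_t ext_port,
                         uint32_t now_ms)
{
    napt_expire(t, now_ms);
    for (int i = 0; i < NAPT_TABLE_SIZE; i++) {
        if (t->e[i].in_use && t->e[i].ext_port == ext_port) {
            t->e[i].last_seen_ms = now_ms;
            return true;
        }
    }
    return false;
}
```

With napt_init(&t, 2000), a reply that arrives after a 5-second pause is rejected by napt_inbound because the mapping is already gone. With napt_init(&t, 20000) the same reply passes, but every idle flow also keeps its slot for 20 seconds, so napt_outbound hits the "table full" case sooner.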

One attempt at a fix would be to increase the 2 seconds to something higher (20 secs?) in line 20 of "esp-open-lwip/include/lwip/lwip_napt.h":

#define IP_NAPT_TIMEOUT_MS_UDP (2*1000)

Do you have the development environment to test this or should I provide you with a patched version of the liblwip_napt.a?

gthb96 commented 7 years ago

If you could provide me with the patched version, that would be easier than installing the environment. Thanks.