warmcat / libwebsockets

canonical libwebsockets.org networking library
https://libwebsockets.org

lws_client_connect_via_info blocked #2453

Closed calvin2021y closed 2 years ago

calvin2021y commented 2 years ago

In my tests, lws_client_connect_via_info sometimes takes a few seconds to finish.

Is there some call inside this function that blocks (i.e. a non-asynchronous operation)?

Sometimes it takes 137 ms, sometimes 12048 ms.

This blocks my event loop. How can I avoid it?

lws-team commented 2 years ago

Yes. By default lws uses getaddrinfo() on most platforms, and that call is blocking.

You can build lws with cmake -DLWS_WITH_SYS_ASYNC_DNS=1, and lws will then use its own DNS client, which is driven by the event loop and never blocks.
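You can see this blocking outside lws by timing a bare getaddrinfo() call. The sketch below is POSIX-only (Windows would also need Winsock initialisation), and timed_resolve() is a hypothetical helper, not an lws API. The resolver might be waiting on a slow upstream server, retrying, or timing out. In every case the calling thread cannot return to its event loop until the call finishes, which is consistent with the 137 ms to 12048 ms range reported above.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Resolve `host` with the system's blocking getaddrinfo() and report, in
 * milliseconds, how long the calling thread was stuck inside it. Nothing
 * else on this thread (an event loop, for instance) runs meanwhile.
 * Returns getaddrinfo()'s result: 0 on success, an EAI_* code otherwise.
 */
static int timed_resolve(const char *host, int flags, double *elapsed_ms)
{
	struct addrinfo hints, *res = NULL;
	struct timespec t0, t1;
	int rc;

	assert(host && elapsed_ms);
	memset(&hints, 0, sizeof hints);
	hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = flags;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	rc = getaddrinfo(host, "443", &hints, &res);  /* blocks here */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*elapsed_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
		      (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;

	if (rc == 0)
		freeaddrinfo(res);
	return rc;
}
```

With a real hostname and flags 0, the measured time depends entirely on the system resolver. With AI_NUMERICHOST and a literal address such as "127.0.0.1", no network lookup happens and the call returns at once.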

calvin2021y commented 2 years ago

After building the master branch with -DLWS_WITH_SYS_ASYNC_DNS=1, on Windows:

Process 5936 stopped
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x1188a38: Access violation reading location 0x61642e83
    frame #0: 0x01188a38 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
Process 5936 launched: 'C:\test.exe' (i686)
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x1188a38: Access violation reading location 0x61642e83
  * frame #0: 0x01188a38 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
    frame #1: 0x0111c9db test.exe`lws_client_connect_3_connect(wsi=<unavailable>, ads=<unavailable>, result=0x0763a080, n=<unavailable>, opaque=<unavailable>) at connect3.c:175:3
    frame #2: 0x0106eb93 test.exe`lws_async_dns_complete(q=<unavailable>, c=<unavailable>) at async-dns.c:103:7
    frame #3: 0x01117641 test.exe`lws_adns_parse_udp(dns=<unavailable>, pkt=<unavailable>, len=<unavailable>) at async-dns-parse.c:691:2
    frame #4: 0x0106ece4 test.exe`callback_async_dns(wsi=<unavailable>, reason=<unavailable>, user=<unavailable>, in=<unavailable>, len=<unavailable>) at async-dns.c:306:3
    frame #5: 0x00ee64d0 test.exe`user_callback_handle_rxflow(callback_function=(test.exe`callback_async_dns at async-dns.c:288), wsi=0x07603880, reason=LWS_CALLBACK_RAW_RX, user=0x00000000, in=0x071006a8, len=130) at wsi.c:498:6
    frame #6: 0x0107ce84 test.exe`rops_handle_POLLIN_raw_skt(pt=<unavailable>, wsi=<unavailable>, pollfd=<unavailable>) at ops-raw-skt.c:149:8
    frame #7: 0x0107f005 test.exe`lws_service_fd_tsi(context=<unavailable>, pollfd=<unavailable>, tsi=<unavailable>) at service.c:762:10
    frame #8: 0x00ee97c9 test.exe`lws_io_cb(watcher=<unavailable>, status=<unavailable>, revents=<unavailable>) at libuv.c:147:2
    frame #9: 0x00ed1aae test.exe`uv_process_poll_req at poll.c:188:7
    frame #10: 0x00ed1a18 test.exe`uv_process_poll_req(loop=<unavailable>, handle=<unavailable>, req=<unavailable>) at poll.c:529:5
    frame #11: 0x00ec5224 test.exe`uv_run [inlined] uv_process_reqs at req-inl.h:202:9
    frame #12: 0x00ec50f2 test.exe`uv_run(loop=<unavailable>, mode=<unavailable>) at core.c:609:19
lws-team commented 2 years ago

DNS sorting isn't going to do anything useful on Windows since it doesn't have netlink. Does skipping it like this help?

diff --git a/lib/core-net/client/connect3.c b/lib/core-net/client/connect3.c
index 4b1f0c4ba1..168ed83d21 100644
--- a/lib/core-net/client/connect3.c
+++ b/lib/core-net/client/connect3.c
@@ -172,7 +172,9 @@ lws_client_connect_3_connect(struct lws *wsi, const char *ads,
                lws_conmon_append_copy_new_dns_results(wsi, result);
 #endif

+#if !defined(WIN32)
                lws_sort_dns(wsi, result);
+#endif
 #if defined(LWS_WITH_SYS_ASYNC_DNS)
                lws_async_dns_freeaddrinfo(&result);
 #else
lws-team commented 2 years ago

Sorry, I don't think it can be skipped as my previous comment suggested.

Can you build in DEBUG mode so we can get a better backtrace? Also, if you are using main, let me know the exact commit (master is gone now).

calvin2021y commented 2 years ago

commit 43c4b799

lws-team commented 2 years ago

sort-dns.c is unchanged since then...

The API test for this passes on Windows 10 in CI:

https://libwebsockets.org/sai/index.html?task=49e9248c7f19ef9dc64e78e94353d812a5f3feebd07ba35b815bd2a40309f930

... Can you add -DLWS_WITH_MINIMAL_EXAMPLES=1 to cmake and rebuild? That should build lws-api-test-async-dns somewhere in your build dir. Does it run successfully?

calvin2021y commented 2 years ago

It doesn't always break, and I cannot catch the bug any more.

It seems some network instability triggers the error.

Maybe ai == NULL in lws_sort_dns?

With a debug build I get the same backtrace. I use lldb on Windows.

lws-team commented 2 years ago

It complains

Exception 0xc0000005 encountered at address 0x1188a38: Access violation reading location 0x61642e83

so I don't think it's NULL... It seems at least one of the ai list members is not prepared, or is trashed somehow. But the backtrace and the RELEASE build make it hard to guess.
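A small heuristic helps triage faulting addresses like the ones in these backtraces. On Windows the lowest 64 KiB of the address space is never mapped. A NULL dereference plus a small struct offset therefore faults below 0x10000. Both 0x61642e83 and 0x2e63bece are far above that, which supports the "not NULL" reading.

Counting printable ASCII bytes in the address is a weaker hint. 0x61642e83 contains 'a', 'd' and '.', which is at least consistent with a pointer field overwritten by text, but random pointers often contain printable bytes too. The helper names below are hypothetical, and the result is a triage aid rather than proof.

```c
#include <stdint.h>

/* Windows never maps the lowest 64 KiB of a process's address space, so a
 * NULL pointer plus a small struct offset faults below this address. */
#define NULL_REGION_END 0x10000u

/* 1 if the faulting address looks like a NULL (+ offset) dereference. */
static int looks_like_null_deref(uint32_t addr)
{
	return addr < NULL_REGION_END;
}

/* How many of the address's four bytes are printable ASCII (0x20..0x7e).
 * A pointer overwritten with text tends to score high, but random
 * pointers often score 1 or 2 as well, so treat this as a hint only. */
static int printable_ascii_bytes(uint32_t addr)
{
	int n = 0;

	for (int i = 0; i < 4; i++) {
		uint8_t b = (uint8_t)(addr >> (8 * i));

		if (b >= 0x20 && b <= 0x7e)
			n++;
	}
	return n;
}
```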

calvin2021y commented 2 years ago

Thanks for the explanation. I built in debug mode and ran a lot of tests, but I was not able to trigger the bug again.

I will do more testing tomorrow.

calvin2021y commented 2 years ago

Connecting to https://c.msn.com/ triggers this bug.

This is the debug build trace.

(lldb) run
Process 4948 stopped
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x9b975b: Access violation reading location 0x2e63bece
    frame #0: 0x009b975b test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
Process 4948 launched: 'C:\Users\dev\test.exe' (i686)
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x9b975b: Access violation reading location 0x2e63bece
  * frame #0: 0x009b975b test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
    frame #1: 0x0094d6bb test.exe`lws_client_connect_3_connect(wsi=<unavailable>, ads=<unavailable>, result=0x07541880, n=<unavailable>, opaque=<unavailable>) at connect3.c:175:3
    frame #2: 0x008a06a1 test.exe`lws_async_dns_query(context=0x06e00100, tsi=0, name=<unavailable>, qtype=LWS_ADNS_RECORD_A, cb=(test.exe`lws_client_connect_3_connect at connect3.c:141), wsi=0x07541880, opaque=0x00000000) at async-dns.c:712:7
    frame #3: 0x008af01b test.exe`lws_client_connect_2_dnsreq(wsi=<unavailable>) at connect2.c:366:7
    frame #4: 0x0071909e test.exe`lws_http_client_connect_via_info2(wsi=<unavailable>) at connect.c:71:9
    frame #5: 0x00719856 test.exe`lws_client_connect_via_info(i=<unavailable>) at connect.c:511:9

The logs just show:

[2021/10/16 13:16:09:5595] N:  ++ [wsicli|47|RAW/raw-skt/default/c.msn.com]

I guess libwebsockets cannot parse the DNS results.

lws-team commented 2 years ago

Thanks for the reproducer on this and the bing one, I am looking at it.

calvin2021y commented 2 years ago

The DNS answer section only includes a CNAME, without an IPv4 address.

lws-team commented 2 years ago

It should know how to handle CNAMEs, but it looks like that's broken somewhere at the moment.
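Handling a CNAME-only answer comes down to two steps. First, follow the chain from the queried name to its end. Second, check whether any address records belong to that final name. If none do, the final name has to be queried separately; the lws_adns_iterate "recursing looking for ..." log lines appear to be lws doing exactly that.

The sketch below shows only that logic, over records assumed to be already parsed. struct rr, follow_cname_chain() and MAX_CNAME_HOPS are hypothetical names, not lws internals. A hop limit guards against CNAME loops.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, already-parsed resource record. */
enum rr_type { RR_A = 1, RR_CNAME = 5, RR_AAAA = 28 };

struct rr {
	const char *owner;   /* name the record belongs to */
	enum rr_type type;
	const char *target;  /* CNAME target, or address text for A/AAAA */
};

/* Longer chains are treated as loops. */
#define MAX_CNAME_HOPS 8

/* DNS names compare case-insensitively (ASCII letters only). */
static int name_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Starting at `qname`, follow CNAMEs through rrs[0..n) until a name with no
 * further CNAME is reached, and store that name in *final_name.
 * Returns how many records of type `want` the final name owns (0 means the
 * caller must query the final name itself), or -1 if the chain is too long.
 */
static int follow_cname_chain(const struct rr *rrs, size_t n,
			      const char *qname, enum rr_type want,
			      const char **final_name)
{
	const char *name = qname;
	int hops = 0, addrs = 0;

	for (;;) {
		const char *next = NULL;

		for (size_t i = 0; i < n; i++)
			if (rrs[i].type == RR_CNAME &&
			    name_eq(rrs[i].owner, name)) {
				next = rrs[i].target;
				break;
			}
		if (!next)
			break;               /* end of the chain */
		if (++hops > MAX_CNAME_HOPS)
			return -1;           /* probably a CNAME loop */
		name = next;
	}

	*final_name = name;
	for (size_t i = 0; i < n; i++)
		if (rrs[i].type == want && name_eq(rrs[i].owner, name))
			addrs++;
	return addrs;
}
```

Take the assets.msn.com AAAA answer posted later in this thread (two CNAMEs, no AAAA). This logic returns 0 with final name e28578.d.akamaiedge.net. That means "follow-up query needed", rather than a crash or an empty result.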

lws-team commented 2 years ago

Does this help?

diff --git a/lib/system/async-dns/async-dns-parse.c b/lib/system/async-dns/async-dns-parse.c
index 17e95aa20f..bdfe205037 100644
--- a/lib/system/async-dns/async-dns-parse.c
+++ b/lib/system/async-dns/async-dns-parse.c
@@ -81,7 +81,8 @@ again1:

                return -1;
        }
-       if (ll > budget) {
+
+       if (ls + ll > ols + budget) {
                lwsl_notice("%s: label too long %d vs %d\n", __func__, ll, budget);

                return -1;
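The patch changes what the check compares. Before, it compared the label length ll against the whole budget. After, it compares where the label would end (ls + ll) against the end of the region (ols + budget). As far as the diff shows, this catches a label near the end of the data that passes the length test yet runs past the buffer.

The same idea, applied to every read, is sketched below as a standalone decoder for possibly compressed DNS names (RFC 1035 section 4.1.4). dns_decode_name() and MAX_PTR_JUMPS are hypothetical names; this is not the lws parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Upper bound on compression-pointer jumps; stops pointer loops. */
#define MAX_PTR_JUMPS 16

/*
 * Decode the possibly compressed DNS name at pkt[off] into `out` as dotted
 * text. Every read is checked against the packet length `len`, every label
 * against the space left in `out`, and pointer jumps are capped.
 * Returns how many bytes the name occupies at `off` (what a caller skips to
 * reach the next field), or -1 if the input is malformed or `out` is too small.
 */
static int dns_decode_name(const uint8_t *pkt, size_t len, size_t off,
			   char *out, size_t outlen)
{
	size_t pos = off, o = 0;
	int consumed = -1, jumps = 0;

	assert(pkt && out);
	if (!outlen)
		return -1;

	for (;;) {
		uint8_t ll;

		if (pos >= len)
			return -1;
		ll = pkt[pos];

		if ((ll & 0xc0) == 0xc0) {
			/* compression pointer: 14-bit offset into the packet */
			if (pos + 1 >= len || ++jumps > MAX_PTR_JUMPS)
				return -1;
			if (consumed < 0)   /* name ends at the first pointer */
				consumed = (int)(pos + 2 - off);
			pos = ((size_t)(ll & 0x3f) << 8) | pkt[pos + 1];
			continue;
		}
		if (ll & 0xc0)              /* 0x40 / 0x80 label types: reject */
			return -1;
		if (!ll) {                  /* root label ends the name */
			if (consumed < 0)
				consumed = (int)(pos + 1 - off);
			break;
		}
		/* the whole label must lie inside the packet ... */
		if (pos + 1 + ll > len)
			return -1;
		/* ... and fit in `out` with its '.' separator and the final NUL */
		if (o + (o ? 1 : 0) + ll >= outlen)
			return -1;
		if (o)
			out[o++] = '.';
		memcpy(out + o, pkt + pos + 1, ll);
		o += ll;
		pos += 1 + (size_t)ll;
	}
	out[o] = '\0';
	return consumed;
}
```

Run against the e19240.d.akamaiedge.net response dumped further down this thread, the question name at offset 12 decodes with 25 bytes consumed. The answer's C0 0C pointer at offset 41 decodes to the same name with 2 bytes consumed.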
calvin2021y commented 2 years ago

With this patch I get this:

Process 1888 stopped
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x1279878: Access violation reading location 0x2e63bece
    frame #0: 0x01279878 text.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
Process 1888 launched: 'C:\Users\dev\text.exe' (i686)
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x1279878: Access violation reading location 0x2e63bece
  * frame #0: 0x01279878 text.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
    frame #1: 0x0120d80b text.exe`lws_client_connect_3_connect(wsi=<unavailable>, ads=<unavailable>, result=0x07a33080, n=<unavailable>, opaque=<unavailable>) at connect3.c:175:3
    frame #2: 0x0115f763 text.exe`lws_async_dns_complete(q=<unavailable>, c=<unavailable>) at async-dns.c:103:7
    frame #3: 0x01208461 text.exe`lws_adns_parse_udp(dns=<unavailable>, pkt=<unavailable>, len=<unavailable>) at async-dns-parse.c:692:2
    frame #4: 0x0115f934 text.exe`callback_async_dns(wsi=<unavailable>, reason=<unavailable>, user=<unavailable>, in=<unavailable>, len=<unavailable>) at async-dns.c:306:3
    frame #5: 0x00fd6fd0 text.exe`user_callback_handle_rxflow(callback_function=(text.exe`callback_async_dns at async-dns.c:288), wsi=0x07a03880, reason=LWS_CALLBACK_RAW_RX, user=0x00000000, in=0x072006a8, len=148) at wsi.c:498:6
    frame #6: 0x0116dca4 text.exe`rops_handle_POLLIN_raw_skt(pt=<unavailable>, wsi=<unavailable>, pollfd=<unavailable>) at ops-raw-skt.c:149:8
    frame #7: 0x0116fe25 text.exe`lws_service_fd_tsi(context=<unavailable>, pollfd=<unavailable>, tsi=<unavailable>) at service.c:762:10
    frame #8: 0x00fda2c9 text.exe`lws_io_cb(watcher=<unavailable>, status=<unavailable>, revents=<unavailable>) at libuv.c:147:2
    frame #9: 0x00fc254e text.exe`uv_process_poll_req at poll.c:188:7
    frame #10: 0x00fc24b8 text.exe`uv_process_poll_req(loop=<unavailable>, handle=<unavailable>, req=<unavailable>) at poll.c:529:5
    frame #11: 0x00fb5cc4 text.exe`uv_run [inlined] uv_process_reqs at req-inl.h:202:9
    frame #12: 0x00fb5b92 text.exe`uv_run(loop=<unavailable>, mode=<unavailable>) at core.c:609:19
lws-team commented 2 years ago

Yes, that is something else. The patch fixes bing.com. The c.msn.com problem I can reproduce with valgrind; it seems to be a race with DNS results still arriving after the requesting wsi has gone. I will poke around with it.

lws-team commented 2 years ago

c.msn.com problem I can reproduce with valgrind

Nope, the valgrind problem there is caused by a contributed patch from earlier today to do with logging changes. I fixed that and will push it later.

c.msn.com just fails cleanly without valgrind traces.

Looking with dig, that really is an NXDOMAIN.

lws-team commented 2 years ago

I also built current main (which now has the log deref fix) on Windows 10. For c.msn.com it acts the same as on Linux: it completes after failing to get a DNS result (since it is actually an NXDOMAIN).

lws-team commented 2 years ago

I am testing it with lws-minimal-http-client --server c.msn.com. It might be worth checking for any difference between that and your actual app.

calvin2021y commented 2 years ago

I am writing a raw proxy here.

When I test with curl, it works as expected.

When I test the same domain from the Edge browser, I get this error. So I guess your tests may not cover this case.

calvin2021y commented 2 years ago

Logs before I hit frame #0: 0x01231948 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27

proxy=ntp.msn.cn:443
[2021/10/16 09:57:14:3138] N: __lws_lc_tag:  ++ [wsicli|4|RAW/raw-skt/default/ntp.msn.cn] (4)
[2021/10/16 09:57:14:3218] N: __lws_lc_untag:  -- [wsicli|2|RAW/raw-skt/default/13.107.21.200] (3) 18.133s
[2021/10/16 09:57:14:4478] N: lws_adns_iterate: recursing looking for global.asimov.events.data.trafficmanager.net.
[2021/10/16 09:57:14:4528] N: lws_adns_iterate: recursing looking for onedscolprdcus08.centralus.cloudapp.azure.com.
[2021/10/16 09:57:14:4598] N: lws_adns_iterate: recursing looking for msn-cn.a-0032.a-msedge.net.
[2021/10/16 09:57:14:4668] N: lws_adns_parse_label: label too long 6 vs 2
[2021/10/16 09:57:14:4779] N: lws_adns_iterate: recursing looking for onedscolprdwus14.westus.cloudapp.azure.com.
lws-team commented 2 years ago

[2021/10/16 09:57:14:4668] N: lws_adns_parse_label: label too long 6 vs 2

Is this coming after the patch from earlier?

calvin2021y commented 2 years ago

Sorry, I forgot to apply the patch when I rebuilt against msvcrt instead of ucrt.

With the patch, still:

[2021/10/16 10:20:35:3161] N: __lws_lc_tag:  ++ [wsicli|18|RAW/raw-skt/default/assets.msn.cn] (22)
[2021/10/16 10:20:35:3602] N: lws_adns_iterate: recursing looking for a1834.dspg2.akamai.net.
[2021/10/16 10:20:35:3662] N: lws_adns_iterate: recursing looking for c-msn-com-nsatc.trafficmanager.net.
[2021/10/16 10:20:35:3721] N: lws_adns_iterate: recursing looking for c-msn-com-europe-vip.trafficmanager.net.
[2021/10/16 10:20:35:3791] N: lws_adns_iterate: recursing looking for c-bing-com.a-0001.a-msedge.net.
[2021/10/16 10:20:35:3841] N: lws_adns_iterate: recursing looking for dual-a-0001.a-msedge.net.
[2021/10/16 10:20:35:3901] N: lws_adns_iterate: recursing looking for assets.msn.com.edgekey.net.
[2021/10/16 10:20:35:3961] N: lws_adns_iterate: recursing looking for e28578.d.akamaiedge.net.
[2021/10/16 10:20:35:4021] N: lws_adns_iterate: recursing looking for api-msn-com.a-0003.a-msedge.net.
[2021/10/16 10:20:35:4081] N: lws_adns_iterate: recursing looking for a-0003.a-msedge.net.
[2021/10/16 10:20:35:4141] W: lws_plat_set_socket_options_ip: not implemented on windows platform
[2021/10/16 10:20:35:4621] N: lws_adns_iterate: recursing looking for assets.msn.cn-c.edgekey.net.
[2021/10/16 10:20:35:4681] N: lws_adns_iterate: recursing looking for assets.msn.cn-c.edgekey.net.globalredir.akadns.net.
[2021/10/16 10:20:35:4761] N: lws_adns_iterate: recursing looking for e19240.d.akamaiedge.net.
[2021/10/16 10:20:35:5051] W: lws_plat_set_socket_options_ip: not implemented on windows platform
Process 4840 stopped
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x11b1948: Access violation reading location 0x2e63f01d
    frame #0: 0x011b1948 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
Process 4840 launched: 'C:\Users\dev\test.exe' (i686)
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x11b1948: Access violation reading location 0x2e63f01d
  * frame #0: 0x011b1948 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
    frame #1: 0x0114239b test.exe`lws_client_connect_3_connect(wsi=<unavailable>, ads=<unavailable>, result=0x06652908, n=<unavailable>, opaque=<unavailable>) at connect3.c:175:3
    frame #2: 0x01090843 test.exe`lws_async_dns_complete(q=<unavailable>, c=<unavailable>) at async-dns.c:103:7
    frame #3: 0x0113d011 test.exe`lws_adns_parse_udp(dns=<unavailable>, pkt=<unavailable>, len=<unavailable>) at async-dns-parse.c:692:2
    frame #4: 0x01090a14 test.exe`callback_async_dns(wsi=<unavailable>, reason=<unavailable>, user=<unavailable>, in=<unavailable>, len=<unavailable>) at async-dns.c:306:3
    frame #5: 0x00f07400 test.exe`user_callback_handle_rxflow(callback_function=(test.exe`callback_async_dns at async-dns.c:288), wsi=0x05bdb1e0, reason=LWS_CALLBACK_RAW_RX, user=0x00000000, in=0x05b46120, len=118) at wsi.c:498:6
    frame #6: 0x0109ed44 test.exe`rops_handle_POLLIN_raw_skt(pt=<unavailable>, wsi=<unavailable>, pollfd=<unavailable>) at ops-raw-skt.c:149:8
    frame #7: 0x010a0eb5 test.exe`lws_service_fd_tsi(context=<unavailable>, pollfd=<unavailable>, tsi=<unavailable>) at service.c:762:10
    frame #8: 0x00f0a6f9 test.exe`lws_io_cb(watcher=<unavailable>, status=<unavailable>, revents=<unavailable>) at libuv.c:147:2
lws-team commented 2 years ago

On Windows, using the minimal client with --server assets.msn.cn, it resolves okay:

[2021/10/16 11:29:52:9207] N: lws_cache_nscookiejar_create: create NSC
[2021/10/16 11:29:52:9363] N: __lws_lc_tag:  ++ [wsicli|0|GET/h1/default/assets.msn.cn] (1)
[2021/10/16 11:29:52:9363] N: lws_adns_iterate: recursing looking for assets.msn.cn-c.edgekey.net.
[2021/10/16 11:29:52:9363] N: lws_adns_iterate: recursing looking for assets.msn.cn-c.edgekey.net.globalredir.akadns.net.
[2021/10/16 11:29:52:9520] N: lws_adns_iterate: recursing looking for e19240.d.akamaiedge.net.
[2021/10/16 11:29:52:9520] W: lws_plat_set_socket_options_ip: not implemented on windows platform
[2021/10/16 11:29:53:0301] N: __lws_lc_tag:  ++ [mux|0|h2_sid1_(wsicli|0|GET/h1/default/assets.msn.cn)] (1)
[2021/10/16 11:29:53:0769] U: Connected to 23.204.236.126, http response: 404
[2021/10/16 11:29:53:0769] U: RECEIVE_CLIENT_HTTP_READ: read 10
...

I wonder if your local DNS server returns something different and the problem starts there. Can you do this to dump the packets:

diff --git a/lib/system/async-dns/async-dns-parse.c b/lib/system/async-dns/async-dns-parse.c
index bdfe205037..9f6897ab41 100644
--- a/lib/system/async-dns/async-dns-parse.c
+++ b/lib/system/async-dns/async-dns-parse.c
@@ -538,7 +538,7 @@ lws_adns_parse_udp(lws_async_dns_t *dns, const uint8_t *pkt, size_t len)
        int n, ncname;
        size_t est;

-       // lwsl_hexdump_notice(pkt, len);
+       lwsl_hexdump_notice(pkt, len);

        /* we have to at least have the header */

and paste the last one of those before the error? For me it looks like this:

[2021/10/16 11:35:25:3885] N: 0000: C2 24 81 80 00 01 00 01 00 00 00 00 06 65 31 39    .$...........e19
[2021/10/16 11:35:25:3886] N: 0010: 32 34 30 01 64 0A 61 6B 61 6D 61 69 65 64 67 65    240.d.akamaiedge
[2021/10/16 11:35:25:3886] N: 0020: 03 6E 65 74 00 00 01 00 01 C0 0C 00 01 00 01 00    .net............
[2021/10/16 11:35:25:3887] N: 0030: 00 00 1D 00 04 17 CC EC 7E                         ........~    
calvin2021y commented 2 years ago

It always exits with this:

[2021/10/16 10:42:02:8215] N:
[2021/10/16 10:42:02:8227] N: 0000: D4 5D 81 80 00 01 00 00 00 01 00 00 06 61 2D 30    .]...........a-0
[2021/10/16 10:42:02:8310] N: 0010: 30 33 32 08 61 2D 6D 73 65 64 67 65 03 6E 65 74    032.a-msedge.net
[2021/10/16 10:42:02:8371] N: 0020: 00 00 1C 00 01 C0 13 00 06 00 01 00 00 00 B1 00    ................
[2021/10/16 10:42:02:8426] N: 0030: 30 03 6E 73 31 C0 13 06 6D 73 6E 68 73 74 09 6D    0.ns1...msnhst.m
[2021/10/16 10:42:02:8488] N: 0040: 69 63 72 6F 73 6F 66 74 03 63 6F 6D 00 78 2B 22    icrosoft.com.x+"
[2021/10/16 10:42:02:8518] N: 0050: E5 00 00 07 08 00 00 03 84 00 24 EA 00 00 00 00    ..........$.....
[2021/10/16 10:42:02:8613] N: 0060: F0                                                 .
[2021/10/16 10:42:02:8664] N:
[2021/10/16 10:42:02:8664] N:
[2021/10/16 10:42:02:8719] N: 0000: FC CA 81 80 00 01 00 01 00 00 00 00 06 61 2D 30    .............a-0
[2021/10/16 10:42:02:8790] N: 0010: 30 30 33 08 61 2D 6D 73 65 64 67 65 03 6E 65 74    003.a-msedge.net
[2021/10/16 10:42:02:8850] N: 0020: 00 00 01 00 01 C0 0C 00 01 00 01 00 00 00 4F 00    ..............O.
[2021/10/16 10:42:02:8882] N: 0030: 04 CC 4F C5 CB                                     ..O..
[2021/10/16 10:42:02:8977] N:
[2021/10/16 10:42:02:8997] N:
[2021/10/16 10:42:02:9027] N: 0000: BB 43 81 80 00 01 00 04 00 00 00 00 03 77 77 77    .C...........www
[2021/10/16 10:42:02:9028] N: 0010: 04 62 69 6E 67 03 63 6F 6D 00 00 1C 00 01 C0 0C    .bing.com.......
[2021/10/16 10:42:02:9143] N: 0020: 00 05 00 01 00 00 54 46 00 2A 06 61 2D 30 30 30    ......TF.*.a-000
[2021/10/16 10:42:02:9174] N: 0030: 31 0A 61 2D 61 66 64 65 6E 74 72 79 03 6E 65 74    1.a-afdentry.net
[2021/10/16 10:42:02:9174] N: 0040: 0E 74 72 61 66 66 69 63 6D 61 6E 61 67 65 72 03    .trafficmanager.
[2021/10/16 10:42:02:9323] N: 0050: 6E 65 74 00 C0 2A 00 05 00 01 00 00 00 22 00 24    net..*.......".$
[2021/10/16 10:42:02:9323] N: 0060: 0C 77 77 77 2D 62 69 6E 67 2D 63 6F 6D 0B 64 75    .www-bing-com.du
[2021/10/16 10:42:02:9447] N: 0070: 61 6C 2D 61 2D 30 30 30 31 08 61 2D 6D 73 65 64    al-a-0001.a-msed
[2021/10/16 10:42:02:9467] N: 0080: 67 65 C0 4F C0 60 00 05 00 01 00 00 00 22 00 02    ge.O.`......."..
[2021/10/16 10:42:02:9467] N: 0090: C0 6D C0 6D 00 1C 00 01 00 00 00 22 00 10 26 20    .m.m......."..&
[2021/10/16 10:42:02:9615] N: 00A0: 01 EC 0C 11 00 00 00 00 00 00 00 00 02 00          ..............
[2021/10/16 10:42:02:9615] N:
[2021/10/16 10:42:02:9615] W: lws_plat_set_socket_options_ip: not implemented on windows platform
[2021/10/16 10:42:02:9764] N:
[2021/10/16 10:42:02:9816] N: 0000: 12 B3 81 80 00 01 00 02 00 01 00 00 01 63 03 6D    .............c.m
[2021/10/16 10:42:02:9818] N: 0010: 73 6E 03 63 6F 6D 00 00 1C 00 01 C0 0C 00 05 00    sn.com..........
[2021/10/16 10:42:02:9946] N: 0020: 01 00 00 54 5E 00 24 0F 63 2D 6D 73 6E 2D 63 6F    ...T^.$.c-msn-co
[2021/10/16 10:42:02:9968] N: 0030: 6D 2D 6E 73 61 74 63 0E 74 72 61 66 66 69 63 6D    m-nsatc.trafficm
[2021/10/16 10:42:02:9968] N: 0040: 61 6E 61 67 65 72 03 6E 65 74 00 C0 27 00 05 00    anager.net..'...
[2021/10/16 10:42:03:0114] N: 0050: 01 00 00 00 3A 00 17 14 63 2D 6D 73 6E 2D 63 6F    ....:...c-msn-co
[2021/10/16 10:42:03:0114] N: 0060: 6D 2D 65 75 72 6F 70 65 2D 76 69 70 C0 37 C0 37    m-europe-vip.7.7
[2021/10/16 10:42:03:0263] N: 0070: 00 06 00 01 00 00 00 1C 00 2E 03 74 6D 31 06 64    ...........tm1.d
[2021/10/16 10:42:03:0263] N: 0080: 6E 73 2D 74 6D C0 12 0A 68 6F 73 74 6D 61 73 74    ns-tm...hostmast
[2021/10/16 10:42:03:0263] N: 0090: 65 72 C0 37 77 64 96 60 00 00 03 84 00 00 01 2C    er.7wd.`.......,
[2021/10/16 10:42:03:0450] N: 00A0: 00 24 EA 00 00 00 00 1E                            .$......
[2021/10/16 10:42:03:0519] N:
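Decoding the fixed 12-byte header (RFC 1035 section 4.1.1) of the last dump shows what kind of answer it is. The flags 0x8180 mean a NOERROR response. ANCOUNT is 2 and NSCOUNT is 1, and the question type after c.msn.com is 0x001C (AAAA).

Reading further into the packet, the two answers are the CNAMEs c-msn-com-nsatc.trafficmanager.net and c-msn-com-europe-vip.trafficmanager.net, and the authority record is an SOA. This is a "NODATA" reply: the name exists but has no AAAA record. That is different from the NXDOMAIN (RCODE 3) seen with the blackholing router. The helpers below are a minimal sketch with hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* The fixed 12-byte DNS message header (RFC 1035 section 4.1.1). */
struct dns_hdr {
	uint16_t id, flags, qdcount, ancount, nscount, arcount;
};

/* Read a big-endian (network order) 16-bit field. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Fill *h from pkt; returns -1 if the packet is too short for a header. */
static int dns_parse_header(const uint8_t *pkt, size_t len, struct dns_hdr *h)
{
	if (len < 12)
		return -1;
	h->id      = be16(pkt);
	h->flags   = be16(pkt + 2);
	h->qdcount = be16(pkt + 4);
	h->ancount = be16(pkt + 6);   /* answer section records     */
	h->nscount = be16(pkt + 8);   /* authority section records  */
	h->arcount = be16(pkt + 10);  /* additional section records */
	return 0;
}

/* QR bit (top bit of flags): 1 for a response, 0 for a query. */
static int dns_is_response(const struct dns_hdr *h)
{
	return (h->flags >> 15) & 1;
}

/* RCODE (low 4 bits of flags): 0 = NOERROR, 3 = NXDOMAIN. */
static unsigned dns_rcode(const struct dns_hdr *h)
{
	return h->flags & 0xfu;
}
```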
lws-team commented 2 years ago

Hm... c.msn.com is blackholed at my router's DNS; that's why it is an NXDOMAIN for me.

I assume that you crash after the last hexdump.

If I temporarily use 8.8.8.8, then it resolves and works fine... it redirects to bing:

[2021/10/16 11:57:26:1882] N: __lws_lc_tag:  ++ [546878|wsicli|0|GET/h1/default/c.msn.com] (1)
[2021/10/16 11:57:26:2474] N: 
[2021/10/16 11:57:26:2504] N: 0000: 2A B2 81 80 00 01 00 02 00 00 00 00 01 63 03 6D    *............c.m
[2021/10/16 11:57:26:2507] N: 0010: 73 6E 03 63 6F 6D 00 00 01 00 01 C0 0C 00 05 00    sn.com..........
[2021/10/16 11:57:26:2509] N: 0020: 01 00 00 00 D8 00 24 0F 63 2D 6D 73 6E 2D 63 6F    ......$.c-msn-co
[2021/10/16 11:57:26:2510] N: 0030: 6D 2D 6E 73 61 74 63 0E 74 72 61 66 66 69 63 6D    m-nsatc.trafficm
[2021/10/16 11:57:26:2512] N: 0040: 61 6E 61 67 65 72 03 6E 65 74 00 C0 27 00 01 00    anager.net..'...
[2021/10/16 11:57:26:2521] N: 0050: 01 00 00 00 3C 00 04 34 8E 72 02                   ....<..4.r.     
[2021/10/16 11:57:26:2523] N: 
[2021/10/16 11:57:26:2588] N: lws_adns_iterate: recursing looking for c-msn-com-nsatc.trafficmanager.net.
[2021/10/16 11:57:26:2949] N: 
[2021/10/16 11:57:26:2952] N: 0000: 87 2C 81 80 00 01 00 01 00 00 00 00 0F 63 2D 6D    .,...........c-m
[2021/10/16 11:57:26:2954] N: 0010: 73 6E 2D 63 6F 6D 2D 6E 73 61 74 63 0E 74 72 61    sn-com-nsatc.tra
[2021/10/16 11:57:26:2957] N: 0020: 66 66 69 63 6D 61 6E 61 67 65 72 03 6E 65 74 00    fficmanager.net.
[2021/10/16 11:57:26:2959] N: 0030: 00 01 00 01 C0 0C 00 01 00 01 00 00 00 3C 00 04    .............<..
[2021/10/16 11:57:26:2962] N: 0040: 34 8E 72 02                                        4.r.            
[2021/10/16 11:57:26:2963] N: 
[2021/10/16 11:57:26:8136] N: __lws_lc_tag:  ++ [546878|mux|0|h2_sid1_(546878|wsicli|0)] (1)
[2021/10/16 11:57:26:8862] N: lws_client_reset: REDIRECT c.bing.com:443, path='c.gif?CtsSyncId=039D4BE59F734F8895D5D27E6BF7D8A3&RedC=c.msn.com&MXFR=0B036774992D661C04F577A69D2D6402', ssl = 1, alpn='h2,http/1.1'
[2021/10/16 11:57:26:9034] N: ignoring straggling data fl 0x1
[2021/10/16 11:57:26:9046] N: rops_handle_POLLIN_h2: removed [546878|wsicli|0|GET/h1/default/c.msn.com] from dll_buflist
[2021/10/16 11:57:26:9215] N: 
[2021/10/16 11:57:26:9216] N: 0000: 99 6E 81 80 00 01 00 04 00 00 00 00 01 63 04 62    .n...........c.b
[2021/10/16 11:57:26:9217] N: 0010: 69 6E 67 03 63 6F 6D 00 00 01 00 01 C0 0C 00 05    ing.com.........
[2021/10/16 11:57:26:9219] N: 0020: 00 01 00 00 30 47 00 20 0A 63 2D 62 69 6E 67 2D    ....0G. .c-bing-
[2021/10/16 11:57:26:9220] N: 0030: 63 6F 6D 06 61 2D 30 30 30 31 08 61 2D 6D 73 65    com.a-0001.a-mse
[2021/10/16 11:57:26:9221] N: 0040: 64 67 65 03 6E 65 74 00 C0 28 00 05 00 01 00 00    dge.net..(......
[2021/10/16 11:57:26:9223] N: 0050: 00 27 00 0E 0B 64 75 61 6C 2D 61 2D 30 30 30 31    .'...dual-a-0001
[2021/10/16 11:57:26:9224] N: 0060: C0 3A C0 54 00 01 00 01 00 00 00 27 00 04 CC 4F    .:.T.......'...O
[2021/10/16 11:57:26:9225] N: 0070: C5 C8 C0 54 00 01 00 01 00 00 00 27 00 04 0D 6B    ...T.......'...k
[2021/10/16 11:57:26:9226] N: 0080: 15 C8                                              ..              
[2021/10/16 11:57:26:9226] N: 
[2021/10/16 11:57:26:9226] N: lws_adns_iterate: recursing looking for c-bing-com.a-0001.a-msedge.net.
[2021/10/16 11:57:26:9228] N: lws_adns_iterate: recursing looking for dual-a-0001.a-msedge.net.
[2021/10/16 11:57:26:9391] N: 
[2021/10/16 11:57:26:9393] N: 0000: E8 C6 81 80 00 01 00 02 00 00 00 00 0B 64 75 61    .............dua
[2021/10/16 11:57:26:9394] N: 0010: 6C 2D 61 2D 30 30 30 31 08 61 2D 6D 73 65 64 67    l-a-0001.a-msedg
[2021/10/16 11:57:26:9395] N: 0020: 65 03 6E 65 74 00 00 01 00 01 C0 0C 00 01 00 01    e.net...........
[2021/10/16 11:57:26:9397] N: 0030: 00 00 00 2C 00 04 CC 4F C5 C8 C0 0C 00 01 00 01    ...,...O........
[2021/10/16 11:57:26:9398] N: 0040: 00 00 00 2C 00 04 0D 6B 15 C8                      ...,...k..      
[2021/10/16 11:57:26:9398] N: 
[2021/10/16 11:57:27:1100] N: __lws_lc_tag:  ++ [546878|mux|1|h2_sid1_(546878|mux|0)] (2)
[2021/10/16 11:57:27:1437] N: lws_client_reset: REDIRECT c.msn.com:443, path='c.gif?CtsSyncId=039D4BE59F734F8895D5D27E6BF7D8A3&MUID=2AEF14227A5263AA031704F07B31623A', ssl = 1, alpn='h2,http/1.1'
[2021/10/16 11:57:27:1452] N: ignoring straggling data fl 0x1
[2021/10/16 11:57:27:1453] N: rops_handle_POLLIN_h2: removed [546878|mux|0|h2_sid1_(546878|wsicli|0)] from dll_buflist
[2021/10/16 11:57:27:2496] N: __lws_lc_tag:  ++ [546878|mux|2|h2_sid1_(546878|mux|1)] (3)
[2021/10/16 11:57:27:2886] U: Connected to 52.142.114.2, http response: 200
[2021/10/16 11:57:27:2937] U: RECEIVE_CLIENT_HTTP_READ: read 42
[2021/10/16 11:57:27:2961] U: LWS_CALLBACK_COMPLETED_CLIENT_HTTP

Your response is different, though: yours seems to go on to a CNAME of c-msn-com-europe-vip.trafficmanager.net, while mine goes to a CNAME of c-msn-com-nsatc.trafficmanager.net...

calvin2021y commented 2 years ago

If I don't use lws, I can connect to bing.com from the Edge browser very fast; if I use the proxy, it crashes. Testing with Google and other sites is fine.

So I guess there is some DNS record that lws cannot parse. Sometimes there are multiple DNS responses, and some poisoned VPN responses come in from the internet. I hope the lws DNS parser can be made more robust.

There are also some invalid DNS responses.

lws-team commented 2 years ago

Yes, that's why it is not on by default yet.

But to make it more robust, I have to be able to debug these corner cases. I am a bit stuck, because with the patch from earlier I can't reproduce any problem here on Linux or Windows.

calvin2021y commented 2 years ago

Testing on Windows with curl also works. I guess this DNS packet triggers the bug, and only Edge sends it.

[2021/10/16 10:42:02:9816] N: 0000: 12 B3 81 80 00 01 00 02 00 01 00 00 01 63 03 6D    .............c.m
[2021/10/16 10:42:02:9818] N: 0010: 73 6E 03 63 6F 6D 00 00 1C 00 01 C0 0C 00 05 00    sn.com..........
[2021/10/16 10:42:02:9946] N: 0020: 01 00 00 54 5E 00 24 0F 63 2D 6D 73 6E 2D 63 6F    ...T^.$.c-msn-co
[2021/10/16 10:42:02:9968] N: 0030: 6D 2D 6E 73 61 74 63 0E 74 72 61 66 66 69 63 6D    m-nsatc.trafficm
[2021/10/16 10:42:02:9968] N: 0040: 61 6E 61 67 65 72 03 6E 65 74 00 C0 27 00 05 00    anager.net..'...
[2021/10/16 10:42:03:0114] N: 0050: 01 00 00 00 3A 00 17 14 63 2D 6D 73 6E 2D 63 6F    ....:...c-msn-co
[2021/10/16 10:42:03:0114] N: 0060: 6D 2D 65 75 72 6F 70 65 2D 76 69 70 C0 37 C0 37    m-europe-vip.7.7
[2021/10/16 10:42:03:0263] N: 0070: 00 06 00 01 00 00 00 1C 00 2E 03 74 6D 31 06 64    ...........tm1.d
[2021/10/16 10:42:03:0263] N: 0080: 6E 73 2D 74 6D C0 12 0A 68 6F 73 74 6D 61 73 74    ns-tm...hostmast
[2021/10/16 10:42:03:0263] N: 0090: 65 72 C0 37 77 64 96 60 00 00 03 84 00 00 01 2C    er.7wd.`.......,
[2021/10/16 10:42:03:0450] N: 00A0: 00 24 EA 00 00 00 00 1E                            .$......
[2021/10/16 10:42:03:0519] N:
lws-team commented 2 years ago

I'll try hack something up to use that, but to be clear

only Edge sends it.

Edge isn't sending it, right? The DNS server is sending it in reply to a request from Edge?

calvin2021y commented 2 years ago

I think so.

calvin2021y commented 2 years ago

This packet is valid DNS, but it triggers the lws crash:

 [ 219,29,129,128,0,1,0,2,0,1,0,0,6,97,115,115,101,116,115,3,109,115,110,3,99,111,109,0,0,28,0,1,192,12,0,5,0,1,0,0,81,199,0,28,6,97,115,115,101,116,115,3,109,115,110,3,99,111,109,7,101,100,103,101,107,101,121,3,110,101,116,0,192,44,0,5,0,1,0,0,0,235,0,22,6,101,50,56,53,55,56,1,100,10,97,107,97,109,97,105,101,100,103,101,192,67,192,91,0,6,0,1,0,0,1,79,0,46,3,110,48,100,192,93,10,104,111,115,116,109,97,115,116,101,114,6,97,107,97,109,97,105,192,23,97,106,246,231,0,0,3,232,0,0,3,232,0,0,3,232,0,0,7,8,]
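One way to sanity-check a pasted packet like this outside lws is to walk every section and tally the record types. Load the array above as a 164-byte uint8_t buffer and run the sketch below. It should find two CNAME records (type 5), one SOA (type 6) and no AAAA (type 28), with the walk ending exactly at the last byte. In other words, this is an AAAA query for assets.msn.com answered with a CNAME chain that ends at e28578.d.akamaiedge.net and carries no address.

skip_name() and count_rr_type() are hypothetical helpers. The sketch is deliberately strict and treats leftover trailing bytes as malformed.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance *pos past a (possibly compressed) name without decoding it.
 * Returns 0 on success, -1 if the name runs off the end of the packet. */
static int skip_name(const uint8_t *pkt, size_t len, size_t *pos)
{
	size_t p = *pos;

	for (;;) {
		if (p >= len)
			return -1;
		if ((pkt[p] & 0xc0) == 0xc0) {  /* a pointer ends the name */
			if (p + 2 > len)
				return -1;
			*pos = p + 2;
			return 0;
		}
		if (pkt[p] & 0xc0)               /* reserved label types */
			return -1;
		if (!pkt[p]) {                   /* root label */
			*pos = p + 1;
			return 0;
		}
		p += 1 + (size_t)pkt[p];         /* length byte + label */
	}
}

/* Walk the question and every answer/authority/additional record, counting
 * records of type `want`. Returns the count, or -1 if anything overruns the
 * packet or bytes are left over after the last record (strict on purpose). */
static int count_rr_type(const uint8_t *pkt, size_t len, uint16_t want)
{
	size_t pos = 12;                         /* first byte after the header */
	unsigned qd, rrs;
	int hits = 0;

	if (len < 12)
		return -1;
	qd  = (unsigned)(pkt[4] << 8 | pkt[5]);
	rrs = (unsigned)(pkt[6] << 8 | pkt[7]) + (unsigned)(pkt[8] << 8 | pkt[9]) +
	      (unsigned)(pkt[10] << 8 | pkt[11]);

	for (unsigned i = 0; i < qd; i++) {      /* QNAME QTYPE QCLASS */
		if (skip_name(pkt, len, &pos) || pos + 4 > len)
			return -1;
		pos += 4;
	}
	for (unsigned i = 0; i < rrs; i++) {     /* NAME TYPE CLASS TTL RDLENGTH RDATA */
		uint16_t type, rdlen;

		if (skip_name(pkt, len, &pos) || pos + 10 > len)
			return -1;
		type  = (uint16_t)(pkt[pos] << 8 | pkt[pos + 1]);
		rdlen = (uint16_t)(pkt[pos + 8] << 8 | pkt[pos + 9]);
		pos += 10;
		if (pos + rdlen > len)
			return -1;
		if (type == want)
			hits++;
		pos += rdlen;
	}
	return pos == len ? hits : -1;
}
```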
lws-team commented 2 years ago

I will continue integrating that and the other "bad response" into the API test tomorrow... Can you let me know the query name (e.g. bing.com or whatever) that produced the response above, so I can make the flow look like a real query to the code?

lws-team commented 2 years ago

Never mind, I see it is "assets.msn.com".

I have added result injection support to the adns API test, and added both of the cases mentioned here; they are parsed and resolved correctly. There's no problem with or without valgrind on Linux, and I tested it on Windows with no problem either.

>.\bin\Debug\lws-api-test-async-dns.exe
[2021/10/17 06:50:24:6080] U: LWS API selftest: Async DNS
[2021/10/17 06:50:24:6236] N: lws_create_context: LWS: 4.3.99-v4.3.0-37-gd4bb5809c, NET CLI SRV H1 H2 WS SS-JSON-POL ConMon FLTINJ ASYNC_DNS IPv6-absent
[2021/10/17 06:50:24:6393] N: __lws_lc_tag:  ++ [wsi|0|pipe] (1)
[2021/10/17 06:50:24:6549] N: __lws_lc_tag:  ++ [vh|0|system||-1] (1)
[2021/10/17 06:50:24:6861] N: __lws_lc_tag:  ++ [wsisrv|0|adopted] (1)
[2021/10/17 06:50:24:6861] N: __lws_lc_tag:  ++ [vh|1|default||-1] (2)
[2021/10/17 06:50:24:7174] N: lws_plat_vhost_tls_client_ctx_init: Imported 24 certs from plat store
[2021/10/17 06:50:24:7956] N: next_test_cb: querying warmcat.com
[2021/10/17 06:50:24:7956] W: cb1: 0: warmcat.com 4 46.105.127.147
[2021/10/17 06:50:24:8112] N: next_test_cb: querying libwebsockets.org
[2021/10/17 06:50:24:8112] W: cb1: 0: libwebsockets.org 4 46.105.127.147
[2021/10/17 06:50:24:8267] N: next_test_cb: querying doesntexist
[2021/10/17 06:50:25:6556] W: cb1: no results
[2021/10/17 06:50:25:6716] N: next_test_cb: querying localhost
[2021/10/17 06:50:25:6716] W: cb1: 0: 127.0.0.1 4 127.0.0.1
[2021/10/17 06:50:25:6862] N: next_test_cb: querying ipv4only.warmcat.com
[2021/10/17 06:50:25:6862] W: cb1: 0: ipv4only.warmcat.com 4 46.105.127.147
[2021/10/17 06:50:25:7026] N: next_test_cb: querying onevalid.bogus.warmcat.com
[2021/10/17 06:50:25:7026] W: cb1: 0: onevalid.bogus.warmcat.com 4 46.105.127.147
[2021/10/17 06:50:25:7174] W: cb1: 1: onevalid.bogus.warmcat.com 4 127.0.0.2
[2021/10/17 06:50:25:7331] W: cb1: 2: onevalid.bogus.warmcat.com 4 127.0.0.1
[2021/10/17 06:50:25:7646] N: next_test_cb: querying c.msn.com
[2021/10/17 06:50:25:7646] N: next_test_cb: injecting result
[2021/10/17 06:50:25:7646] N: lws_adns_iterate: recursing looking for c-msn-com-nsatc.trafficmanager.net.
[2021/10/17 06:50:25:7826] N: lws_adns_iterate: recursing looking for c-msn-com-europe-vip.trafficmanager.net.
[2021/10/17 06:50:25:7956] W: cb1: 0: c-msn-com-europe-vip.trafficmanager.net 4 52.142.114.2
[2021/10/17 06:50:25:8124] N: next_test_cb: querying assets.msn.com
[2021/10/17 06:50:25:8124] N: next_test_cb: injecting result
[2021/10/17 06:50:25:8268] N: lws_adns_iterate: recursing looking for assets.msn.com.edgekey.net.
[2021/10/17 06:50:25:8426] N: lws_adns_iterate: recursing looking for e28578.d.akamaiedge.net.
[2021/10/17 06:50:25:8426] W: cb1: 0: e28578.d.akamaiedge.net 4 2.22.228.41
[2021/10/17 06:50:25:8581] W: cb1: 1: e28578.d.akamaiedge.net 4 2.22.228.65
[2021/10/17 06:50:25:8581] N: __lws_lc_untag:  -- [wsi|0|pipe] (0) 1.218s
[2021/10/17 06:50:25:8767] N: __lws_lc_untag:  -- [vh|0|system||-1] (1) 1.221s
[2021/10/17 06:50:25:9206] N: __lws_lc_untag:  -- [wsisrv|0|adopted|raw-skt] (0) 1.234s
[2021/10/17 06:50:25:9518] N: __lws_lc_untag:  -- [vh|1|default||-1] (0) 1.265s
[2021/10/17 06:50:25:9674] U: Completed: ALL PASS: 26 / 26

I think the next step is this: please update to main (which has the patch from earlier, plus the api test adns injection patch), build with -DLWS_WITH_MINIMAL_EXAMPLES=1, and try running .\bin\Debug\lws-api-test-async-dns.exe

calvin2021y commented 2 years ago

Sorry for the false report that the DNS packet caused the problem. I am cross-building, and somehow I am not able to build with -DLWS_WITH_MINIMAL_EXAMPLES=1. If it is not a DNS packet parsing issue, then I guess I will not be able to catch the error with this test case.

The problem could only be network related, since I found that when the proxy is working and I close the browser, it throws this DNS error.

With curl I cannot recreate it, because curl connections always close gracefully.

Without async DNS my app works as expected (about 10 users have used it every day for a month), but it blocks the event loop whenever DNS blocks.

lws-team commented 2 years ago

since I find when the proxy is working but I close the browser then it throw this DNS error.

Yes, if the wsi that the lookup is for closes while the async DNS lookup is still ongoing, lws has to handle that correctly, or it will touch a dead wsi later when the lookup gets results or times out. I will also look at adding this kind of scenario to the api tests and see if that sheds some light.
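A minimal sketch of the bookkeeping this implies, assuming a hypothetical design in which each in-flight lookup keeps a list of the connections waiting on it, and a connection that closes unlinks itself before it is freed. `struct conn`, `struct pending_query` and the helper functions are made-up names for illustration, not lws internals.

```c
#include <stdlib.h>

struct pending_query; /* forward declaration */

/* Hypothetical stand-in for a wsi waiting on a DNS result. */
struct conn {
	struct pending_query *waiting_on; /* NULL when not waiting */
	struct conn *next_waiter;         /* link in the query's waiter list */
	int got_result;
};

/* Hypothetical in-flight lookup shared by one or more waiters. */
struct pending_query {
	const char *hostname;
	struct conn *waiters;
};

static struct conn *conn_new(void)
{
	return calloc(1, sizeof(struct conn));
}

static void query_add_waiter(struct pending_query *q, struct conn *c)
{
	c->waiting_on = q;
	c->next_waiter = q->waiters;
	q->waiters = c;
}

/* On close, detach from any in-flight lookup first; otherwise the later
 * completion or timeout would walk into freed memory. */
static void conn_close(struct conn *c)
{
	struct pending_query *q = c->waiting_on;
	struct conn **pp;

	if (q) {
		for (pp = &q->waiters; *pp; pp = &(*pp)->next_waiter)
			if (*pp == c) {
				*pp = c->next_waiter;
				break;
			}
		c->waiting_on = NULL;
	}
	free(c);
}

/* On result or timeout: only live waiters are still listed, so it is
 * safe to deliver to each one. Returns how many were notified. */
static int query_complete(struct pending_query *q)
{
	struct conn *c = q->waiters, *next;
	int delivered = 0;

	q->waiters = NULL;
	while (c) {
		next = c->next_waiter;
		c->waiting_on = NULL;
		c->next_waiter = NULL;
		c->got_result = 1;
		delivered++;
		c = next;
	}
	return delivered;
}
```

Usage: three connections wait on one query, the middle one closes before the answer arrives (the browser going away), and `query_complete()` then notifies only the two survivors, so nothing touches the freed one.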

calvin2021y commented 2 years ago

Please also add these to your DNS test cases (to one of them):

e28578.d.akamaiedge.net

 [ 20,191,129,128,0,1,0,0,0,1,0,0,6,101,50,56,53,55,56,1,100,10,97,107,97,109,97,105,101,100,103,101,3,110,101,116,0,0,28,0,1,192,19,0,6,0,1,0,0,1,17,0,49,3,110,48,100,192,21,10,104,111,115,116,109,97,115,116,101,114,6,97,107,97,109,97,105,3,99,111,109,0,97,107,217,31,0,0,3,232,0,0,3,232,0,0,3,232,0,0,7,8,]

a-0003.a-msedge.net

 [ 126,215,129,128,0,1,0,0,0,1,0,0,6,97,45,48,48,48,51,8,97,45,109,115,101,100,103,101,3,110,101,116,0,0,28,0,1,192,19,0,6,0,1,0,0,0,172,0,48,3,110,115,49,192,19,6,109,115,110,104,115,116,9,109,105,99,114,111,115,111,102,116,3,99,111,109,0,120,43,34,229,0,0,7,8,0,0,3,132,0,36,234,0,0,0,0,240,]

c-msn-com-europe-vip.trafficmanager.net

 [ 73,87,129,128,0,1,0,0,0,1,0,0,20,99,45,109,115,110,45,99,111,109,45,101,117,114,111,112,101,45,118,105,112,14,116,114,97,102,102,105,99,109,97,110,97,103,101,114,3,110,101,116,0,0,28,0,1,192,33,0,6,0,1,0,0,0,30,0,49,3,116,109,49,6,100,110,115,45,116,109,3,99,111,109,0,10,104,111,115,116,109,97,115,116,101,114,192,33,7,11,234,133,0,0,3,132,0,0,1,44,0,36,234,0,0,0,0,30,]

There is only a question and an authority record (no address result). I guess that with this kind of response, the wsi closes before the async DNS callback is fired.
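To check the "question plus authority only" observation, here is a minimal decoder for the fixed 12-byte DNS header (RFC 1035 section 4.1.1), applied to the first 12 bytes of the e28578.d.akamaiedge.net response above. The struct and function names are illustrative only. Those bytes give flags 0x8180 (a response with RCODE 0, NOERROR), QDCOUNT 1, ANCOUNT 0, NSCOUNT 1 and ARCOUNT 0: a NOERROR response with an empty answer section, which RFC 2308 calls NODATA.

```c
#include <stddef.h>
#include <stdint.h>

/* The fixed DNS header; all fields are big-endian on the wire. */
struct dns_header {
	uint16_t id, flags, qdcount, ancount, nscount, arcount;
};

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Fills *h and returns 0, or returns -1 if the packet is too short. */
static int dns_parse_header(const uint8_t *pkt, size_t len,
			    struct dns_header *h)
{
	if (len < 12)
		return -1;
	h->id      = rd16(pkt);
	h->flags   = rd16(pkt + 2);
	h->qdcount = rd16(pkt + 4);
	h->ancount = rd16(pkt + 6);
	h->nscount = rd16(pkt + 8);
	h->arcount = rd16(pkt + 10);
	return 0;
}

/* First 12 bytes of the e28578.d.akamaiedge.net response above. */
static const uint8_t e28578_hdr[12] = {
	20, 191, 129, 128, 0, 1, 0, 0, 0, 1, 0, 0
};
```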

lws-team commented 2 years ago

I added those three here and they cleanly produce no results, which I think is expected.

How is your proxy actually handling the onward connection when the inbound connection goes away? What does it do in the inbound CLOSE handler about the onward connection?

Edit: Do I understand it right that it will be the onward wsi that is doing the DNS lookup, having gotten the address from the inbound connection?

lws-team commented 2 years ago

I also added a --cos close-on-start option to minimal-http-client, which closes the wsi after starting the client connection but before the DNS result arrives. That is also handled cleanly under valgrind.

So I am wondering whether reproducing this behaviour requires your application code.

calvin2021y commented 2 years ago

Sorry for the late reply.

1) I cannot trigger this error with curl using the domain https://a-0003.a-msedge.net.
2) I have no problem without async DNS.
3) I use lws for the onward connection, not the inbound one.
4) If the inbound connection is lost with no pending messages to send, I call lws_wsi_close(). If there are pending messages, I use PENDING_TIMEOUT_KILLED_BY_PROXY_CLIENT_CLOSE with a 3 sec timeout.
5) There are no calls on the wsi after it is closed.

calvin2021y commented 2 years ago

This error comes from lws_client_connect_via_info(). At this step, is the wsi not connected yet?

Process 2896 stopped
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0xaa47c8: Access violation reading location 0xfeeefebb
    frame #0: 0x00aa47c8 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
Process 2896 launched: 'C:\Users\dev\test.exe' (i686)
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0xaa47c8: Access violation reading location 0xfeeefebb
  * frame #0: 0x00aa47c8 test.exe`lws_sort_dns(wsi=<unavailable>, result=<unavailable>) at sort-dns.c:625:27
    frame #1: 0x00a3520b test.exe`lws_client_connect_3_connect(wsi=<unavailable>, ads=<unavailable>, result=0x06e88600, n=<unavailable>, opaque=<unavailable>) at connect3.c:175:3
    frame #2: 0x00984681 test.exe`lws_async_dns_query(context=0x06346160, tsi=0, name=<unavailable>, qtype=LWS_ADNS_RECORD_A, cb=(test.exe`lws_client_connect_3_connect at connect3.c:141), wsi=0x06e88600, opaque=0x00000000, pq=0x00000000) at async-dns.c:726:7
    frame #3: 0x0099304d test.exe`lws_client_connect_2_dnsreq(wsi=<unavailable>) at connect2.c:366:7
    frame #4: 0x007fc10e test.exe`lws_http_client_connect_via_info2(wsi=<unavailable>) at connect.c:71:9
    frame #5: 0x007fc8c6 test.exe`lws_client_connect_via_info(i=<unavailable>) at connect.c:511:9

lws-team commented 2 years ago

I can not catch this error with CURL with domain https://a-0003.a-msedge.net

Do you mean that when curl is the client instead of Edge, the problem does not occur?

I have no problem without ASYNC dns

Yes, but async DNS... is async. So the ordering of things is naturally different.

lws_wsi_close

Is this with the LWS_TO_KILL_ASYNC option?

if there is message PENDING_TIMEOUT_KILLED_BY_PROXY_CLIENT_CLOSE after 3 secs.

... this message appears if we closed an HTTP client connection and it was handled in the dummy callback

lws-team commented 2 years ago

This error come from lws_client_connect_via_info, in this step the WSI is not connected yet?

Yes the wsi is not connected yet. It's still doing the DNS lookup during this time, before even trying to make the actual connection.

It can go straight on to the actual connection (in this case, to the DNS sort at the start of that step) without returning to the event loop in two ways: either the DNS lookup fails immediately (e.g., no server listed), or a previous lookup has its results cached, so it can respond with the DNS result immediately.
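The two synchronous paths can be sketched as follows, assuming a hypothetical `lookup()` that invokes its callback before returning on an immediate failure or a cache hit, and only defers to the event loop otherwise. All names here are invented for illustration; this is not the lws API. The backtrace above has the same shape: lws_async_dns_query() calls lws_client_connect_3_connect() directly, still inside lws_client_connect_via_info().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical connection object: state 1 = resolving, 2 = connecting. */
struct conn {
	int state;
};

/* Result callback; addr == NULL means the lookup failed. */
typedef void (*lookup_cb)(struct conn *c, const char *addr);

enum lookup_ret { LOOKUP_PENDING, LOOKUP_DONE_SYNC };

/* Hypothetical one-entry cache and one-slot pending queue, for brevity. */
static const char *cache_name, *cache_addr;
static struct conn *pending_conn;
static lookup_cb pending_cb;
static int freed;

static enum lookup_ret lookup(struct conn *c, const char *name,
			      int have_server, lookup_cb cb)
{
	if (!have_server) {			/* immediate failure */
		cb(c, NULL);
		return LOOKUP_DONE_SYNC;
	}
	if (cache_name && !strcmp(cache_name, name)) {	/* cache hit */
		cb(c, cache_addr);
		return LOOKUP_DONE_SYNC;
	}
	pending_conn = c;			/* answer comes later */
	pending_cb = cb;
	return LOOKUP_PENDING;
}

/* Stands in for the event loop delivering the queued answer. */
static void complete_pending(const char *addr)
{
	struct conn *c = pending_conn;
	lookup_cb cb = pending_cb;

	pending_conn = NULL;
	pending_cb = NULL;
	if (c && cb)
		cb(c, addr);
}

/* Example callback: on failure it destroys the connection. */
static void on_result(struct conn *c, const char *addr)
{
	if (!addr) {
		free(c);
		freed++;
		return;
	}
	c->state = 2;
}

/* Returns 1 if the lookup finished synchronously. On that path the
 * callback may already have freed c, so c must not be touched again. */
static int start_connect(struct conn *c, const char *name, int have_server)
{
	c->state = 1;
	if (lookup(c, name, have_server, on_result) == LOOKUP_DONE_SYNC)
		return 1;
	return 0;
}
```

The design point is the return value: on the synchronous path the callback has already run (and on failure may have freed the connection) before `lookup()` returns, so the caller cannot assume the object is still alive afterwards.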

calvin2021y commented 2 years ago

It means curl is being the client instead of edge does not create this problem?

Yes, I tried all the failing domains with curl and there is no problem.

calvin2021y commented 2 years ago

This is with the LWS_TO_KILL_ASYNC option?

lws_wsi_close(wsi, LWS_TO_KILL_ASYNC);

calvin2021y commented 2 years ago

Is there a way to skip the lws DNS cache? I think my local DNS cache works, and the second request gets a response very fast, so I don't think I need the lws local DNS cache.

When I start the test app, open Edge, close Edge, and then open Edge again, the error appears.

lws-team commented 2 years ago

In the general case, the local resolver should cache it for the TTL in the result... but a lot of people just go straight to their ISP's cache, set up by DHCP.

This is dragging on because I can't reproduce it. If the cache is a clue, then the right way is to add this kind of test to the api-test and see if the problem shows up for me... if so, I can solve it quickly.

calvin2021y commented 2 years ago

If there were an option to disable the DNS cache for testing, I think it could narrow down what causes the issue.

lws-team commented 2 years ago

No... but according to this, just looking it up twice should let me reproduce it, so let's see.

calvin2021y commented 2 years ago

This DNS response doesn't have an address record, so there is no address TTL to cache it with.

The authority record has a TTL of 30 sec, and the authority TTL is not always the same as the address record's TTL. I am not sure which TTL value lws uses in this case.
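On the TTL question: RFC 2308 (negative caching) takes the TTL of a negative answer from the minimum of the SOA record's own TTL and the SOA MINIMUM field. Whether lws applies that rule is not established in this thread. The sketch below, with illustrative helper names, computes that value from the c-msn-com-europe-vip.trafficmanager.net response above, where both fields happen to be 30.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
}

/* Returns the offset just past the name at off, or 0 if malformed.
 * A compression pointer (top two bits set) ends the name in 2 bytes. */
static size_t skip_name(const uint8_t *pkt, size_t len, size_t off)
{
	while (off < len) {
		uint8_t l = pkt[off];

		if (!l)
			return off + 1;
		if ((l & 0xc0) == 0xc0)
			return off + 2 <= len ? off + 2 : 0;
		if (l & 0xc0)
			return 0;	/* reserved label type */
		off += 1u + l;
	}
	return 0;
}

/* Sets *ttl to min(SOA TTL, SOA MINIMUM) for the first authority RR,
 * returning 0, or returns -1 if malformed or that RR is not an SOA. */
static int dns_negative_ttl(const uint8_t *pkt, size_t len, uint32_t *ttl)
{
	size_t off = 12, rdend;
	unsigned int i, qd, an;
	uint32_t soa_ttl, minimum;

	if (len < 12 || !rd16(pkt + 8))		/* need NSCOUNT >= 1 */
		return -1;
	qd = rd16(pkt + 4);
	an = rd16(pkt + 6);
	for (i = 0; i < qd; i++) {	/* question: name, QTYPE, QCLASS */
		off = skip_name(pkt, len, off);
		if (!off || off + 4 > len)
			return -1;
		off += 4;
	}
	for (i = 0; i < an; i++) {	/* skip any answer RRs */
		off = skip_name(pkt, len, off);
		if (!off || off + 10 > len)
			return -1;
		off += 10u + rd16(pkt + off + 8);
	}
	off = skip_name(pkt, len, off);	/* authority RR owner name */
	if (!off || off + 10 > len || rd16(pkt + off) != 6 /* SOA */)
		return -1;
	soa_ttl = rd32(pkt + off + 4);
	rdend = off + 10u + rd16(pkt + off + 8);
	if (rdend > len)
		return -1;
	off = skip_name(pkt, rdend, off + 10);	/* MNAME */
	if (off)
		off = skip_name(pkt, rdend, off);	/* RNAME */
	if (!off || off + 20 > rdend)
		return -1;
	/* SERIAL, REFRESH, RETRY, EXPIRE, then MINIMUM */
	minimum = rd32(pkt + off + 16);
	*ttl = soa_ttl < minimum ? soa_ttl : minimum;
	return 0;
}

/* The c-msn-com-europe-vip.trafficmanager.net response above. */
static const uint8_t c_msn_europe[] = {
	73,87,129,128,0,1,0,0,0,1,0,0,20,99,45,109,115,110,45,99,111,109,
	45,101,117,114,111,112,101,45,118,105,112,14,116,114,97,102,102,
	105,99,109,97,110,97,103,101,114,3,110,101,116,0,0,28,0,1,192,33,
	0,6,0,1,0,0,0,30,0,49,3,116,109,49,6,100,110,115,45,116,109,3,99,
	111,109,0,10,104,111,115,116,109,97,115,116,101,114,192,33,7,11,
	234,133,0,0,3,132,0,0,1,44,0,36,234,0,0,0,0,30
};
```

For the e28578.d.akamaiedge.net response the two fields differ (SOA TTL 273, MINIMUM 1800), so the same rule yields 273.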