zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.84k stars 6.6k forks source link

tests/net/socket/getaddrinfo tests too little #15193

Closed mike-scott closed 5 years ago

mike-scott commented 5 years ago

Describe the bug Assert logic in https://github.com/zephyrproject-rtos/zephyr/blob/master/tests/net/socket/getaddrinfo/src/main.c#L23 is backwards. You can run this test without access to a local dnsmasq server and it passes.

To Reproduce

  1. Make sure QEMU net-tools are started, but no local dnsmasq server is available.
  2. cd tests/net/socket/getaddrinfo
  3. mkdir build && cd build
  4. cmake -DBOARD=qemu_x86 ..
  5. make run
  6. test passes(?!?)

Expected behavior Test should fail if no local dnsmasq server is running

Impact Test is not working

Screenshots or console output

SeaBIOS (version rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org)
Booting from ROM..***** Booting Zephyr OS v1.14.0-rc3-297-g6fcf986e7775 *****
Running test suite socket_getaddrinfo
===================================================================
starting test - test_getaddrinfo_ok
PASS - test_getaddrinfo_ok
===================================================================
starting test - test_getaddrinfo_no_host
PASS - test_getaddrinfo_no_host
===================================================================
Test suite socket_getaddrinfo succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL

Environment (please complete the following information):

mike-scott commented 5 years ago

I have a feeling more of the socket tests have this same issue.

jukkar commented 5 years ago

Please note that all the programs under tests/net are suppose to be self contained, they are not suppose to connect to outside world. The programs in samples/net/ are different in this respect as those ones will need outside connectivity to function properly.

That said, I tried to run this test for native_posix and it fails with this message ASSERTION FAIL [net_if_get_link_addr(iface)->addr != ((void *)0)] @ zephyr/subsys/net/ip/net_if.c:2870 I will investigate what is going on here as the test is suppose to be run in native_posix according to its testcase.yaml file. I will also look what the test is actually doing or trying to do.

mike-scott commented 5 years ago

@jukkar can you try and run this test for qemu_x86? Without any setup of local dnsmasq it will PASS. Which is not correct. Take a look here at the assert checks: https://github.com/zephyrproject-rtos/zephyr/blob/master/tests/net/socket/getaddrinfo/src/main.c#L23

mike-scott commented 5 years ago

BTW, if you look at the asserts, there is clearly a bug there. Removing the bug tag doesn't change this. Are we to ship LTS with incorrect tests?

mike-scott commented 5 years ago

Ugh. Re-reading this. I guess this test is written as a failure? Why does this even exist? Samples would do compile testing for sanity.

This test is a waste of CPU cycles.

I'll close, but this is silly.

mike-scott commented 5 years ago

To clarify why this test is silly: The whole reason I started investigating this, is that an actual DNS query sent via the Socket-based APIs which is never returned by the server will hang the Zephyr device. We have an edge gateway setup where containers are loaded which handle NAT64, DNS64, joining BLE 6lowpan devices as well as OpenThread. This issue was identified when the DNS64 container spins up a bit later than the BLE joiner. The perfect example of this would be when the gateway software is updated and it requires a restart, but the IoT nodes are up and running at the time.

The K_FOREVER semaphore taken here and a bit lower are the cause of the hang: https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/lib/sockets/getaddrinfo.c#L119

This test has no real network setup, so DNS subsys lib returns EAI_CANCELLED when the query cannot be sent. This test checks exact that: that a query cannot be sent and nothing else.

jukkar commented 5 years ago

Lets not close this as there is an issue. As I said, I will investigate how to fix it.

jukkar commented 5 years ago

There are too may assignees so setting only one. @mike-scott, please set only one assignee at a time as otherwise it is not clear who will start to fix the issue.

mike-scott commented 5 years ago

@jukka Apologies. I'm too used to PRs where that spot is for reviews. Thank you for taking a look here and at the new bug I opened.

pfalcon commented 5 years ago

This test has no real network setup, so DNS subsys lib returns EAI_CANCELLED when the query cannot be sent. This test checks exact that: that a query cannot be sent and nothing else.

Well, it tests at least something, specifically what it could (easily) test. And for what it tested, the logic is correct, so updating the ticket title.