zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.52k stars 6.45k forks source link

Raw Socket Failure when using 2 Raw Sockets and zsock_select() statement - improper mapping from sock to handlers #38376

Closed bob2oneil closed 2 years ago

bob2oneil commented 3 years ago

Describe the bug I have a raw socket implementation on 2 virtual Ethernet interfaces using the Native-POSIX board target. The raw socket associated with the standard Ethernet interface, which also includes standard UDP and TCP listen sockets, no longer works properly with the latest Zephyr build as of Sept 7th, but I have observed this issue in previous versions.

What appears to happen is as shown in the callstack of the attached screenshot terminating in the zsock_recv_ctx() API for the failed raw socket read. This API does not support reads for raw sockets, so it would appear that perhaps the kernel configuration or runtime application settings is causing execution paths that are not expected?

The callstacks for the working and non-working raw socket interfaces shown in the following callstack excerpts between working and non-working implementations on different interfaces. It is clear that for the working raw socket, the 1st 2 lines differ from the non-working, so the callstack is simply different for 2 raw sockets. The POSIX read API as zsock_recv() goes to different flows for these two interfaces.

Specifically, the working L2 socket, calls zpacket APIs, while the non-working L2 socket calls zsock APIs.

Debugging involved walking into the POSIX recv() API for the working raw socket interface versus the non-working one to produce the callstacks.

To Reproduce Steps to reproduce the behavior:

  1. west build
  2. target Native-POSIX board
  3. make

Expected behavior Properly call stack routed behavior for raw socket on both virtual interfaces.

Impact Showstopper relative to raw socket operation.

Logs and console output

Working callstack for raw socket:

zpacket_recvfrom_ctx(struct net_context * ctx, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets_packet.c:202)
packet_sock_recvfrom_vmeth(void * obj, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets_packet.c:347)
z_impl_zsock_recvfrom(int sock, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets.c:1251)
zsock_recvfrom(socklen_t * addrlen, sockaddr * src_addr, int flags, size_t max_len, void * buf, int sock) (/home/robocom/us/NetworkRed/software/redparrot/build/zephyr/include/generated/syscalls/socket.h:217)
zsock_recv(int flags, size_t max_len, void * buf, int sock) (/home/robocom/zephyrproject/zephyr/include/net/socket.h:404)
App::receivePacket(const char * ifName, const int sock, unsigned char * pkt, int pktSize, int & size) 

Failed callstack for non-working raw socket:

zsock_recvfrom_ctx(struct net_context * ctx, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets.c:1231)
sock_recvfrom_vmeth(void * obj, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets.c:2166)
z_impl_zsock_recvfrom(int sock, void * buf, size_t max_len, int flags, struct sockaddr * src_addr, socklen_t * addrlen) (/home/robocom/zephyrproject/zephyr/subsys/net/lib/sockets/sockets.c:1251)
zsock_recvfrom(socklen_t * addrlen, sockaddr * src_addr, int flags, size_t max_len, void * buf, int sock) (/home/robocom/us/NetworkRed/software/redparrot/build/zephyr/include/generated/syscalls/socket.h:217)
zsock_recv(int flags, size_t max_len, void * buf, int sock) (/home/robocom/zephyrproject/zephyr/include/net/socket.h:404)
App::receivePacket(const char * ifName, const int sock, unsigned char * pkt, int pktSize, int & size) (/home/robocom/us/NetworkRed/software/redparrot/src/app/app.cpp:608)
App::onWiredRawSocketReceive() 

ZephyrRawSocketReadFailure2 ZephyrRawSocketReadFailure1

Environment (please complete the following information):

Additional context Kernel configuration is as follows:

# C++ support
CONFIG_CPLUSPLUS=y
CONFIG_LIB_CPLUSPLUS=y
# CONFIG_STD_CPP2A=y
# CONFIG_CBPRINTF_FP_SUPPORT=y

# This setting in necessary for QEMU-RISCV, but will
# default to 'n' or external for NATIVE-POSIX
CONFIG_NEWLIB_LIBC=y

# General kernel options
CONFIG_INIT_STACKS=y

# fs_dirent structures are big
CONFIG_MAIN_STACK_SIZE=2048

# Size of heap memory pool
CONFIG_HEAP_MEM_POOL_SIZE=16384

# Co-operative threading disabled, want pre-emption
# A cooperative thread remains the current thread until
# it performs an action that makes it unready
CONFIG_NET_TC_THREAD_COOPERATIVE=n

# Base 64 conversions support for XML interface
CONFIG_BASE64=y

# Flash storage settings
# Optionally force the file system to be recreated
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FLASH_PAGE_LAYOUT=y
CONFIG_FILE_SYSTEM=y
CONFIG_FILE_SYSTEM_LITTLEFS=y
CONFIG_FS_LOG_LEVEL_DBG=y
# CONFIG_APP_WIPE_STORAGE=y

# Networking config
# Need TCP for Telnet session for shell interface
CONFIG_NETWORKING=y
CONFIG_NET_UDP=y
CONFIG_NET_TCP=y
CONFIG_NET_IPV6=n
CONFIG_NET_IPV4=y
CONFIG_NET_DHCPV4=n
CONFIG_NET_ARP=y
CONFIG_NET_NATIVE=y
CONFIG_NET_L2_ETHERNET=y
CONFIG_NET_DISABLE_ICMP_DESTINATION_UNREACHABLE=y

# Network drivers buffers
CONFIG_NET_PKT_RX_COUNT=640
CONFIG_NET_PKT_TX_COUNT=640
CONFIG_NET_BUF_RX_COUNT=640
CONFIG_NET_BUF_TX_COUNT=640

# IP Address ranges
CONFIG_NET_IF_UNICAST_IPV4_ADDR_COUNT=258
CONFIG_NET_IF_MCAST_IPV4_ADDR_COUNT=0
CONFIG_NET_MAX_CONTEXTS=10

# To support prj.conf based IP setup
# We will be assigning IPs dynamically runtime
# so will not use the configuration based approach
CONFIG_NET_CONFIG_SETTINGS=n
CONFIG_NET_CONFIG_NEED_IPV6=n
CONFIG_NET_CONFIG_NEED_IPV4=n

# IP address options, leave blank so can be assigned runtime
# Primary interface IP and peer IP
CONFIG_NET_CONFIG_MY_IPV4_ADDR=""
CONFIG_NET_CONFIG_PEER_IPV4_ADDR=""

# Network interface statistics logging
CONFIG_NET_STATISTICS=y
CONFIG_NET_STATISTICS_USER_API=n
CONFIG_NET_STATISTICS_PERIODIC_OUTPUT=y
CONFIG_NET_STATISTICS_IPV4=y
CONFIG_NET_STATISTICS_IPV6=n
CONFIG_NET_STATISTICS_IPV6_ND=n
CONFIG_NET_STATISTICS_ICMP=y
CONFIG_NET_STATISTICS_UDP=y
CONFIG_NET_STATISTICS_TCP=y
CONFIG_NET_STATISTICS_MLD=n
CONFIG_NET_STATISTICS_ETHERNET=y
CONFIG_NET_STATISTICS_ETHERNET_VENDOR=n
CONFIG_NET_STATISTICS_LOG_LEVEL_DBG=y
CONFIG_NET_STATISTICS_PER_INTERFACE=y

# Enable promiscuous mode support on sockets for L2 access
# Not necessary for raw socket interface for IPs assigned
# to the primary wired interface, but needed for the
# wireless simulated RF network
CONFIG_NET_PROMISCUOUS_MODE=y

# Ethernet interface native POSIX
CONFIG_ETH_NATIVE_POSIX=y

# Random MAC is required for more than 1 POSIX interfaces
CONFIG_ETH_NATIVE_POSIX_RANDOM_MAC=y

# POSIX two interfaces, wired and wireless
CONFIG_ETH_NATIVE_POSIX_INTERFACE_COUNT=2

# Enable BSD sockets but don't use POSIX names for clashes with build machine
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_POSIX_NAMES=n

# Maximum file descriptors
CONFIG_POSIX_MAX_FDS=6

# Packet raw socket configuration to allow for raw sockets bound to iface
CONFIG_NET_SOCKETS_PACKET=y

# Random number generator
CONFIG_ENTROPY_GENERATOR=y
CONFIG_TEST_RANDOM_GENERATOR=y

# Logging
CONFIG_NET_LOG=y
CONFIG_LOG=y
CONFIG_LOG_MODE_DEFERRED=n
CONFIG_LOG_RUNTIME_FILTERING=y
# CONFIG_LOG_BUFFER_SIZE=2048
CONFIG_PRINTK=y

# - 0 OFF, do not write by default
# - 1 ERROR, default to only write LOG_LEVEL_ERR
# - 2 WARNING, default to write LOG_LEVEL_WRN
# - 3 INFO, default to write LOG_LEVEL_INFO
# - 4 DEBUG, default to write LOG_LEVEL_DBG
CONFIG_LOG_DEFAULT_LEVEL=3

# Prefix log output with function name
CONFIG_LOG_FUNC_NAME_PREFIX_ERR=y
CONFIG_LOG_FUNC_NAME_PREFIX_WRN=y
CONFIG_LOG_FUNC_NAME_PREFIX_INF=y
CONFIG_LOG_FUNC_NAME_PREFIX_DBG=y

# Disable logging using '%s' where strdup()s occur
# Only applies to logging using separate thread
# CONFIG_LOG_DETECT_MISSED_STRDUP=n

# Additions to the Shell/Console Interface
CONFIG_SHELL_STACK_SIZE=4096
CONFIG_SHELL_CMDS=y
CONFIG_SHELL_TAB_AUTOCOMPLETION=y
CONFIG_SHELL_HISTORY=y
CONFIG_NET_SHELL=y
CONFIG_KERNEL_SHELL=y
CONFIG_NET_STATISTICS=y
CONFIG_DEVICE_SHELL=y

# Shell and logging device backends
CONFIG_SHELL_LOG_BACKEND=n
CONFIG_SHELL_BACKEND_TELNET=y
CONFIG_LOG_BACKEND_UART=y

# Debug Options for debugging
CONFIG_DEBUG=y
CONFIG_DEBUG_OPTIMIZATIONS=y

# Memory buffer leak verification shell extensions
CONFIG_NET_BUF_POOL_USAGE=y
CONFIG_NET_DEBUG_NET_PKT_ALLOC=y

# File System Shell
CONFIG_FILE_SYSTEM=y
CONFIG_FILE_SYSTEM_SHELL=y

# Gather network statistics
CONFIG_NET_STATISTICS_ETHERNET=y

# Name threads
CONFIG_THREAD_NAME=y

# Max number of IPv4 network interfaces in the system
# Will be a value of 1 for real radio
CONFIG_NET_IF_MAX_IPV4_COUNT=2

# How many Tx traffic classes to have for each network device
CONFIG_NET_TC_TX_COUNT=1

# How many network connections is supported (default is 4)
CONFIG_NET_MAX_CONN=8

# Total number of sockets to poll in select()
CONFIG_NET_SOCKETS_POLL_MAX=5

# Enable mostly-standards-compliant implementations of various POSIX (IEEE 1003.1) APIs.
# Necessary to support clock_gettime() for QEMU target
CONFIG_POSIX_API=y

Creation of socket method

int App::createSocket(int &sock, struct net_if *iface, const char *deviceName, bool ipOnly, bool rawSocket, int udpListenPort)
{
    // Create socket
    if (rawSocket)
    {
        // Create raw socket for any IP bound to iface
        // To receive all packets, the macro is ETH_P_ALL and to
        // receive IP packets, the macro is ETH_P_IP for the protocol field.
        sock = zsock_socket(AF_PACKET, SOCK_RAW, ipOnly ? ETH_P_IP : ETH_P_ALL);
    }
    else
    {
        // Create UDP socket
        sock = zsock_socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    }

    if (sock < 0)
    {
        LOG_ERR("Failed to create socket: %d - %s", errno, strerror(errno));
        return -errno;
    }
    else
    {
        LOG_INF("Succeeded in creating socket %d", sock);
    }

    // NOTE: Kernel configuration NET_MAX_CONNS must be set to the total number
    // of socket connnections or else binding will fail.  This value defaults
    // to a value of 4.

    // Bind sockets
    if (rawSocket)
    {
        struct sockaddr_ll addr = {0};
        addr.sll_ifindex = net_if_get_by_iface(iface);
        addr.sll_family = AF_PACKET;

        // Bind socket to specified interface
        // NOTE: binding to iface for raw socket requires CONFIG_NET_SOCKETS_PACKET=y

        /*
         This is an initial version of packet socket support (special type raw socket)
         Packets are passed to and from the device driver without any changes in the packet
         headers.It's API caller responsibility to provide all the headers (e.g L2, L3 and so on)
         while sending. While receiving, packets (including all the headers) will be feed to sockets
         as it as from the driver.
        */
        LOG_DBG("Binding raw socket to interface %d (%p)", addr.sll_ifindex, net_if_get_by_index(addr.sll_ifindex));

        int ret = zsock_bind(sock, (const struct sockaddr *)&addr, sizeof(struct sockaddr_ll));
        if (ret < 0)
        {
            LOG_ERR("Failed to bind raw packet socket: %d - %s", errno, strerror(errno));
            return -errno;
        }
        else
        {
            LOG_INF("Succeeded in binding raw packet socket %d to iface %p", sock, iface);
        }
    }
    // UDP socket
    else
    {
        struct sockaddr_in addr;
        struct in_addr wiredAddr;

        // Convert wired interface IP as a string into a binary address
        net_addr_pton(AF_INET, m_wiredInterface.getIp(), &wiredAddr);

        memset((char *)&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = wiredAddr.s_addr; // Bind to source IP to this radio's wired interface

        // If listen port is specified, bind to it
        if (udpListenPort)
        {
            addr.sin_port = htons(udpListenPort); // Bind to listen destination port
        }
        // Other, do not bind to a listen port
        else
        {
            addr.sin_port = 0;
        }

        // Bind UDP socket to address and optionally listen port
        int ret = zsock_bind(sock, (const struct sockaddr *)&addr, sizeof(struct sockaddr_in));
        if (ret < 0)
        {
            LOG_ERR("Cannot bind UDP socket %d - %s", errno, strerror(errno));
            return -errno;
        }
        else
        {
            LOG_INF("Succeeded binding UDP socket %d to listen port %d", sock, udpListenPort);
        }

        // Bind socket to device if deviceName argument is provided
        if (*deviceName)
        {
            struct ifreq ifreq = {0};
            strcpy(ifreq.ifr_name, deviceName);
            int ret = zsock_setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, &ifreq, sizeof(ifreq));

            if (ret < 0)
            {
                LOG_ERR("Cannot bind UDP socket to device %s (%d-%s)", deviceName, errno, strerror(errno));
                return -errno;
            }
            else
            {
                LOG_INF("Succeeded binding UDP socket %d to device %s", sock, deviceName);
            }
        } // end bind to device

    } // end UDP socket

    // Set socket option to not block.
    // select() may report a socket file descriptor as "ready for reading", while nevertheless a
    // subsequent read blocks. This could for example happen when data has arrived but upon examination has
    // wrong checksum and is discarded. There may be other circumstances in which a file descriptor is
    // spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

    // Set socket to non-blocking for the select
    setFdBlocking(sock, false);

    return 0;
}
cfriedt commented 3 years ago

@bob2oneil - it might help to reference a test case that can be easily reproducible. If a test case does not already exist, would you consider creating a PR with one?

It may not be possible to properly handle this issue without a minimal reproducible test, but that kind of test would typically be able to run in CI to avoid any future regressions.

cc @rlubos

bob2oneil commented 3 years ago

Thanks for the response Chris, a test case does not yet exist, but I can create a minimal one to explore this raw socket to implementation mapping issue. Would it be sufficient to attach say a ZIP compressed file to this issue for tracking, or is a Pull Request required? As I understand GIT PR, it is typically submitted with proposed code to incorporate into the repo, for which I do not presently have a solution, only an observed problem.

cfriedt commented 3 years ago

@bob2oneil - a PR would be ideal

bob2oneil commented 3 years ago

Doesn't a PR require change code that solves the problem? Currently, I have no idea as to why the 2 raw sockets, both associated with virtual Ethernet interfaces running on Native-POSIX, using the same create socket function, deviate when it comes to the mapping of the socket handle using POSIX networking APIs to the underlying implementation. That is, I do not fully understand the Zephyr networking stack architecture sufficient to provide a solution, I do not have years of experience using it nor domain knowledge in that area.

rlubos commented 3 years ago

Currently, I have no idea as to why the 2 raw sockets

That is, I do not fully understand the Zephyr networking stack architecture sufficient to provide a solution

It's not expected from you to provide a solution. A simple test case however (likely an extension of existing tests/net/socket/af_packet test suite) which allows to easily reproduce the issue would be a great starting point for us to find one. Ideally we could incorporate the test into the tree to avoid regressions in the future.

bob2oneil commented 3 years ago

Thanks Robert, I will create a test case and attempt to use the PR mechanism to submit it. Presumably I would submit an entire Zephyr snapshot where the additions made by myself would just be this test case?

rlubos commented 3 years ago

Presumably I would submit an entire Zephyr snapshot where the additions made by myself would just be this test case?

Yes, with git workflow you branch off from the main branch (which is the current "snapshot" of Zephyr), and then add your commits on top (which in this case would be an extra test). From such a branch you can create a PR, unless one mess with the commit history, the PR should only contain your changes. The process is actually pretty well explained in our documentation, please see: https://docs.zephyrproject.org/latest/contribute/#contribution-workflow

cfriedt commented 3 years ago

Also, @bob2oneil - if you use # followed by any PR or Issue number, you create an automatic github reference, which is handy, so in your PR, remember to do that (in the commit message or in the web interface) to create a reference back here.

carlescufi commented 2 years ago

@bob2oneil please provide the information required.

bob2oneil commented 2 years ago

I intend to submit a test case for this, have not had time to do that at the moment. It is still an important issue for me to resolve.

carlescufi commented 2 years ago

I intend to submit a test case for this, have not had time to do that at the moment. It is still an important issue for me to resolve.

@bob2oneil do you intend to pursue the resolution of this issue? Since it cannot be reproduced, you should either provide a test case or close the issue. Thanks!

hideogump commented 2 years ago

I have tested this out with the Zephyr release that dates back maybe a week, and I believe it is fixed. I will close this out and will revisit it again should testing indicate that a problem remains.

bob2oneil commented 2 years ago

I have tested this out with the Zephyr release that dates back maybe a week, and I believe it is fixed. I will close this out and will revisit it again should testing indicate that a problem remains.