google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

Illegal instruction after enabling ASan or TSan. #997

Open Rashpil93 opened 6 years ago

Rashpil93 commented 6 years ago

After enabling ASan or TSan, my app fails with SIGILL. The app uses DPDK with hugepages.

After testing, it turned out that the application fails with SIGILL when it accesses memory allocated in the mbuf pool. If the value of arp_hdr->arp_data.arp_tip is first assigned to a variable on the stack, and a pointer to that variable is passed to ip_format_addr, the application works.
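
The workaround described above can be sketched roughly as follows. This is only an illustration, not the code from the report: `demo_arp_ipv4` is a hypothetical packed struct standing in for the ARP payload, and `demo_format_ip` and `demo_format_tip` are made-up helper names. The sketch copies the field into a stack variable and formats the copy; it does not explain why the direct pointer fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a packed ARP payload; NOT DPDK's definition. */
struct demo_arp_ipv4 {
    uint8_t  sha[6];
    uint32_t sip;
    uint8_t  tha[6];
    uint32_t tip;
} __attribute__((packed));

/* Same dotted output as ip_format_addr in the report, but takes the value. */
static void demo_format_ip(char *buf, size_t size, uint32_t ip)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "%u.%u.%u.%u",
             (unsigned)(ip & 0xff),
             (unsigned)((ip >> 8) & 0xff),
             (unsigned)((ip >> 16) & 0xff),
             (unsigned)((ip >> 24) & 0xff));
}

/* Copy the field out of the packet into a local, then format the local copy. */
static void demo_format_tip(char *buf, size_t size,
                            const struct demo_arp_ipv4 *arp)
{
    uint32_t tip_copy;

    memcpy(&tip_copy, &arp->tip, sizeof tip_copy);
    demo_format_ip(buf, size, tip_copy);
}
```

Calling `demo_format_tip(buf, sizeof buf, arp)` never hands a pointer into the packet buffer to the formatting code, which is the pattern the report says works.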

Why does this happen? How can I solve this problem?

Core dump
```
Failed to read a valid object file image from memory.
Core was generated by helloworld -l 0-3 -n 1.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x000000000044611b in ip_format_addr ( buf=, size=, ip=) at main.c:64
64 ((uint32_t)(*ip & 0xff000000) >> 24)
[Current thread is 1 (LWP 467)]
```
Debug dump
```
ASAN_OPTIONS=verbosity=1::disable_coredump=0::unmap_shadow_on_exit=1 helloworld -l 0-3 -n 1
==390==Parsed ASAN_OPTIONS: verbosity=1::disable_coredump=0::unmap_shadow_on_exit=1
==390==AddressSanitizer: failed to intercept '__isoc99_printf'
==390==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==390==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==390==AddressSanitizer: failed to intercept 'memcmp'
==390==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size=256M
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 7fff8000
==390==Installed the sigaction for signal 11
==390==T0: stack [0x7ffe6d5f5000,0x7ffe6ddf5000) size 0x800000; local=0x7ffe6ddf439c
==390==AddressSanitizer Init done
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
==390==T4: stack [0x7f396dde2000,0x7f396e5e1dc0) size 0x7ffdc0; local=0x7f396e5e1cec
==390==T3: stack [0x7f396e5e3000,0x7f396ede2dc0) size 0x7ffdc0; local=0x7f396ede2cec
==390==T2: stack [0x7f396ede4000,0x7f396f5e3dc0) size 0x7ffdc0; local=0x7f396f5e3cec
EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
PMD: Global register is changed during enable FDIR flexible payload
PMD: Global register is changed during support QinQ parser
PMD: Global register is changed during configure hash input set
PMD: Global register is changed during configure fdir mask
PMD: Global register is changed during configure hash mask
==390==T1: stack [0x7f396f5e5000,0x7f396fde4dc0) size 0x7ffdc0; local=0x7f396fde4cec
PMD: Global register is changed during support QinQ cloud filter
PMD: Global register is changed during support TPID configuration
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
PMD: Global register is changed during enable FDIR flexible payload
PMD: Global register is changed during support QinQ parser
PMD: Global register is changed during configure hash input set
PMD: Global register is changed during configure fdir mask
PMD: Global register is changed during configure hash mask
PMD: Global register is changed during support QinQ cloud filter
PMD: Global register is changed during support TPID configuration
EAL: PCI device 0000:01:00.2 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
PMD: Global register is changed during enable FDIR flexible payload
PMD: Global register is changed during support QinQ parser
PMD: Global register is changed during configure hash input set
PMD: Global register is changed during configure fdir mask
PMD: Global register is changed during configure hash mask
PMD: Global register is changed during support QinQ cloud filter
PMD: Global register is changed during support TPID configuration
EAL: PCI device 0000:01:00.3 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
PMD: Global register is changed during enable FDIR flexible payload
PMD: Global register is changed during support QinQ parser
PMD: Global register is changed during configure hash input set
PMD: Global register is changed during configure fdir mask
PMD: Global register is changed during configure hash mask
PMD: Global register is changed during support QinQ cloud filter
PMD: Global register is changed during support TPID configuration
hello from core 1
ip addr 48.152.125.92
hello from core 2
ip addr 48.152.125.92
hello from core 3
ip addr 48.152.125.92
hello from core 0
ip addr 48.152.125.92
Illegal instruction
```
App code
```c
static struct rte_mempool *mbuf_pool;

void __attribute__ ((noinline))
ip_format_addr(char *buf, uint16_t size, uint32_t *ip)
{
    snprintf(buf, size, "%d.%d.%d.%d",
             ((uint32_t)(*ip & 0xff)),
             ((uint32_t)(*ip & 0x0000ff00) >> 8),
             ((uint32_t)(*ip & 0x00ff0000) >> 16),
             ((uint32_t)(*ip & 0xff000000) >> 24)
    );
}

static int
lcore_hello(__attribute__((unused)) void *arg)
{
    unsigned lcore_id;
    uint32_t * ip;
    char buf[32];
    struct rte_mbuf *created_pkt;
    struct ether_hdr *eth_hdr;
    struct arp_hdr *arp_hdr;
    size_t pkt_size;

    lcore_id = rte_lcore_id();
    printf("hello from core %u\n", lcore_id);

    created_pkt = rte_pktmbuf_alloc(mbuf_pool);
    if (created_pkt == NULL) {
        printf("Failed to allocate mbuf\n");
        return -1;
    }

    pkt_size = sizeof(struct ether_hdr) + sizeof(struct arp_hdr);
    created_pkt->data_len = pkt_size;
    created_pkt->pkt_len = pkt_size;

    eth_hdr = rte_pktmbuf_mtod(created_pkt, struct ether_hdr *);
    eth_hdr->ether_type = rte_cpu_to_be_16(ETHER_TYPE_ARP);

    arp_hdr = (struct arp_hdr *)((char *)eth_hdr + sizeof(struct ether_hdr));
    arp_hdr->arp_hrd = rte_cpu_to_be_16(ARP_HRD_ETHER);
    arp_hdr->arp_pro = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
    arp_hdr->arp_hln = ETHER_ADDR_LEN;
    arp_hdr->arp_pln = sizeof(uint32_t);
    arp_hdr->arp_op = rte_cpu_to_be_16(ARP_OP_REQUEST);
    arp_hdr->arp_data.arp_sip = 1551734841;
    memset(&arp_hdr->arp_data.arp_tha, 0, ETHER_ADDR_LEN);
    arp_hdr->arp_data.arp_tip = 1551734842;

    ip = rte_malloc(NULL, sizeof(uint32_t), 0);
    if (ip == NULL) {
        printf("Failed to allocate ip\n");
        return -1;
    }
    *ip = 1551734832;

    ip_format_addr(buf, 32, ip);
    printf("ip addr %s\n", buf);

    ip_format_addr(buf, 32, &arp_hdr->arp_data.arp_tip);
    printf("ip addr tip %s\n", buf);

    ip_format_addr(buf, 32, &arp_hdr->arp_data.arp_sip);
    printf("ip addr sip %s\n", buf);

    return 0;
}

int
main(int argc, char **argv)
{
    int ret;
    unsigned lcore_id;

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_panic("Cannot init EAL\n");

    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NB_MBUF, 32, 0,
                                        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    /* call lcore_hello() on every slave lcore */
    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
    }

    /* call it on master lcore too */
    lcore_hello(NULL);

    rte_eal_mp_wait_lcore();
    return 0;
}
```
dvyukov commented 6 years ago

Hi Rashpil,

Does this library mmap memory at fixed addresses? Please share /proc/$PID/maps contents at the time of crash.
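
If attaching a debugger at the moment of the crash is awkward, one option is to have the process print its own mappings just before the failing call. A minimal Linux-only sketch, assuming it is acceptable to add a helper to the application; `demo_parse_maps_range` and `demo_dump_self_maps` are hypothetical names, not part of DPDK or the sanitizer runtime:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "start-end" hex range of one maps line.
 * Returns 1 on success, 0 if the line does not start with a range. */
static int demo_parse_maps_range(const char *line,
                                 unsigned long long *start,
                                 unsigned long long *end)
{
    assert(line != NULL && start != NULL && end != NULL);
    return sscanf(line, "%llx-%llx", start, end) == 2;
}

/* Copy /proc/self/maps to `out`. Returns the number of lines that begin
 * with an address range, or -1 if the file cannot be opened. */
static int demo_dump_self_maps(FILE *out)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[1024];
    int count = 0;

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long long start, end;

        fputs(line, out);
        if (demo_parse_maps_range(line, &start, &end))
            count++;
    }
    fclose(maps);
    return count;
}
```

Calling `demo_dump_self_maps(stderr)` right before the crashing access gives the same information as reading /proc/$PID/maps from another shell.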

Rashpil93 commented 6 years ago

Hi Dmitry, DPDK uses mmap with MAP_PRIVATE and MAP_ANONYMOUS for hugepages.

map.txt

dvyukov commented 6 years ago

Mappings like `100100000000-100100200000 rw-s 00000000 00:11 65809 /mnt/huge/rtemap_0` are probably mapped at fixed addresses and conflict with the asan/tsan expectations for the virtual address space layout. If the library allows changing the fixed address (perhaps via some env var), that may help. For tsan we have expectations about the address space here: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/tsan/rtl/tsan_platform.h#L31 Do we have something similar for asan? In any case, try various addresses; something should work.
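
To check where a given address falls in the asan layout, the ranges and the SHADOW_SCALE / SHADOW_OFFSET values printed in the verbose log above can be turned into a small helper. This is a sketch built only from that log output (so it applies to that particular run and configuration); the `demo_*` names are hypothetical and are not sanitizer APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the verbosity=1 output in this thread. */
#define DEMO_SHADOW_SCALE  3
#define DEMO_SHADOW_OFFSET 0x7fff8000ULL

enum demo_region {
    DEMO_LOW_MEM,
    DEMO_LOW_SHADOW,
    DEMO_SHADOW_GAP,
    DEMO_HIGH_SHADOW,
    DEMO_HIGH_MEM
};

/* Shadow address for an application address: (addr >> scale) + offset. */
static uint64_t demo_mem_to_shadow(uint64_t addr)
{
    return (addr >> DEMO_SHADOW_SCALE) + DEMO_SHADOW_OFFSET;
}

/* Classify a 47-bit user-space address using the ranges from the log. */
static enum demo_region demo_classify(uint64_t addr)
{
    assert(addr <= 0x7fffffffffffULL);
    if (addr >= 0x10007fff8000ULL) return DEMO_HIGH_MEM;
    if (addr >= 0x02008fff7000ULL) return DEMO_HIGH_SHADOW;
    if (addr >= 0x00008fff7000ULL) return DEMO_SHADOW_GAP;
    if (addr >= 0x00007fff8000ULL) return DEMO_LOW_SHADOW;
    return DEMO_LOW_MEM;
}
```

For example, `demo_classify(0x100100000000ULL)` returns `DEMO_HIGH_MEM` for the start of the rtemap_0 mapping quoted above, and its shadow address lands in the HighShadow range. A candidate base address is only worth trying if it classifies as LowMem or HighMem.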

Rashpil93 commented 5 years ago

I was able to change the fixed addresses for DPDK, but my app still fails with SIGILL. :( Address space for asan: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_allocator.h#L124

dvyukov commented 5 years ago

What address and tool did you use?

Unless somebody else sees the problem from the provided info, a reproducer would be useful.

Rashpil93 commented 5 years ago

In DPDK, you can change the address using the --base-virtaddr EAL option. I tried addresses from 0x200000000000 to 0x700000000000.

Rashpil93 commented 5 years ago

If I add the no_sanitize_address attribute, the application still fails with SIGILL.

Rashpil93 commented 5 years ago

Address space for asan: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit

Rashpil93 commented 5 years ago

Perhaps there are other solutions to this problem?

dvyukov commented 5 years ago

It's still unclear to me why changing address does not work. A reproducer would help sanitizer developers to understand the problem better and hopefully propose some solution.

Rashpil93 commented 5 years ago

Hi. Do you need any debug output, or an application that reproduces the error?

Debug
```
ASAN_OPTIONS=verbosity=1 test_ill --base-virtaddr=0x400000000000 --log-level=8
==571==Parsed ASAN_OPTIONS: verbosity=1
==571==AddressSanitizer: failed to intercept '__isoc99_printf'
==571==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==571==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==571==AddressSanitizer: failed to intercept 'memcmp'
==571==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size=256M
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 7fff8000
==571==Installed the sigaction for signal 11
==571==T0: stack [0x7ffdfdcfd000,0x7ffdfe4fd000) size 0x800000; local=0x7ffdfe4fc59c
==571==AddressSanitizer Init done
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
EAL: Detected lcore 3 as core 0 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 4 lcore(s)
EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory)
EAL: VFIO PCI modules not loaded
EAL: Probing VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
EAL: VFIO modules not loaded, skipping VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
EAL: Setting up physically contiguous memory...
EAL: Trying to obtain current memory policy.
EAL: Hugepage /mnt/huge/rtemap_0 is on socket 0
EAL: Ask a virtual area of 0x40000000 bytes
EAL: Virtual area found at 0x400000000000 (size = 0x40000000)
EAL: Requesting 1 pages of size 1024MB from socket 0
EAL: TSC frequency is ~3600032 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master lcore 0 is ready (tid=795fe940;cpuset=[0])
==571==T4: stack [0x7f7a729fc000,0x7f7a731fbdc0) size 0x7ffdc0; local=0x7f7a731fbcec
==571==T3: stack [0x7f7a731fd000,0x7f7a739fcdc0) size 0x7ffdc0; local=0x7f7a739fccec
==571==T2: stack [0x7f7a739fe000,0x7f7a741fddc0) size 0x7ffdc0; local=0x7f7a741fdcec
EAL: lcore 2 is ready (tid=739fd700;cpuset=[2])
==571==T1: stack [0x7f7a741ff000,0x7f7a749fedc0) size 0x7ffdc0; local=0x7f7a749fecec
EAL: lcore 1 is ready (tid=741fe700;cpuset=[1])
EAL: lcore 3 is ready (tid=731fc700;cpuset=[3])
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100e net_e1000_em
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:00:04.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100e net_e1000_em
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100e net_e1000_em
EAL: Not managed by a supported kernel driver, skipped
hello from core 1
Illegal instruction
```