intel / asynch_mode_nginx


Connect times out; nginx worker process goes into infinite loop #9

Closed: alameth closed this issue 5 years ago

alameth commented 5 years ago

As far as I can tell, everything in QAT, QAT Engine, and qat_contig_mem is working properly. All the tests pass; openssl speed returns delightfully fast numbers. nginx, however, never completes a TLS handshake: after reading the TLS Client Hello, the worker goes into a hard polling loop. From strace:

9295  accept4(11, {sa_family=AF_INET, sin_port=htons(40716), sin_addr=inet_addr("96.74.122.182")}, [16], SOCK_NONBLOCK) = 3
9295  epoll_ctl(13, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=635871744, u64=140635045011968}}) = 0
9295  epoll_wait(13, {{EPOLLIN, {u32=635871744, u64=140635045011968}}}, 512, 60000) = 1
9295  recvfrom(3, "\26", 1, MSG_PEEK, NULL, NULL) = 1
9295  read(3, "\26\3\1\2\0\1\0\1\374\3\3\22\16}E\201)\234\327\235M\306%\fW\16\364|b\303(Z"..., 16709) = 517
9295  open("/dev/qat_contig_mem", O_RDWR) = 15
9295  ioctl(15, 0xc0209500, 0x7ffdbbf354c0) = 0
9295  mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 15, 0xffff8808450e0000) = 0x7fe825e40000
9295  sched_yield()                     = 0
9295  sched_yield()                     = 0
9295  sched_yield()                     = 0
9295  sched_yield()                     = 0
9295  sched_yield()                     = 0

OS: Debian 8, kernel 3.16.51-3
Driver: 1.6, from qatmux.l.2.6.0-60.tar.gz
OpenSSL: openssl-1.1.0j.tar.gz
Engine and asynch_mode_nginx: cloned from GitHub

I am assuming I'm doing something fundamentally wrong. Thinking about trying a debug build next.

alameth commented 5 years ago

02:00.0 Co-processor: Intel Corporation Coleto Creek PCIe Endpoint
        Subsystem: Super Micro Computer Inc Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 32
        Region 0: Memory at fa600000 (64-bit, prefetchable) [size=512K]
        Region 2: Memory at fba40000 (64-bit, non-prefetchable) [size=256K]
        Region 4: Memory at fba00000 (64-bit, non-prefetchable) [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: qat_1_6_adf

alameth commented 5 years ago

I uninstalled 1.6 and installed 1.7. Same behavior, albeit with a slightly different look in strace because it's USDM now:

8736  epoll_wait(13, {{EPOLLIN, {u32=556367888, u64=140686505115664}}}, 512, 60000) = 1
8736  accept4(11, {sa_family=AF_INET, sin_port=htons(42642), sin_addr=inet_addr("96.74.122.182")}, [16], SOCK_NONBLOCK) = 15
8736  epoll_ctl(13, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=556368632, u64=140686505116408}}) = 0
8736  epoll_wait(13, {{EPOLLIN, {u32=556368384, u64=140686505116160}}}, 512, 59995) = 1
8736  recvfrom(3, "\26", 1, MSG_PEEK, NULL, NULL) = 1
8736  read(3, "\26\3\1\2\0\1\0\1\374\3\3\241Ob\2525\320\344\227X\2210\311\250\314\346\3\224\350\223B0"..., 16709) = 517
8736  ioctl(8, 0xc0507100, 0x7ffd3256bf90) = 0
8736  mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 8, 0x834200000) = 0x7ff41de3e000
8736  madvise(0x7ff41de3e000, 2097152, MADV_DONTFORK) = 0
8736  sched_yield()                     = 0
8736  sched_yield()                     = 0

alameth commented 5 years ago

Stock nginx 1.14 works fine with the QAT engine, just not very quickly (only ~4x faster than software crypto). So I must be doing something wrong with the ngx_ssl_engine_qat_module.so.
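
For context on how the two setups differ: stock nginx only uses the core ssl_engine directive (e.g. "ssl_engine qat;") to offload private-key operations, while asynch_mode_nginx expects the dynamic module plus an ssl_engine block and a per-server ssl_asynch switch. The sketch below shows the asynch_mode_nginx wiring in minimal form; the engine id "qat" and the module path are assumptions based on QAT Engine builds of that era, not values taken from this report.

load_module modules/ngx_ssl_engine_qat_module.so;

ssl_engine {
    use_engine qat;              # engine id registered by QAT Engine (assumed)
    default_algorithms ALL;      # hand all supported algorithms to the engine
}

http {
    server {
        listen 443 ssl;
        ssl_asynch on;           # enable the asynchronous SSL code path
        # certificate/key and other ssl_* directives omitted
    }
}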

daweiq commented 5 years ago

@alameth Sorry for the late response! I have been busy with the new async-mode nginx release.

The log shows the thread sleeping after the ioctl call to the qat_contig_mem or USDM device. That might be because the QAT Engine is trying to allocate or free memory from huge pages. It would be great to have a debug build for this, so we could look at the backtrace and understand what is happening inside.

I am also confused: if the worker goes into an infinite loop, why can your process still handle new requests? May I ask which polling mode you ran the test in? How many threads do you have inside the worker? And in which thread does this issue happen?

Thank you, David Qian
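
To make the polling-mode question concrete: in asynch_mode_nginx the polling behaviour is selected inside the qat_engine sub-block of the ssl_engine configuration. The sketch below shows the relevant directives; the specific values are illustrative assumptions, not the reporter's actual settings.

ssl_engine {
    use_engine qat;
    qat_engine {
        qat_offload_mode async;        # sync | async
        qat_notify_mode  poll;         # poll | event
        qat_poll_mode    heuristic;    # e.g. internal, external or heuristic (assumed)
        qat_external_poll_interval 1;  # only relevant for external polling (assumed)
    };
}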

daweiq commented 5 years ago

Closing this, as there is no further info.

alameth commented 5 years ago

I was removed from this project, so I no longer have the ability to provide additional info, sorry.

jiangzhuti commented 4 years ago

https://github.com/intel/asynch_mode_nginx/issues/18 https://github.com/intel/QAT_Engine/issues/100

I suspect these are the same issue.