nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

v22.4.1 and v18.14.2 report abortIncoming (node:_http_server:806:17) when uploading a very large file (20 GB+), but the TCP socket is NOT torn down #55944

Open navegador5 opened 3 days ago

navegador5 commented 3 days ago

Version

Node.js v22.4.1.

Platform

Linux dev 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04 LTS
Release:        22.04
Codename:       jammy

# ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 63498
max locked memory           (kbytes, -l) 2046520
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 63498
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = reno cubic
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_autocorking = 1
net.ipv4.tcp_available_congestion_control = reno cubic
net.ipv4.tcp_available_ulp = espintcp mptcp tls
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_challenge_ack_limit = 1000
net.ipv4.tcp_comp_sack_delay_ns = 1000000
net.ipv4.tcp_comp_sack_nr = 44
net.ipv4.tcp_comp_sack_slack_ns = 100000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_demux = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_ecn_fallback = 1
net.ipv4.tcp_fack = 0
net.ipv4.tcp_fastopen = 1
net.ipv4.tcp_fastopen_blackhole_timeout_sec = 0
net.ipv4.tcp_fastopen_key = 3d6fb714-9821d678-15c2886c-1bbdd4b0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_frto = 2
net.ipv4.tcp_fwmark_accept = 0
net.ipv4.tcp_invalid_ratelimit = 500
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_l3mdev_accept = 0
net.ipv4.tcp_limit_output_bytes = 1048576
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_max_reordering = 300
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_max_tw_buckets = 360000
net.ipv4.tcp_mem = 786432       2097152 26777216
net.ipv4.tcp_migrate_req = 0
net.ipv4.tcp_min_rtt_wlen = 300
net.ipv4.tcp_min_snd_mss = 48
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probe_floor = 48
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_no_ssthresh_metrics_save = 1
net.ipv4.tcp_notsent_lowat = 4294967295
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_pacing_ca_ratio = 120
net.ipv4.tcp_pacing_ss_ratio = 200
net.ipv4.tcp_probe_interval = 600
net.ipv4.tcp_probe_threshold = 8
net.ipv4.tcp_recovery = 1
net.ipv4.tcp_reflect_tos = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096        16384   33554432
net.ipv4.tcp_rx_skb_cache = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tx_skb_cache = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096        16384   33554432
net.ipv4.tcp_workaround_signed_windows = 0
net.mptcp.add_addr_timeout = 120
net.mptcp.allow_join_initial_addr_port = 1
net.mptcp.checksum_enabled = 0
net.mptcp.enabled = 1
net.mptcp.stale_loss_cnt = 4

Subsystem

No response

What steps will reproduce the bug?

  1. Use http.createServer to create a simple HTTP server.
  2. In Chrome, pick a file with an input element and use fetch to POST the File object. The file MUST be large enough (20 GB+).
  3. On the HTTP server, pipe req to fs.createWriteStream("xxxxx").
  4. Wait. When about 10~12 GB of the file has been uploaded, you may get 【abortIncoming (node:_http_server:806:17)】.
  5. This does NOT reproduce 100% of the time, but if you try 2-3 times you will get the error.
  6. With another HTTP server (such as uWebSockets), everything works well.
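Steps 1 and 3 can be sketched roughly as follows. The report does not include the server source, so this is only an assumed reconstruction: the helper name `createUploadServer`, the file naming, and the log messages are hypothetical, and the real handler may differ.

```javascript
'use strict';
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not from the report): returns an HTTP server that
// streams every POST body into a new file inside destDir.
function createUploadServer(destDir) {
  let seq = 0;
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.statusCode = 405;
      res.end();
      return;
    }
    console.log('recv post', req.headers);
    // One new file per POST, as described in the report.
    const target = path.join(destDir, `${Date.now()}-${seq++}.upload`);
    const out = fs.createWriteStream(target);
    // Step 3: pipe the request body straight into the write stream.
    req.pipe(out);
    // With a listener attached, an abort is reported here as an Error.
    req.on('error', (err) => console.log('request error', err));
    out.on('finish', () => res.end('ok'));
  });
}
```

To try it, something like `createUploadServer('./file').listen(65535)` would match the port seen in the logged `host` header; the directory name is taken from the `ls -l file/` listing and is otherwise an assumption.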

See below: `【Request from the client (Chrome or Edge); the client just uses XMLHttpRequest or fetch to POST a File object】
recv post {
  host: '192.168.1.140:65535',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
  'content-length': '20981630881',
  accept: '*/*',
  'accept-encoding': 'gzip, deflate',
  'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
  batchseq: '0',
  'cache-control': 'no-cache',
  'content-type': 'application/octet-stream',
  name: 'paligemma-jax-paligemma-3b-pt-224-v1.tar.gz',
  origin: 'http://192.168.1.140:65535',
  pragma: 'no-cache',
  referer: 'http://192.168.1.140:65535/',
  size: '20981630881',
  type: 'application%2Fx-gzip',
  uiseq: 'events'
}
Transferring 20981630881 took 314.401s in total
File located at /home/cs6666-upld-srv/file/2024-11-21T12:56:35.146Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz
[
  false,
  Error: aborted
      at abortIncoming (node:_http_server:806:17)
      at socketOnClose (node:_http_server:800:3)
      at Socket.emit (node:events:532:35)
      at TCP.<anonymous> (node:net:339:12) {
    code: 'ECONNRESET'
  }
]

//------------【After the abort message, the server reports that it received a second POST from the client (Chrome or Edge), but the client actually sent only one】//-------------
recv post {
  host: '192.168.1.140:65535',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0',
  'content-length': '20981630881',
  accept: '*/*',
  'accept-encoding': 'gzip, deflate',
  'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
  batchseq: '0',
  'cache-control': 'no-cache',
  'content-type': 'application/octet-stream',
  name: 'paligemma-jax-paligemma-3b-pt-224-v1.tar.gz',
  origin: 'http://192.168.1.140:65535',
  pragma: 'no-cache',
  referer: 'http://192.168.1.140:65535/',
  size: '20981630881',
  type: 'application%2Fx-gzip',
  uiseq: 'events'
}

top - 21:33:52 up 113 days, 12:51, 12 users, load average: 3.21, 3.11, 2.17
Tasks: 295 total, 1 running, 294 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 2.0 sy, 0.0 ni, 76.3 id, 21.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 15988.4 total, 163.1 free, 15663.0 used, 162.4 buff/cache
MiB Swap: 4096.0 total, 794.5 free, 3301.5 used. 42.8 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
306856 root      20   0   19.1g  14.7g   7248 D   8.3  94.2   4:44.11 node

【The node process used nearly all of the memory】

【The client uploaded ONE file, but the server received two POSTs (I create a new file for each POST). Sequence: the server's (req, res) handler is triggered by a POST -> abortIncoming (node:_http_server:806:17) -> the server's (req, res) handler is triggered by a POST again. The TCP socket is the same in both cases.】

ls -l file/

total 14874992
-rw-r--r-- 1 root root 12621333720 Nov 21 21:01 2024-11-21T12:56:35.146Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz
-rw-r--r-- 1 root root  2610647040 Nov 21 21:03 2024-11-21T13:01:49.563Z::0::paligemma-jax-paligemma-3b-pt-224-v1.tar.gz`

How often does it reproduce? Is there a required condition?

You need to upload a big file (20 GB+) to trigger it.

It does not happen every time, but the rate is high: it usually shows up within 2-3 tries.

What is the expected behavior? Why is that the expected behavior?

If 'ECONNRESET' is triggered, Node.js should tear down the TCP socket.

What do you see instead?

When 'ECONNRESET' is triggered, the Node.js HTTP server keeps the TCP socket alive and wrongly reports that it received a NEW request.
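One way to check whether the second POST really arrives on the same TCP socket is to tag each connection and log that tag for every request. The sketch below is only a diagnostic aid under that assumption; `attachSocketTracing` and the log format are hypothetical names, not part of Node.js or of the reporter's code.

```javascript
'use strict';

// Hypothetical diagnostic helper: numbers each accepted socket and logs the
// number for every request, so socket reuse becomes visible in the output.
function attachSocketTracing(server, log = console.log) {
  let nextId = 0;
  const ids = new WeakMap(); // socket -> sequential id

  server.on('connection', (socket) => {
    const id = ++nextId;
    ids.set(socket, id);
    log(`socket#${id} open`);
    socket.on('close', (hadError) => {
      log(`socket#${id} close hadError=${hadError}`);
    });
  });

  // Extra 'request' listener; it runs alongside the main handler.
  server.on('request', (req) => {
    log(`socket#${ids.get(req.socket)} ${req.method} ${req.url}`);
  });

  return server;
}
```

If two `POST` lines show the same `socket#N` with no `close` line between them, that would support the observation that the socket was not torn down after the abort.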

Additional information

It does NOT trigger 100% of the time. You may need to try it several times, possibly on different machines.

navegador5 commented 3 days ago

[9809629.918039] Out of memory: Killed process 306856 (node) total-vm:21659964kB, anon-rss:15465044kB, file-rss:2268kB, shmem-rss:0kB, UID:0 pgtables:45924kB oom_score_adj:0
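The kernel log above shows anon-rss of 15465044 kB (about 14.7 GiB) when the process was killed. To see when memory starts to grow relative to the abort, the server could log its own memory usage periodically. This is a minimal sketch, not a diagnosis of the cause; `startRssLogger` and `formatMiB` are hypothetical helper names.

```javascript
'use strict';

// Formats a byte count as mebibytes with one decimal place.
function formatMiB(bytes) {
  return `${(bytes / (1024 * 1024)).toFixed(1)} MiB`;
}

// Hypothetical helper: logs process memory figures every intervalMs.
function startRssLogger(intervalMs = 5000, log = console.log) {
  const timer = setInterval(() => {
    const { rss, heapUsed, external } = process.memoryUsage();
    log(
      `rss=${formatMiB(rss)} heapUsed=${formatMiB(heapUsed)} ` +
        `external=${formatMiB(external)}`
    );
  }, intervalMs);
  // Do not keep the process alive just for the logger.
  timer.unref();
  return timer;
}
```

Calling `startRssLogger()` once at server startup is enough; comparing `rss` with `heapUsed` and `external` may hint at whether the growth sits in the JavaScript heap or in buffers outside it.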