Open ollm opened 2 years ago
I fixed this by reverting this commit: reactphp/stream@3eb342d
@acadjsr Reverting to an old version doesn't seem sensible at this point as the previous version had other known issues that are in turn known to be fixed in the current version (see above #939 (comment)).
I missed to mention that I reverted only this file: WritableResourceStream.php So I'm not having this issue for 4 weeks.
We made the changes as suggested by @acadjsr and this is the first time we haven't had to restart the server in over 24 hours. So the problem is definitely to be found there
@fx1234 @acadjsr what do you see in system logs when high CPU usage happens?
One of my dev server that runs rachet somehow started to have high CPU usage (50%, 1 core out of two completely used), and high ram usage (near 100%), not responding to remote SSH connection request. In kern.log I see an entry with many NULL with timestamp cooresponding to high usage of CPU/RAM.
At this point I am not sure this is caused by rachet, but rachet is the only service on this server. Is there any way to rule out?
Thanks.
I don't know. It did not happen even once in last 5 months in production after the solution: https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1132159140
@acadjsr Thanks for the information. I think I too will apply the fix before putting it into production.
When high CPU issue happened, were you able to log in via SSH? Were other service such as Apache still responsive?
The same for me as for @acadjsr. I think the PHP process was at 100%, but I can't tell exactly
@acadjsr Thanks for the information. I think I too will apply the fix before putting it into production.
When high CPU issue happened, were you able to log in via SSH? Were other service such as Apache still responsive?
I was able to login in via SSH. Other services are still responsive.
I'm really struggling with this issue too. Randomly (it seems) every 3 days to a month PHP consumes an entire CPU core (running Ratchet/ReactPHP). strace shows:
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=995813, si_uid=1000} ---
pselect6(91, [5 7 15 36 45], [90], [], {tv_sec=29, tv_nsec=604598000}, NULL) = 1 (out [90], left {tv_sec=29, tv_nsec=604596023})
write(90, "\313\301\20$\226RuKn\263\241\336\30\234\2271\177\220K\21\v\340\245<r\7\355\261\237\26\rG"..., 3190) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=995813, si_uid=1000} ---
pselect6(91, [5 7 15 36 45], [90], [], {tv_sec=29, tv_nsec=604355000}, NULL) = 1 (out [90], left {tv_sec=29, tv_nsec=604353252})
write(90, "\313\301\20$\226RuKn\263\241\336\30\234\2271\177\220K\21\v\340\245<r\7\355\261\237\26\rG"..., 3190) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=995813, si_uid=1000} ---
pselect6(91, [5 7 15 36 45], [90], [], {tv_sec=29, tv_nsec=604117000}, NULL) = 1 (out [90], left {tv_sec=29, tv_nsec=604115286})
write(90, "\313\301\20$\226RuKn\263\241\336\30\234\2271\177\220K\21\v\340\245<r\7\355\261\237\26\rG"..., 3190) = -1 EPIPE (Broken pipe)
I can never tie this down to any event in my application's log. The websocket server application is still working while this happens but it is very slow to respond. The only solution is a restart of the Ratchet process, which is highly disruptive.
Edit: Under normal circumstances the process consumes < 5% CPU, even at busy times. This condition started happening around about a year ago in June 2022 (probably when I ran a composer update) - previous to that it never happened once.
Have you tried to revert this commit: https://github.com/reactphp/stream/commit/3eb342d87ca89e0c4c7c428505f36051c172f677
My Ratchet is rock solid and never failed (for more than a year) after the revert.
Thanks @acadjsr I'll give it a go
@clint0s Thanks for the additional input on this :+1:
Still working together with @clue on a solution for this, already spent multiple hours and we have yet to figure out a way to reproduce this issue.
Also note @clue's comment about reverting to an old version above:
Reverting to an old version doesn't seem sensible at this point as the previous version had other known issues that are in turn known to be fixed in the current version (see above https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1098798274).
Hi.
During the last month, I am having a problem with running a Ratchet server in a production environment, suddenly the process goes from normally using 0-1% of a CPU core to 98-100%, the websocket server still working but it takes longer to respond, this happens in a variety of ways, sometimes it takes a few hours to happen and others it takes days / weeks.
When it enters this state it does not exit again until I kill the process and supervisor starts a new one.
I am also checking the logic of my application so that it is not a problem on my part.
Apache version
PHP version
Ratchet version being used
Event loop
supervisor.conf
I have this code in the app, but there is no record in
chat_info.log
orchat_error.log
This is what appears if I run
strace
in the pidstrace
``` --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10400) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10400) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10400) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10400) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10400) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10399) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10399) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10399) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10399) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10399) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10398) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10398) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10398) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10398) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10398) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10397) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10397) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10397) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3517427, si_uid=0} --- epoll_wait(6, [{EPOLLOUT|EPOLLHUP, {u32=144, u64=3702261809296}}], 680, 10397) = 1 epoll_ctl(6, EPOLL_CTL_MOD, 144, {EPOLLOUT, {u32=144, u64=3702261809296}}) = 0 write(144, "\27\3\3\0\25\261\3\360_\3138\365>\260H\207\321X\226\361\206\240v\326:\351", 26) = -1 EPIPE (Broken pipe) ```
Thank you for your replies.