krzyzanowskim / OnlineSwiftPlayground

Online Swift Playground
http://online.SwiftPlayground.run

Server hangs on macOS after a while #42

Open krzyzanowskim opened 2 years ago

krzyzanowskim commented 2 years ago

I've noticed the server stops responding after some time (usually ~2 days), and I can't properly restart it. I'm not sure what NIO is doing, but I have to reboot (or log out the user) to unbind the port. It's weird. Does it ring a bell, @josefdolezal?

Launched as

.build/swift-5.7-RELEASE/x86_64-apple-macosx/release/PlaygroundServer serve --hostname 0.0.0.0

It may or may not be related to https://github.com/vapor/vapor/issues/2502

josefdolezal commented 2 years ago

Hey @krzyzanowskim! I finally found some time to investigate today, but I am unable to reproduce such behavior.

Based on your description, I suspected that Vapor's handling of disconnected WebSockets might not work as expected and could leave the server unable to open new connections. However, I tested a couple of scenarios of disconnecting from sockets on the FE side, and everything seems to work fine on the BE.

The other problem I was investigating is Vapor not being able to shut down successfully if there is an open socket connection (the server dies with the error [ ERROR ] Could not stop HTTP server: Abort.500: Server stop took too long.). Even in this case, though, the app successfully unbinds itself from the port it was running on.

I tried fixing the latter in this branch, but it seems like the issue is somewhere deeper in Vapor itself, and cannot be fixed on our side.

I'll investigate further but I am entering uncharted waters here.

krzyzanowskim commented 2 years ago

Yeah, it's hard to debug. It just happens. Even when I kill the process, it keeps something running:

Flow@swiftplayground ~ %  killall -9 PlaygroundServer
No matching processes belonging to you were found
Flow@swiftplayground ~ % ./run.sh                    
~/Devel/OnlineSwiftPlayground ~
Building for production...
Build complete! (0.32s)
[ NOTICE ] Server starting on https://0.0.0.0:443
[ WARNING ] bind(descriptor:ptr:bytes:): Address already in use (errno: 48)
Flow@swiftplayground ~ % lsof -i TCP:443  
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
main    2642 Flow   16u  IPv4 0x8e0308ab2ca9453b      0t0  TCP *:https (LISTEN)
Flow@swiftplayground ~ % ps -f   
  UID   PID  PPID   C STIME   TTY           TIME CMD
  501  2642     1   0  1:57AM ttys000    0:03.53 /var/folders/dh/nm0n58p10lj30_136g72x86w0000gn/T/3B8817DC-15EE-42D8-8A74-026B8EE374A8-37800-00014DC1751017EE.Yk711M/main
  501  4128     1   0  2:25AM ttys000    0:03.18 /var/folders/dh/nm0n58p10lj30_136g72x86w0000gn/T/49CFACA0-E80D-4FD8-8FA0-17289D0EBA97-37800-00014F4140693206.RUN5sR/main

Any idea what the main process is here? main is a user executable 🤔

josefdolezal commented 2 years ago

Do you have access to the user's code?

It sounds like the playground process was not gracefully killed (during server shutdown), resulting in an orphan process (hence the PPID is 1 and the port is still bound). The question is why the user's process did not exit, considering the code execution should be fast.
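
One way to keep a runaway playground binary from outliving its welcome is to give every child a hard deadline. Below is a minimal sketch, assuming the server launches the compiled user code through Foundation's Process. The function name runWithTimeout, the 0.5 s grace period, and the /bin/sleep stand-in are hypothetical and not taken from the project's code.

```swift
import Foundation

/// Runs an executable and kills it if it is still alive after `timeout` seconds.
/// Returns true if the process exited on its own, false if it had to be killed.
/// Hypothetical helper for illustration; not the project's actual launcher.
func runWithTimeout(_ path: String, arguments: [String], timeout: TimeInterval) throws -> Bool {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments

    // Signalled by Foundation once the child has terminated.
    let exited = DispatchSemaphore(value: 0)
    process.terminationHandler = { _ in exited.signal() }

    try process.run()

    // The child finished within the deadline.
    if exited.wait(timeout: .now() + timeout) == .success {
        return true
    }

    // Ask politely first (SIGTERM), then force the issue (SIGKILL).
    process.terminate()
    if exited.wait(timeout: .now() + 0.5) == .timedOut {
        kill(process.processIdentifier, SIGKILL)
    }
    process.waitUntilExit()
    return false
}

// Stand-in for a user program that never returns in time.
let finished = try runWithTimeout("/bin/sleep", arguments: ["30"], timeout: 1)
print(finished ? "exited on its own" : "killed after timeout")
```

This only bounds the lifetime of a child while the server itself is alive. If the server is killed with SIGKILL, nothing runs to clean up its children, so they would still be re-parented to PID 1.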

krzyzanowskim commented 2 years ago

"It sounds like the playground process was not gracefully killed (during server shutdown)"

I used killall -9 PlaygroundServer because it won't shut down otherwise.

"Do you have access to the user's code?"

yes

import Foundation

func countdown(_ N: Int) {
    var i = N

    while N > 0 {
        print(i)

        sleep(1)

        i -= 1
    }

    print("GO!")
}
countdown(3)
//let arrNum:[Int] = first(5)
//print(arrNum)
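
Worth noting about the snippet above: the loop condition tests N, which is never modified, rather than the counter i, so the loop never exits and the process keeps running until something kills it. A terminating variant, as a sketch (the delay parameter and the returned array are additions for illustration only, not part of the user's code):

```swift
import Foundation

/// Terminating variant of the countdown: the loop condition tests the
/// mutable counter `i` instead of the constant parameter `n`.
/// Returns the printed lines so the behaviour can be checked.
func countdown(_ n: Int, delay: UInt32 = 1) -> [String] {
    var output: [String] = []
    var i = n

    while i > 0 {
        print(i)
        output.append(String(i))

        sleep(delay)

        i -= 1
    }

    print("GO!")
    output.append("GO!")
    return output
}

// delay: 0 skips the one-second pauses so the example finishes immediately.
let lines = countdown(3, delay: 0)
```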

krzyzanowskim commented 2 years ago

It keeps freezing, now even without a running child process. I suspect the WebSocket.

[Screenshot: Screen Shot 2022-11-01 at 22 47 36@2x]

krzyzanowskim commented 1 year ago

[ WARNING ] LeafError.500: No template found for footer-container [request-id: 21C7E592-965D-4B6A-85BF-55FE2633EA69]
[ INFO ] GET / [request-id: 48B10DE8-CD4B-4141-BD66-4C1A4798C85A]
[ WARNING ] LeafError.500: No template found for meta-fragment [request-id: 48B10DE8-CD4B-4141-BD66-4C1A4798C85A]
[ INFO ] GET / [request-id: D4802EF8-856B-43DB-A93F-A93425D0B625]
[ WARNING ] LeafError.500: No template found for favicon-fragment [request-id: D4802EF8-856B-43DB-A93F-A93425D0B625]
[ INFO ] GET / [request-id: BA202083-D7AB-40AE-91CA-CFA095BFA251]
[ WARNING ] LeafError.500: No template found for footer-container [request-id: BA202083-D7AB-40AE-91CA-CFA095BFA251]
[ INFO ] GET / [request-id: AB5FC612-9B1F-4B85-9A85-610603322857]
[ WARNING ] LeafError.500: No template found for fonts-fragment [request-id: AB5FC612-9B1F-4B85-9A85-610603322857]
[ INFO ] GET / [request-id: B9BDAABE-C3EE-4ADC-9EB0-A43EBFC09D3D]
[ WARNING ] LeafError.500: No template found for fonts-fragment [request-id: B9BDAABE-C3EE-4ADC-9EB0-A43EBFC09D3D]
[ INFO ] GET / [request-id: C616C23D-2FF1-4637-AE0D-7CB6CA7FA860]
[ WARNING ] LeafError.500: No template found for fonts-fragment [request-id: C616C23D-2FF1-4637-AE0D-7CB6CA7FA860]
[ INFO ] GET / [request-id: 820EE583-E4FE-4973-BF8C-3737310697D6]
[ WARNING ] LeafError.500: No template found for meta-fragment [request-id: 820EE583-E4FE-4973-BF8C-3737310697D6]
[ INFO ] GET / [request-id: 27CC6D8A-C104-40FA-A532-A12D61429220]
[ WARNING ] LeafError.500: No template found for favicon-fragment [request-id: 27CC6D8A-C104-40FA-A532-A12D61429220]
[ INFO ] GET / [request-id: 2DEC6011-12C8-4ADA-9A61-017EB0EB2B27]
[ WARNING ] LeafError.500: No template found for bootstrap-fragment [request-id: 2DEC6011-12C8-4ADA-9A61-017EB0EB2B27]
[ INFO ] GET / [request-id: DF7C8B85-F648-48E8-9357-1F4EDDB50471]
[ WARNING ] LeafError.500: No template found for fontawesome-fragment [request-id: DF7C8B85-F648-48E8-9357-1F4EDDB50471]
[ INFO ] GET / [request-id: 3ABF5405-06C8-4244-89BE-3296B4AB7034]
[ WARNING ] LeafError.500: No template found for footer-container [request-id: 3ABF5405-06C8-4244-89BE-3296B4AB7034]

josefdolezal commented 1 year ago

Was this always there after the migration? I just tried running the master branch on my machine and everything works fine. The production website also seems to work (the content of the footer-container template is in the HTML).

krzyzanowskim commented 1 year ago

I reboot production every two days ;-) It works, works, and then it stops after several hours 🤷