pterodactyl / panel

Pterodactyl® is a free, open-source game server management panel built with PHP, React, and Go. Designed with security in mind, Pterodactyl runs all game servers in isolated Docker containers while exposing a beautiful and intuitive UI to end users.
https://pterodactyl.io

Can't turn on server without restarting wings: another power action is currently being processed #3903

Closed AV3RG closed 2 years ago

AV3RG commented 2 years ago

STOP! Read this before responding!

Please do not post replies that contain no actual logs or that don't otherwise contribute to this issue. If you have a temporary workaround, such as a cron job, keep it to yourself: it does nothing to fix the underlying issue.

If you're going to respond to this issue, please do not respond to say "+1" or "I also have this issue." If you are responding, please include all relevant information, including the output of wings diagnostics and your full wings log if possible. Make sure your system is running with --debug or debug: true (in the config.yml) before submitting your logs.

Please clearly identify the timestamps and the servers that are affected. We have limited time to work on this project, and identifying or hunting down this bug is not feasible without clear and detailed reports.

Update: Potential Fix

See this comment for details.

Is there an existing issue for this?

Current Behavior

Sometimes, while restarting or stopping the server, Wings shows the server stuck in offline mode, like this:

[Screenshot]

It still shows the time next to the online status

When it is stuck like that, you can't start or restart the server. Every attempt to do so shows this on the console (and not because of spamming the power buttons):

[Screenshot]

However, if you restart Wings, the server returns to a normal offline state and you can start it normally.

Expected Behavior

Normally the time does not show up (when it's not stuck) and you can start the server normally

[Screenshot]

Steps to Reproduce

It does not happen every time. There are no particular steps to reproduce; it happens when restarting or stopping the server. I have seen many people facing the same issue in the Pterodactyl Discord.

Panel Version

1.7.0

Wings Version

1.5.5

Error Logs

No response

DaneEveritt commented 2 years ago

This has been a long standing issue that has existed for nearly the entire existence of Wings, it just became more prevalent due to other changes that improved the consistency at which it happened.

Unless you're providing feedback about the development builds @matthewpi provided, or providing specific logs and details, please refrain from commenting on this issue. It causes excessive notification noise, and hides important details and conversations from others.

I have a new build for anyone experiencing this problem. It changes the logic when starting a server from create, start, attach to create, attach, start (this is how the docker run command also works). This build also has additional debug logs around starting and attaching to the container which should help us diagnose any further problems.
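To illustrate why the order can matter, here is a minimal Go sketch built on a hypothetical in-memory `fakeContainer` (it is not the Docker SDK and not Wings code). It assumes that output produced while nothing is attached is simply lost, so attaching after starting can miss whatever the process prints first, while attaching before starting captures it.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeContainer is a hypothetical stand-in for a Docker container. Output
// emitted while nobody is attached is discarded, mimicking a late attach
// that misses the first lines a process prints.
type fakeContainer struct {
	mu       sync.Mutex
	attached chan string
}

// Attach returns a channel that receives all output produced from now on.
func (c *fakeContainer) Attach() <-chan string {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.attached = make(chan string, 16)
	return c.attached
}

// Start "runs" the process, which immediately prints its boot lines.
func (c *fakeContainer) Start() {
	c.emit("server starting")
	c.emit("server ready")
}

// emit delivers a line only if a stream is attached at that moment.
func (c *fakeContainer) emit(line string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.attached != nil {
		c.attached <- line
	}
}

// drain collects whatever output is currently buffered on a stream.
func drain(ch <-chan string) []string {
	var out []string
	for {
		select {
		case line := <-ch:
			out = append(out, line)
		default:
			return out
		}
	}
}

// startThenAttach follows the old create, start, attach order.
func startThenAttach() []string {
	c := &fakeContainer{}
	c.Start()
	return drain(c.Attach())
}

// attachThenStart follows the new create, attach, start order.
func attachThenStart() []string {
	c := &fakeContainer{}
	stream := c.Attach()
	c.Start()
	return drain(stream)
}

func main() {
	fmt.Println("start, attach:", startThenAttach())
	fmt.Println("attach, start:", attachThenStart())
}
```

In this sketch the start-then-attach path sees no boot output at all, while attach-then-start sees both lines.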

You can download the build from https://github.com/pterodactyl/wings/actions/runs/1747467060. NOTE: You will need to be logged in to download the build artifact.

Synkstar commented 2 years ago

I recorded a video of myself reproducing the problem; it takes about 5 minutes to make it happen: https://streamable.com/mp03by. I'm basically just switching my IP address using VPN connections and refreshing. Edit: it also seems to happen if I change IPs once and wait 5 to 20 minutes; I just tested this.

Synkstar commented 2 years ago

The issue doesn't seem to happen if you run Wings behind an nginx reverse proxy on the same server/node, connecting via localhost. Nginx correctly handles the websocket connections and their timeouts, so the connections get closed correctly (https://www.nginx.com/blog/websocket-nginx/). You also need to disable HTTPS on Wings, set it to listen on localhost, and have nginx bind to the public IP and handle HTTPS. I repeated what I did in the video and it didn't happen, and it's been more than 30 minutes; I'll know for sure tomorrow if no one has the issue. By the looks of it, this can be used as a temporary patch.

Edit: This seems to have fixed it. I haven't had this issue in 2 days, whereas it used to happen daily. I'd say this is the best fix here because it hasn't broken at all and doesn't require a cron job.

Oliverdotdotdot commented 2 years ago

I am having the same error at my hosting company, and I have found a temporary fix until this gets resolved.

If you're running Linux, you should be able to use a cron job to restart Wings every 5 minutes or so; that lets servers that are stuck with the timer start up again.

Brandin commented 2 years ago

This issue went away for me for quite a few days, during that time:

Yesterday, after a number of files were changed, both by the container and by me (2 remote files changed), the scheduled restart failed because it was unable to power the server back on, with the same error log presented earlier in this thread. I'm unsure whether the fact that I changed some files manually helps the diagnosis, but I wanted to mention that this behavior occurred only after that action, and everything had worked fine for days before.

DaneEveritt commented 2 years ago

Hi all — unless you're running the latest code from develop or the specific build that @matthewpi highlighted, we don't need any more reports. We're fully aware that it is not working properly, but unless you have consistent, reproducible steps for this bug, it isn't helpful for us if you keep replying to this thread.

If you encounter issues on that development build, we want to know, otherwise we believe this issue is resolved.

BurritoWrapped commented 2 years ago

Still having issues on the latest version that was suggested.

ERROR: [Jan 31 12:48:07.146] error processing websocket event "set state" error=failed to acquire exclusive lo>

Stacktrace:
locker: cannot acquire lock, already locked
failed to acquire exclusive lock for power actions
github.com/pterodactyl/wings/server.(*Server).HandlePowerAction
    /home/runner/work/wings/wings/server/power.go:102
github.com/pterodactyl/wings/router/websocket.(*Handler).HandleInbound
    /home/runner/work/wings/wings/router/websocket/websocket.go:355
github.com/pterodactyl/wings/router.getServerWebsocket.func3
    /home/runner/work/wings/wings/router/router_server_ws.go:85
runtime.goexit
    /opt/hostedtoolcache/go/1.17.6/x64/src/runtime/asm_amd64.s:1581

DaneEveritt commented 2 years ago

@EliteNover @ItsLachy I need the full Wings logs (not the ones from the diagnostics command, it truncates a lot of context), and please indicate at least one server UUID that is facing the issue. I need to be able to see the historical API calls made for the server to better pinpoint consistent reproduction steps.

DaneEveritt commented 2 years ago

@JRH-1997 @ItsLachy @EliteNover @iLucasUS

New build with more debugging information based on some other logs I looked at. Doubtful that it fixes the issue yet, but should include better debug output to narrow down the specific action that is failing:

https://github.com/pterodactyl/wings/actions/runs/1775695398

liampearson96 commented 2 years ago

https://pastebin.com/6zskSUXC

Maybe it will help? It only started happening to me after updating to 1.6.0. It only happens on a single server with a working schedule, and only when the scheduled restart is requested.

DaneEveritt commented 2 years ago

@liampearson96 your logs appear to have been pasted into a bash prompt? Can you please just pull the file directly? It should be somewhere in /var/log/wings, I believe.

Also, the logs you posted are just Wings failing to start because it is already running.

trenutoo commented 2 years ago

/var/log/pterodactyl/ is the default log path. You can check your Wings config to see if you have changed it (e.g. when running Wings in a Docker container).

liampearson96 commented 2 years ago

wings.log

Oops, apologies @DaneEveritt, this is the only file I have.

iLucasUS commented 2 years ago

Reinstalling the server that shows the error fixes it.

This is wrong, I did it on several clients and still the problem came back.

DaneEveritt commented 2 years ago

Posting my thoughts from a conversation in Discord:

I've narrowed down the likely culprit for the power action lock issues.

But beats me if I know what exactly causes it; I just know the consistent lock point in the server logs I've seen: server/power.go#onBeforeStart() -> s.SyncWithEnvironment() (the second call, not the one in s.Sync()). Somewhere in that function, at or after the s.Environment.InSituUpdate() call, it locks up and never completes. The "updating server configuration files..." debug output never shows up. So there are still a few potential spots, but it is definitely narrowed down.

And given the changes we've been making, I'm curious whether there is some race or unexpected lockup in the PublishConsoleOutputFromDaemon call, which would flow through those channels. That might explain the sudden increase in issues stemming from those updates.

Which, looking at the code again, seems entirely possible to lock up since the logic in events/events.go#L114 doesn't account for a blocked channel, and will spin.
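A send to a channel nobody is reading can stall the sender indefinitely. One common Go pattern for avoiding that is a non-blocking send using `select` with a `default` case, dropping the message for any subscriber whose buffer is full. The sketch below uses a hypothetical `bus` type; it is not the code in events/events.go, and it is not necessarily what the eventual fix does.

```go
package main

import "fmt"

// bus is a hypothetical fan-out event bus. Each subscriber gets its own
// buffered channel. Not safe for concurrent Subscribe calls; sketch only.
type bus struct{ subs []chan string }

// Subscribe registers a subscriber with the given buffer size.
func (b *bus) Subscribe(buffer int) <-chan string {
	ch := make(chan string, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

// PublishBlocking waits on every subscriber in turn. If one of them stops
// reading and its buffer fills up, this call never returns, and neither
// does anything upstream that is waiting on it.
func (b *bus) PublishBlocking(msg string) {
	for _, ch := range b.subs {
		ch <- msg
	}
}

// Publish never waits: if a subscriber's buffer is full, the message is
// dropped for that subscriber and the publisher moves on. It returns how
// many subscribers missed the message.
func (b *bus) Publish(msg string) (dropped int) {
	for _, ch := range b.subs {
		select {
		case ch <- msg:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	b := &bus{}
	fast := b.Subscribe(8)
	_ = b.Subscribe(1) // a stalled consumer that never reads

	dropped := 0
	for i := 0; i < 3; i++ {
		dropped += b.Publish(fmt.Sprintf("console line %d", i))
	}
	fmt.Println("delivered to fast subscriber:", len(fast))
	fmt.Println("dropped for stalled subscriber:", dropped)
}
```

Dropping console lines for a stalled subscriber is a trade-off: the publisher, and anything waiting on it, keeps moving, at the cost of that one subscriber missing some output.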

DaneEveritt commented 2 years ago

Please try giving the build from this run a go on your machines: https://github.com/pterodactyl/wings/actions/runs/1787260301

This should resolve the power lock issue assuming I debugged any of this correctly.

SMGoro commented 2 years ago

wings version: 1.6.0 https://ptero.co/culazaceru

JRH-1997 commented 2 years ago
DEBUG: [Feb  4 17:02:10.259] acquiring power action lock for instance action=restart lock_id=30f960bb-85dc-11ec-92b1-fa163e54e6d6 server=040ae464-92c1-4ca6-91d3-5bd599961619 wait_seconds=0
 INFO: [Feb  4 17:02:10.260] acquired exclusive lock on power actions, processing event... action=restart lock_id=30f960bb-85dc-11ec-92b1-fa163e54e6d6 server=040ae464-92c1-4ca6-91d3-5bd599961619
DEBUG: [Feb  4 17:02:10.264] saw server status change event server=040ae464-92c1-4ca6-91d3-5bd599961619 status=stopping
DEBUG: [Feb  4 17:02:11.891] GET /api/system client_ip={redacted} latency=84.185µs request_id=9f1a8a60-3937-4279-b2f3-eadef1c3f8f2 status=200
DEBUG: [Feb  4 17:02:15.426] acquiring power action lock for instance action=stop lock_id=340dd9ab-85dc-11ec-92b1-fa163e54e6d6 server=040ae464-92c1-4ca6-91d3-5bd599961619 wait_seconds=0
ERROR: [Feb  4 17:02:15.429] error processing websocket event "set state" error=failed to acquire exclusive lock for power actions: locker: cannot acquire lock, already locked error_identifier=1108bf09-a170-440b-902e-dce501e0df24 event=set state server=040ae464-92c1-4ca6-91d3-5bd599961619

Stacktrace:
locker: cannot acquire lock, already locked
failed to acquire exclusive lock for power actions
github.com/pterodactyl/wings/server.(*Server).HandlePowerAction
    /home/runner/work/wings/wings/server/power.go:102
github.com/pterodactyl/wings/router/websocket.(*Handler).HandleInbound
    /home/runner/work/wings/wings/router/websocket/websocket.go:355
github.com/pterodactyl/wings/router.getServerWebsocket.func3
    /home/runner/work/wings/wings/router/router_server_ws.go:85
runtime.goexit
    /opt/hostedtoolcache/go/1.17.6/x64/src/runtime/asm_amd64.s:1581

I've seen this multiple times in our logs now: a restart action tries to stop the server but can't, because of the running process (a Discord bot). After that comes the stop command. But that button should be hidden on the panel once you've used the restart command, with only the kill button shown.

We aren't running the new build, so I don't know whether you've already fixed this or whether this information helps.

DaneEveritt commented 2 years ago

Please use the build I linked immediately above. I already know it is broken on existing releases, so additional reports don't help too much. Knowing if it is fixed in the unreleased build is more helpful. :)

TsjipTsjip commented 2 years ago

I have switched to the dev build generated from commit 72476c61ec0ede8adff243522b562b8f02935d1e (as instructed in the post immediately above) and can report that the issue currently appears fixed, immediately after switching. I'll edit this post should that change.

itsnotrin commented 2 years ago

I downloaded and installed Dane's new build (https://github.com/pterodactyl/wings/actions/runs/1787260301) and it seems the issue may have stopped. Great work!

camw0 commented 2 years ago

72476c has fixed all power lock issues for me - looks good!

DaneEveritt commented 2 years ago

closed in https://github.com/pterodactyl/wings/commit/72476c61ec0ede8adff243522b562b8f02935d1e

RTK23-dev commented 2 years ago

The power lock issue still exists. I got the error "This server is in a failed install state and cannot be recovered. Please delete and re-create the server." The Wings logs state there is a power lock issue, and in the support server the bot said to update Wings. However, I am on the latest Wings version; I updated anyway, and of course I restarted Wings before and after, but the issue remains. Here are the Wings logs: https://ptero.co/wililacime

Brandin commented 2 years ago

I’m fairly confident the issue you’re describing is not the same as this post.

nackerr commented 2 years ago

Replying to @RTK23-dev's comment above:

We’ve also been getting this. It’s an issue, but not this issue. It’s not related to the power lock.