I have the same problem!
Seeing the same issue.
Occurred after upgrade to latest wings. Reverting fixes the issue.
I can confirm the same. I'm getting the same error as Mark above me, compiled on Go 1.17.5. Both the panel and wings are otherwise up to date as of this writing.
I have the same problem after updating to version 1.5.5.
Same problem.
I can confirm too. Restarting wings helps resolve it.
Wings: 1.5.5 Panel: 1.6.6 (1.7.0 manually compiled)
I can confirm this problem too. Restarting wings fixes it. It started happening after updating to Wings 1.5.5. Panel: 1.7.0 Wings: 1.5.5
Commenting "me too" is of no use for troubleshooting and triggers notifications for nothing. Me and Matthew spent hours trying to reproduce and identify it.
Please provide data to reproduce and identify the issue instead. What did you do for it to happen, possible errors in wings logs, etc
Commenting "me too" is of no use for troubleshooting and triggers notifications for nothing. Me and Matthew spent hours trying to reproduce and identify it.
Provide data to reproduce and identify the issue instead.
I appreciate the hard work you guys have done so far. Maybe the "not confirmed" label makes people (like me) want to help by confirming it's a real issue, without any intention of triggering notifications for nothing.
As additional info I haven't seen in any other comment: it has happened to me with a scheduled power action, so the browser may not be a factor. My OS is Ubuntu Server 20.04.3.
From my experience today, I pressed stop and then kill within a second of each other, and that managed to cause this.
EDIT:
Another bit of information that might be useful: when trying to stop wings via systemctl stop wings, it wasn't actually stopping the process or unbinding the port. The way around it was using lsof to find the process and killing it manually with kill.
This issue began occurring on my installation after the upgrade to 1.5.4 and subsequently to 1.5.5. The server is on Ubuntu 20.04.1 and fully up to date. Restarting wings, as mentioned, resolves the issue for me, but sometimes I'll need to kill it or reboot the server. Can you let me know what kind of logs would be useful so I can help supply them? I'd be happy to!
Restarting wings temporarily solves the problem, but after a while it says that it already has a power action in progress. No error log is being generated. Shouldn't the solution be a more reliable way to detect whether the server is actually running?
Same here. The issue appeared with the 1.5.4-1.5.5 upgrade. I am also on Ubuntu 20.04.1, but I am not using any scheduler features. The issue happens after I leave the servers running for a day or so; the servers are active with a large number of players (in case it matters). After a day, when I want to shut down, everything seems normal, but when I press stop, wings stops updating the consoles and all other server indicators, and I need to keep reloading the page to see what happens. When they are all stopped and I try to start them again, the error message appears and I need to restart wings.
Please try downloading and running the following build: https://github.com/pterodactyl/wings/actions/runs/1736581488
Click on the correct artifact for your processor in the "Artifacts" list, and then replace your current wings binary with this one, as if it were a normal upgrade. I changed how the power actions handler works; I can't promise it actually fixes anything, but it should start providing some better debugging information for me.
Make sure you have debug: true in your Wings config, or run with --debug.
@LucidAPs @iLucasUS @Brandin @NoSharp @markd69 @RohanGoyalDev
I just installed this build; I'll report back soon on whether it solved the issue.
It doesn't work.
That is not a helpful response, unfortunately. Can you please provide the logs, ideally as much of them as you can? You may need to pull them from something other than the diagnostics command.
Please also clearly indicate the server UUID that was affected.
Wings does not show any errors.
It wouldn't, which is why I asked for the logs because I added debugging statements for myself. :)
here!!!
This error is in version 1.5.6
Wings 1.5.6
Unfortunately, it still happens on 1.5.6.
Can you please provide the logs, ideally as much of them as you can? You may need to pull them from something other than the diagnostics command.
@iLucasUS @de-Rick
Please just attach the raw log files, the diagnostics command trims the log output significantly. Additionally, please tell me what specific servers are having this issue (their specific UUID) so that when I filter down the logs I know what server to be looking at.
Thanks.
I can't start the server. EDIT: wings logs: https://ptero.co/agitepycul
Here you are: http://badger.alphaoperations.eu/wings20220124.log
This log contains ALL logging since the setup of wings (new setup).
The server was running wings v1.5.5 up until Jan 23 20:03. Then it was upgraded to wings vdev (https://github.com/pterodactyl/wings/actions/runs/1736581488, amd64). The debug flag was enabled shortly before that.
Debian GNU/Linux 11, Docker version 20.10.12 (build e91ed57), Panel 1.6.6
The UUID in question is 68cdca9f-44ee-40ad-8516-79fec70e4168
What other information do you need?
I'm getting the same error as this. It now seems to happen when using a schedule, but only on one server for some reason. I don't know, it's strange. It was "another power action is being processed" before, but the update changed it to this.
I used to have an error like this; then, when I upgraded wings and the panel again, it changed to this. I haven't enabled debug, sorry; I will attempt to do so now and wait until it reproduces.
I think the devs might be missing the info I wrote above: this occurs only when you leave the servers running for more than 12 hours and system resources get used.
From what I've experienced, that's not the case: the servers that have encountered this were restarted a few times and then had the same occurrence happen again. I fixed the issue the first time just by restarting the wings service, but then I restarted one of the servers on the panel and it happened again. It only somewhat got fixed when I updated both the panel and wings again.
Here's a video I made reproducing it: https://user-images.githubusercontent.com/45587852/150942400-4e15a222-cba7-4b2f-bad8-44d23e20878f.mp4
That's not reproducing the issue reported here, which is the lock not being released.
What you are doing is spamming power actions, so you receive the power action error. Your server still starts and stops.
I'm pretty sure this problem is caused when the panel freezes due to a websocket connection write error; then, when someone (or the schedule) tries to restart the server, it gives "failed to acquire exclusive lock" and completely locks it up. The freezing happens when someone switches IPs while using the panel (like, say, when you connect to a VPN). Here are the two notable errors:
Stacktrace:
write tcp {redacted}:8443->{redacted}:61236: write: connection timed out
github.com/pterodactyl/wings/router/websocket.(*Handler).listenForServerEvents.func1
/home/runner/work/wings/wings/router/websocket/listeners.go:99
github.com/pterodactyl/wings/router/websocket.(*Handler).listenForServerEvents
/home/runner/work/wings/wings/router/websocket/listeners.go:118
github.com/pterodactyl/wings/router/websocket.(*Handler).registerListenerEvents.func1
/home/runner/work/wings/wings/router/websocket/listeners.go:29
runtime.goexit
/opt/hostedtoolcache/go/1.17.6/x64/src/runtime/asm_amd64.s:1581
ERROR: [Jan 25 06:32:32.127] error processing websocket event "set state" error=failed to acquire exclusive lock for power actions: cannot acquire a lock on the power state: already locked error_identifier=68d63a19-8ce2-40b3-9905-841ec3ed895e event=set state server=581857c2-17b5-4229-9fe0-965bf1279c40
Stacktrace:
cannot acquire a lock on the power state: already locked
github.com/pterodactyl/wings/server.(*powerLocker).Acquire
/home/runner/work/wings/wings/server/power.go:77
github.com/pterodactyl/wings/server.(*Server).HandlePowerAction
/home/runner/work/wings/wings/server/power.go:177
github.com/pterodactyl/wings/router/websocket.(*Handler).HandleInbound
/home/runner/work/wings/wings/router/websocket/websocket.go:355
github.com/pterodactyl/wings/router.getServerWebsocket.func3
/home/runner/work/wings/wings/router/router_server_ws.go:85
runtime.goexit
/opt/hostedtoolcache/go/1.17.6/x64/src/runtime/asm_amd64.s:1581
This is most likely what is causing this issue, because people's CPU usage is frozen on the panel as well.
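For context, that "already locked" error is what a non-blocking try-lock returns. Here is a minimal Go sketch of the pattern (my guess at the shape of it, not the actual Wings source) showing how a skipped Release locks everything up:

package main

import (
	"errors"
	"fmt"
)

var errLocked = errors.New("cannot acquire a lock on the power state: already locked")

// powerLocker here is a stand-in: a one-slot buffered channel used as a
// non-blocking mutex around power actions.
type powerLocker struct {
	ch chan struct{}
}

func newPowerLocker() *powerLocker {
	return &powerLocker{ch: make(chan struct{}, 1)}
}

// Acquire fails immediately if another power action already holds the
// lock, instead of queueing behind it.
func (pl *powerLocker) Acquire() error {
	select {
	case pl.ch <- struct{}{}:
		return nil
	default:
		return errLocked
	}
}

// Release frees the lock. If the goroutine holding it hangs on a dead
// websocket and never gets here, every later Acquire fails until wings
// is restarted, which matches the behavior in this issue.
func (pl *powerLocker) Release() {
	select {
	case <-pl.ch:
	default:
	}
}

func main() {
	pl := newPowerLocker()
	fmt.Println(pl.Acquire()) // <nil>
	fmt.Println(pl.Acquire()) // already locked: Release was never called
	pl.Release()
	fmt.Println(pl.Acquire()) // <nil> again
}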
Interesting, theoretically that shouldn't make a difference because the lock should be tied to the Docker container's state (or the command being executed in Docker), and not the user's connection. Allegedly I wrote it so that if the context is canceled (e.g. the request/connection dies) the rest of the functionality should die as well, thus releasing the lock.
But that's a decent start and I can fiddle around with it more. @Synkstar, are you able to reliably reproduce it in that case, just to confirm it wasn't luck of the draw? Also, @Synkstar, can you include your logs so I can see the full event sequence?
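For reference, the context-tied acquire I'm describing would look roughly like this, reusing the one-slot channel idea from the sketch above (again, a sketch, not the exact implementation):

// Requires "context" in the imports; pl.ch is the same one-slot
// buffered channel as in the powerLocker sketch above.
func (pl *powerLocker) AcquireWithContext(ctx context.Context) error {
	select {
	case pl.ch <- struct{}{}:
		// Got the lock before the caller went away.
		return nil
	case <-ctx.Done():
		// The request or websocket connection died first; give up
		// without holding the lock so nothing stays acquired forever.
		return ctx.Err()
	}
}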
I am not using the panel much. Most of my cases of "unable to start" were in the morning or during the night, when I was not using the panel. Webserver logs show that my application made two calls to the API: one to stop the server, and then one 15 seconds later to start it. I cannot prove that the application was shut down at the point when it started again.
Currently, I am running 1.5.3 at night (when I am not here; no lockups there) and 1.5.6 during the day, when I can watch the monitoring. I had been running 1.5.3 since October on the same hardware without problems and with minimal maintenance. The game I'm running in the container is the same, and the application I am using to start/stop via the API is the same (more or less).
Yeah, I'm able to reproduce it. The only useful things in those logs are the errors, but if I turn on debugging I get https://gist.github.com/Synkstar/fa505208555768b3c607eae6fce5a9df. To reproduce, basically just keep changing your IP and reloading the panel, and eventually it will happen. I usually waited a bit with it frozen after the IP change. Edit:
if err := h.unsafeSendJson(v); err != nil {
// Not entirely sure how this happens (likely just when there is a ton of console spam)
// but I don't care to fix it right now, so just mask the error and throw a warning into
// the logs for us to look into later.
if errors.Is(err, websocket.ErrCloseSent) {
if h.server != nil {
h.server.Log().WithField("subsystem", "websocket").
WithField("event", v.Event).
Warn("failed to send event to websocket: close already sent")
}
return nil
}
return err
}
Line 160 in websocket.go says that, so I'm guessing this issue has something to do with it. Why is it called unsafeSendJson, though?!
I'm receiving this issue quite often now, and it seems to happen after scheduled restarts. I would attach the full logs, but debug was set to false.
It just happened on uuid b2bcc205-d787-4ee7-a80c-ee05bb19e29d.
I have a new build for anyone experiencing this problem. It changes the logic when starting a server from create, start, attach to create, attach, start (this is how the docker run command also works). This build also has additional debug logs around starting and attaching to the container, which should help us diagnose any further problems.
You can download the build from https://github.com/pterodactyl/wings/actions/runs/1747467060 NOTE: You will need to be logged in to download the build artifact.
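For anyone curious, against the Docker Go SDK that ordering looks roughly like this (a sketch, not the actual Wings code; the image and options here are hypothetical, and the exact option types vary between SDK versions):

package main

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	// 1. Create the container first, so it exists but is not running yet.
	created, err := cli.ContainerCreate(ctx, &container.Config{
		Image:        "alpine", // hypothetical image for illustration
		Cmd:          []string{"echo", "hello"},
		AttachStdout: true,
		AttachStderr: true,
	}, nil, nil, nil, "example")
	if err != nil {
		panic(err)
	}

	// 2. Attach before starting, so no early output is missed.
	attach, err := cli.ContainerAttach(ctx, created.ID, types.ContainerAttachOptions{
		Stream: true, Stdout: true, Stderr: true,
	})
	if err != nil {
		panic(err)
	}
	defer attach.Close()

	// 3. Only now start the container.
	if err := cli.ContainerStart(ctx, created.ID, types.ContainerStartOptions{}); err != nil {
		panic(err)
	}

	// Stream output until the container exits.
	io.Copy(os.Stdout, attach.Reader)
}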
Same error; here's the log to help! Simply restarting wings does the trick for me: https://ptero.co/welymiwowi
Seeing the following error on this build. https://ptero.co/enibylylyb.lua
It appears not to be resolved on this build.
Edit: sorry! I did not see the log messages in the comment above.
This happened to me on my private host and on a public one. Wings was 1.5.5 on my private server, and the host uses 1.5.6 (I don't know how). This has happened to many people using that public host, and to me about 20 times, even without pressing any buttons on the panel; it happens even when doing /restart in game. I hope I provided some info.
Does anyone know what version this started happening on? I think it has something to do with how they handle websocket connections, because it freezes the server console. On the new version it doesn't show the write error I mentioned, but the console still freezes.
I had been on 1.5.3 since October and upgraded to 1.5.5 last week, due to a system restore after a hardware defect.
I have now downgraded to 1.5.3. Tomorrow I'm available all day; I will run 1.5.4 for testing and report back then.
1.5.3: no issues (related to this ticket)
1.5.4: unknown
1.5.5: lockup, wings restart required
1.5.6: lockup, wings restart required
Ladies and gentlemen.
I built a temporary fix that isn't the most optimal, but it does work. The trick is very simple.
The magic word is: "CRONTAB" :-)
Type in SSH:
crontab -e
Add this line on each Wings server:
0 * * * * systemctl restart wings >/dev/null 2>&1
Done!
This automatically runs the wings restart command every hour, so you don't have to do it by hand anymore. Of course, you can also adjust the times to suit.
This is at most a workaround, and it creates a ton of other problems. What if the lockup happens 1 minute after the hour? Do I have to wait 59 minutes?
[...] Of course, you can also adjust the times to suit. [...] There are so many possibilities! Of course, you could also have the logs read every 5 minutes and have cron execute command XYZ when certain words appear... It's just about the idea and the temporary automation, not about a scientific solution. Please read carefully. I have over 500 clients using Pterodactyl daily; this enabled us to contain the problem and temporarily fix it.
After updating wings to the newest version, scheduled restarts don't work. Logs: https://ptero.co/wygamuzuha.yaml
The issue is still present on 1.5.3. The websocket handler, it seems, still locks up and prevents the server from starting.
Then maybe we are chasing different issues here. I did not experience any issues from the beginning of October 2021 until January 23, 2022, running 1.5.3 on two machines.
I updated to 1.5.5 on Jan 23 00:45:47 and had these lockups happening frequently (every 2-3 hours). I updated to the offered dev version vdev- on Jan 23 20:04:03 and continued to experience the problems. I downgraded to 1.5.3 on Jan 24 23:04:21 and have not experienced any lockups yet (I am not talking about temporary lockups due to spamming power actions). Wings ran continuously until Jan 28 09:10:21 without restarts; that's almost 3 1/2 days. I updated to 1.5.4 this morning at Jan 28 09:10:21 and so far have had no lockups. (However, I believe the timeframe is still too short for a conclusion.)
Time of writing Jan 28 13:15:01
I do not, or almost never, execute any function from the panel via a browser on a wonky connection. All commands are executed from localhost via API requests to the panel API. The same goes for the console: I am reading log files from the application running within the container, not via the web panel.
STOP! Read this before responding! :warning:
Please do not post any replies that do not contain actual logs or do not contribute anything to this issue. If you have a temporary fix like a cronjob, keep it to yourself, as it will do nothing to fix the underlying issue.
If you're going to respond to this issue, please do not respond to say "+1" or "I also have this issue." If you are responding, please include all relevant information, including the output of wings diagnostics and your full wings log if possible. Make sure your system is running with --debug or debug: true (in the config.yml) before submitting your logs. Please clearly identify the timestamps and servers that are affected; we have limited time to work on this project, and trying to identify or hunt down this bug is not feasible without clear and detailed reports.
Update: Potential Fix
See this comment for details.
Is there an existing issue for this?
Current Behavior
Sometimes, while restarting or stopping the server, wings shows the server stuck in an offline state, like this.
It still shows the time next to the online status.
When it is stuck like that, you can't start/restart the server. Any attempt to do so constantly shows this in the console (not due to spamming the power buttons).
However, if you restart wings, the server returns to a normal offline state and you can start it normally.
Expected Behavior
Normally the time does not show up (when it's not stuck) and you can start the server normally
Steps to Reproduce
It does not happen every time, and there are no particular steps to reproduce; it happens when restarting or stopping the server. I have seen many people facing the same issue in the Pterodactyl Discord.
Panel Version
1.7.0
Wings Version
1.5.5
Error Logs
No response