steamroller-airmash / airmash-server

Server implementation for the game AIRMASH
Apache License 2.0

On demand server trace exporting tool #112

Open steamroller-airmash opened 5 years ago

steamroller-airmash commented 5 years ago

It would be nice if the server were able to export a trace of all activity from the last few minutes (or even longer, if that is feasible). Right now there's very little to go on with bugs such as #108 or #109.

I'm not entirely sure about the design of such an exporter but it should probably capture

In addition we want to consider some other things

Thoughts?

ghost commented 5 years ago

I guess the cheapest option is some kind of circular buffer, and maybe have the HTTP server dump it on demand. Incoming/outgoing messages are already serialized, so buffering them is basically limited by memory bandwidth (i.e. not at all! Under DoS, networking would definitely die first), but that's not true for internal events. Could we dump the in-memory representation of events as-is? I have no idea whether they contain pointers, for example.
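To make the circular buffer idea concrete, here is a minimal sketch using only the standard library. `PacketRing` and its methods are hypothetical names, not anything in airmash-server, and the length-prefixed dump format is just an assumption chosen to keep the example simple. The `dump` method accepts any writer, so an HTTP handler could pass its response body.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

/// Fixed-capacity ring of already-serialized packets (hypothetical type).
pub struct PacketRing {
    capacity: usize,
    packets: VecDeque<Vec<u8>>,
}

impl PacketRing {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            packets: VecDeque::with_capacity(capacity),
        }
    }

    /// Record one packet, evicting the oldest once the ring is full.
    pub fn push(&mut self, packet: &[u8]) {
        if self.capacity == 0 {
            return;
        }
        if self.packets.len() == self.capacity {
            self.packets.pop_front();
        }
        self.packets.push_back(packet.to_vec());
    }

    /// Number of packets currently buffered.
    pub fn len(&self) -> usize {
        self.packets.len()
    }

    /// Write every buffered packet, oldest first, as a little-endian u32
    /// length followed by the raw bytes (assumes packets are under 4 GiB).
    pub fn dump<W: Write>(&self, out: &mut W) -> io::Result<()> {
        for packet in &self.packets {
            out.write_all(&(packet.len() as u32).to_le_bytes())?;
            out.write_all(packet)?;
        }
        Ok(())
    }
}
```

Memory use is bounded by the capacity times the largest packet, so the cost stays fixed no matter how often a dump is requested.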

Another thought was to have all the logging/tracing output dumped to a pipe that could be connected to another program. Again thinking about the anti-abuse use case, imagine being able to cobble together a quick Python script that could handle a particular attack without recompiling or restarting the server. This is maybe a little more involved than what is needed in the short term, though.

steamroller-airmash commented 5 years ago

I was more concerned about disk space being used up, assuming we are dumping bug reports to disk. I wouldn't be concerned about someone using up all of the server's bandwidth by making bug reports (hopefully).

The way I'm currently picturing this is that the server stores a copy of every event (packets, internal engine events, log messages, etc.) in a queue and drops events after, say, 5 minutes. That would keep the amount of history we get leading up to a bug report consistent. When a trace is exported, we just serialize the entire queue plus the server state and drop that in a file.
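A rough sketch of that time-bounded queue, assuming a hypothetical `TraceEvent` enum that stands in for whatever the real event types would be. The caller passes the current time in explicitly, which is only a choice made here to keep the example easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the kinds of events a trace would hold.
#[derive(Debug, Clone, PartialEq)]
pub enum TraceEvent {
    Packet(Vec<u8>),
    Engine(String),
    Log(String),
}

/// Queue that keeps only the events recorded within the last `window`.
pub struct TraceQueue {
    window: Duration,
    events: VecDeque<(Instant, TraceEvent)>,
}

impl TraceQueue {
    pub fn new(window: Duration) -> Self {
        Self {
            window,
            events: VecDeque::new(),
        }
    }

    /// Drop expired events, then append the new one.
    pub fn record(&mut self, now: Instant, event: TraceEvent) {
        self.expire(now);
        self.events.push_back((now, event));
    }

    /// Remove events from the front while they are older than the window.
    pub fn expire(&mut self, now: Instant) {
        while self
            .events
            .front()
            .map_or(false, |entry| now.duration_since(entry.0) > self.window)
        {
            self.events.pop_front();
        }
    }

    /// Number of events currently retained.
    pub fn len(&self) -> usize {
        self.events.len()
    }
}
```

With a 5 minute window this would be created as `TraceQueue::new(Duration::from_secs(300))`. Because events arrive in time order, expiry only ever has to look at the front of the queue.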

(edit: rewrote this section) As an alternative to dumping the in-memory representation, serde has some fairly fast serialization formats that might be easier to take advantage of (e.g. bincode). Depending on how we design this, we could also serialize to JSON but run the serialization on a second thread so it doesn't block the main game loop.
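The second-thread idea could look something like the sketch below. To stay dependency-free it uses a hand-rolled tab-separated line format as a stand-in for bincode or JSON, and `Record` and `spawn_exporter` are hypothetical names. The game loop would only pay for cloning a snapshot and sending it over the channel.

```rust
use std::io::Write;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical trace record: (milliseconds since server start, message).
pub type Record = (u64, String);

/// Spawn a worker thread that serializes each snapshot it receives into
/// `sink`. Returns the sending half of the channel and a handle that hands
/// the sink back once every sender has been dropped.
pub fn spawn_exporter<W>(mut sink: W) -> (Sender<Vec<Record>>, JoinHandle<W>)
where
    W: Write + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<Vec<Record>>();
    let handle = thread::spawn(move || {
        // This loop ends when all senders are gone.
        for snapshot in rx {
            for (millis, message) in snapshot {
                // Stand-in for a real serde format; I/O errors are
                // ignored here purely to keep the sketch short.
                let _ = writeln!(sink, "{}\t{}", millis, message);
            }
        }
        sink
    });
    (tx, handle)
}
```

Sending a snapshot never blocks on disk I/O, since `mpsc::channel` is unbounded. The tradeoff is that the snapshot has to be cloned on the game thread before it is sent.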

As for sending everything over a pipe: the only problem I have here is that it forces the server to constantly be serializing everything. Ideally we won't be creating server traces constantly so this seems inefficient to me.

One thought is that we could solve a lot of the worries about abuse by doing the following whenever a trace is requested:

  1. Dump the requested trace
  2. Clear the trace queue
  3. Start the new trace queue with "created trace xxxxxxx"

If a second trace is requested right afterwards, it only creates a small trace dump. It also shouldn't be hard to write some scripts that put fragments like these back together after the fact.
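The three steps above can be sketched as follows. `Trace` is a hypothetical type, events are plain strings for brevity, and a simple counter stands in for the "xxxxxxx" trace identifier.

```rust
use std::io::{self, Write};

/// Hypothetical trace queue that resets itself on every export.
pub struct Trace {
    next_id: u64,
    lines: Vec<String>,
}

impl Trace {
    pub fn new() -> Self {
        Self {
            next_id: 1,
            lines: Vec::new(),
        }
    }

    pub fn record(&mut self, line: &str) {
        self.lines.push(line.to_string());
    }

    /// Export the current queue and return the id of the dump written.
    pub fn export<W: Write>(&mut self, out: &mut W) -> io::Result<u64> {
        let id = self.next_id;
        // 1. Dump the requested trace.
        for line in &self.lines {
            writeln!(out, "{}", line)?;
        }
        // 2. Clear the trace queue (only reached if the dump succeeded).
        self.lines.clear();
        // 3. Start the new queue with a marker naming the dump just made.
        self.lines.push(format!("created trace {}", id));
        self.next_id += 1;
        Ok(id)
    }
}
```

A second export made immediately after the first contains only the marker line, so repeated requests stay cheap. Since each fragment begins with the id of the dump that preceded it, a script could chain fragments back together by following those markers.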

usopp-airmash commented 5 years ago

I'm fairly new to this, but here is my naive thought.

I observe that Q's bots have some interactive mechanism that allows a user to become a leader ('Type #yes in the next 30 seconds to become a leader'). If there is no input then nothing is changed.

So could we implement a similar failsafe that lets a trusted user (with a password) request a trace or report from the server? For example, every 2 minutes the server sends out a message ('Type #yes +password in the next 30 seconds if there is a serious bug that needs to be reported').

The server then only needs to print out for a 30-second duration.

steamroller-airmash commented 5 years ago

I'm more concerned about a bug report feature being used maliciously to take down/lag a server. Beyond that, I wouldn't be worried about multiple players all reporting a bug at the same time if they saw it.

On that note, I think a fixed password that we can give out to a number of people would be a good idea. It would resolve my concerns about automated abuse of the bug report mechanism.

Next, I think having the server constantly sending out messages to players would get annoying pretty quickly. (I know it would for me!) I think the best course of action there would be to just dump the last n minutes of trace data where we decide n based on some tradeoff of what we need for debugging, memory use, etc.