msmhq / msm

An init script for managing Minecraft servers
http://msmhq.com
GNU General Public License v3.0

Detect CPU & RAM (memory) usage #135

Open F481 opened 11 years ago

F481 commented 11 years ago

Is there any way to detect the CPU and memory usage of a running Minecraft server, as McMyAdmin does?

TrogloGeek commented 11 years ago

AFAIK there is no out-of-the-box solution for this in MSM, but your goal can easily be achieved by adding a new command-line method that either outputs or logs CPU and RAM usage for a given server or for all servers. You just have to use server_pid() to get the server's parent PID (not the PID of the server itself, but the PID of the screen session that holds the server process) and then call ps with the right options to get a snapshot of the resources used by the server's process. Feed the result to a performance data collection program (Cacti, for example) and you're done.
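A minimal sketch of that idea, assuming procps-ng ps on Linux and assuming the server process is a direct child of the screen session. children_usage is a hypothetical helper name, not part of MSM, and the screen PID has to be obtained separately.

```shell
#!/usr/bin/env bash
# Sketch only: children_usage is a hypothetical helper, not an MSM function.

# Print PID, CPU %, memory %, RSS (KiB), VSZ (KiB) and command name for every
# direct child of the given parent PID (here: the PID of the screen session).
children_usage() {
    local parent_pid="$1"
    # --ppid selects processes by parent PID (procps-ng ps on Linux).
    # An empty header ("name=") on every column suppresses the header line.
    ps --ppid "$parent_pid" -o pid= -o %cpu= -o %mem= -o rss= -o vsz= -o comm=
}

# Usage, with screen_pid holding the PID of the screen session:
#   children_usage "$screen_pid"
```

Each output line can be handed to a collector as is, or prefixed with a timestamp when logging.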

Is that what you're looking for ?

F481 commented 11 years ago

First of all thanks for the answer. Are you sure that ps is the right way to do this?

Take a look at http://virtualthreads.blogspot.de/2006/02/understanding-memory-usage-on-linux.html

Why ps is "wrong"

Depending on how you look at it, ps is not reporting the real memory usage of processes. What it is really doing is showing how much real memory each process would take up if it were the only process running. Of course, a typical Linux machine has several dozen processes running at any given time, which means that the VSZ and RSS numbers reported by ps are almost definitely "wrong". In order to understand why, it is necessary to learn how Linux handles shared libraries in programs.

TrogloGeek commented 11 years ago

First, thanks for the link; it was interesting and taught me a few things.

I can't give you a definite answer; it depends on your needs...

I looked at the pmap output for my main Minecraft instance:

mapped: 4898672K    writeable/private: 4719668K    shared: 2400K

There is not much difference between mapped and private memory, and not much difference from the ps output either:

root@ns204173:~# ps -p 2617 -o %cpu,%mem,rss,vsz
%CPU %MEM   RSS    VSZ
 8.0 13.6 4472380 4898668

I run Minecraft with Forge on openjdk-6-jdk; I don't know whether that changes anything. In my case, using ps looks okay.

TrogloGeek commented 11 years ago

After restarting the Minecraft server:

mapped: 4881748K    writeable/private: 4695932K    shared: 2400K
root@ns204173:~# ps -p 18714 -o %cpu,%mem,rss,vsz
%CPU %MEM   RSS    VSZ
18.0  1.5 495576 4881744

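The mapped, private and VSZ figures barely changed after the restart, while RSS dropped from 4472380 to 495576. So I would definitely use the %MEM or RSS output (there may be a better way, but I don't know of one).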

TrogloGeek commented 11 years ago

I exposed the msm script's "server pid" method and added a ps method with customizable columns. https://github.com/TrogloGeek/minecraft-server-manager/commit/d4a613da3c9e27c69665885c236c96da61d79420 I'll need to monitor resource usage too, so if you have a better idea, I'm interested as well ;-).

F481 commented 11 years ago

Wow, thanks for the fast work on it. I'll do some research and testing over the weekend, so you'll get feedback after that.

F481 commented 11 years ago

Hm... in my case the %CPU value is a bit confusing too. I started a Minecraft server and got the resource usage with:

ps -p 3495 -o %cpu,%mem,rss
%CPU %MEM   RSS
37.0  5.3 394884 

I monitored the resource usage with top at the same time. After a while, the CPU usage in top was fluctuating between 4% and 8%, but the ps command still gave me 36.3%. The ps CPU value drops very, very slowly (I don't think it reflects the actual usage). Only after 25 minutes did ps report nearly the same CPU usage as top.

Alternative idea: use top -b -n1 | grep PID instead. What do you think about that?

TrogloGeek commented 11 years ago

Hi, I see that the last response I posted wasn't saved properly, so sorry for the delay...

Indeed, ps gives us the average CPU usage since the process started (CPU time used divided by the age of the process, multiplied by 100). As far as I know, top does the same unless we let it refresh at least once, in which case it can work on deltas. That is still an average, but one calculated between the previous refresh and now, so yes, it's far better. If I'm not mistaken, though, top -b -n1 -p <pid> has the same flaw on some *nix distributions (I'm not really sure; if anyone with solid *nix experience has an opinion, I'd like to hear it).

Adding a top command would be great when called by a user watching the output, but probably not in batch mode, as it does a lot of processing (general task, CPU and memory stats) that I find useless for logging purposes (at least when called on a per-server basis).
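To make the difference between the two averages concrete, here is a small arithmetic sketch. The helper names are hypothetical and every number is illustrative, not a measurement from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; all numbers below are illustrative only.

# Lifetime average, the way ps computes %CPU:
# CPU seconds consumed so far / seconds since the process started.
lifetime_cpu_percent() {
    awk -v cpu="$1" -v age="$2" 'BEGIN { printf "%.1f\n", 100 * cpu / age }'
}

# Interval average, the way top computes %CPU after its first refresh:
# CPU seconds consumed between two samples / seconds between the samples.
interval_cpu_percent() {
    awk -v c1="$1" -v c2="$2" -v dt="$3" \
        'BEGIN { printf "%.1f\n", 100 * (c2 - c1) / dt }'
}

# A server that burned 36 s of CPU during a 100 s startup:
lifetime_cpu_percent 36 100       # prints 36.0
# Ten minutes later it has used only 30 s more (5% over that interval),
# yet the lifetime figure has only fallen to 66 / 700:
lifetime_cpu_percent 66 700       # prints 9.4
interval_cpu_percent 36 66 600    # prints 5.0
```

The lifetime figure converges toward the current load only as the expensive startup becomes a small fraction of the process's age, which would explain a ps value that falls very slowly.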

What about this:

(msm <server> ps lstart | tail -1; while msm <server> status | grep 'is running' > /dev/null; do echo -n "$(date +%s) "; msm <server> ps cputime,vsz,rss | tail -1; sleep 1s; done) > <logfile> &

where <server> is obviously the monitored server (not all!). I recognize this is far from perfect, as it gives us three different date formats to parse, but it allows the log file to be used to build a (possibly web) GUI that plots memory and CPU usage as a graph with L-1 points, L being the line count of the file, provided that a new log file is used whenever we restart the server.

TrogloGeek commented 11 years ago

Getting good precision here would involve (if I'm not mistaken; as I said, I'm still a beginner in *nix maintenance) working with the /proc/<pid>/stat file, so I think it would be better to rely on a robust external solution to monitor a process's resource consumption, after asking msm for the server PID (the PID of the invoked command, given by msm <server> ps pid, not the screen PID returned by msm <server> pid). We would need better than one-second precision because, to my mind, the interesting information is how often the server's CPU usage was close to the CPU's capacity, which would highlight CPU bottlenecks.
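For reference, a rough Linux-only sketch of what sampling /proc/<pid>/stat could look like. The function names are hypothetical; the field positions (utime is field 14, stime is field 15, both in clock ticks) follow the proc(5) manual page.

```shell
#!/usr/bin/env bash
# Sketch only, Linux-specific; function names are hypothetical.

# Return utime + stime (fields 14 and 15, in clock ticks) from one line of
# /proc/<pid>/stat. The command name in field 2 may itself contain spaces or
# parentheses, so everything up to the last ") " is dropped first; after
# that, original field N sits at position N - 2.
ticks_from_stat_line() {
    local rest="${1##*) }"
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Average CPU usage of a PID, as a percentage of one core, over an interval
# given in seconds (default 4).
cpu_percent_over() {
    local pid="$1" interval="${2:-4}" hz t1 t2
    [ -r "/proc/$pid/stat" ] || return 1
    hz=$(getconf CLK_TCK)                 # clock ticks per second
    t1=$(ticks_from_stat_line "$(cat "/proc/$pid/stat")") || return
    sleep "$interval"
    t2=$(ticks_from_stat_line "$(cat "/proc/$pid/stat")") || return
    awk -v d="$((t2 - t1))" -v hz="$hz" -v s="$interval" \
        'BEGIN { printf "%.1f\n", 100 * d / hz / s }'
}

# Usage, with server_pid holding the PID of the server process:
#   cpu_percent_over "$server_pid" 4
```

Because the counters are in clock ticks rather than whole seconds, short intervals still give usable deltas; a multithreaded server can legitimately report more than 100.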

TrogloGeek commented 11 years ago

In

(msm <server> ps lstart | tail -1; while msm <server> status | grep 'is running' > /dev/null; do echo -n "$(date +%s) "; msm <server> ps cputime,vsz,rss | tail -1; sleep 1s; done) > <logfile> &

the sleep 1s makes no sense; the high school teacher who tried to teach me capture resolution 101 would probably kill me for such a mistake. As the resolution of cputime is barely one second, sleep 4s (at least) would be more realistic, and we wouldn't lose much information.
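One possible variant of that loop, sketched with plain ps rather than the msm ps subcommand, writes every value as a plain number so only one time format needs parsing. cputime_to_seconds and log_usage are hypothetical names, and the [dd-]hh:mm:ss layout of cputime is the one documented for procps-ng ps.

```shell
#!/usr/bin/env bash
# Sketch only; both function names are hypothetical.

# Convert the ps "cputime" column ([dd-]hh:mm:ss) to whole seconds.
cputime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        print $(n) + 60 * $(n - 1) + 3600 * $(n - 2) + (n > 3 ? 86400 * $1 : 0)
    }'
}

# Log "epoch cpu_seconds vsz_kib rss_kib" for a PID every few seconds
# (default 4) until the process exits.
log_usage() {
    local pid="$1" interval="${2:-4}" sample
    while sample=$(ps -p "$pid" -o cputime= -o vsz= -o rss=) && [ -n "$sample" ]; do
        set -- $sample
        echo "$(date +%s) $(cputime_to_seconds "$1") $2 $3"
        sleep "$interval"
    done
}

# Usage, with server_pid holding the PID of the server process:
#   log_usage "$server_pid" 4 > usage.log &
```

CPU usage between two log lines is then the difference in cpu_seconds divided by the difference in epoch timestamps.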

F481 commented 11 years ago

http://stackoverflow.com/a/1424556/1521984