ylecuyer / puma-helper

Command-line tool for puma to display information about running requests/processes ...
MIT License

Rework the puma-helper output #18

Open spuyet opened 5 years ago

spuyet commented 5 years ago

I think that we should rework the puma-helper output; the current output is really hard to read. We should use passenger-status as an example:

$ sudo passenger-status
Version : 5.3.4
Date    : 2019-05-06 13:02:07 +0200
Instance: S0h5vRIG (nginx/1.14.0 Phusion_Passenger/5.3.4)

----------- General information -----------
Max pool size : 200
App groups    : 2
Processes     : 31
Requests in top-level queue : 0

----------- Application groups -----------
/home/smcc/api_current (production):
  App root: /home/smcc/api_current
  Requests in queue: 0
  * PID: 35276   Sessions: 0       Processed: 4610    Uptime: 2h 11m 6s
    CPU: 2%      Memory  : 315M    Last used: 0s ago
  * PID: 44452   Sessions: 0       Processed: 399     Uptime: 1h 2m 36s
    CPU: 0%      Memory  : 260M    Last used: 1s ago
  * PID: 37492   Sessions: 0       Processed: 16      Uptime: 11m 38s
    CPU: 0%      Memory  : 153M    Last used: 1s ago

VS

$ puma-helper status
WARN[May  6 17:19:42] [lds] configuration is invalid. Error: Get http://unix/stats?token=768935b7ed117cfac6c8c99da4db6bd: dial unix /tmp/puma-status-1545126439210-2886: connect: no such file or directory

---------- Global informations ----------
Version: 1.0.0 | Date: Mon, 06 May 2019 17:19:42 +0200

----------- Application groups -----------
-> answers_integration application | App root: /home/answers_integration
-> 38423 (/home/answers_integration/current/tmp/pids/puma.state) Phase: 28 | Workers: 2 | Active threads: 0[░░]2

* 0 [27771] CPU Av: 0.3% CPU Times: 12s Mem: 230.3MiB Phase: 28 Uptime: 11m29d23h Threads: 0[░]1 (Last checkin: 0s)
* 1 [27949] CPU Av: 0.2% CPU Times: 9s Mem: 230.3MiB Phase: 28 Uptime: 11m29d23h Threads: 0[░]1 (Last checkin: 5s)

-> 38424 (/home/answers_integration/current/tmp/pids/puma_api.state) Phase: 28 | Workers: 2 | Active threads: 0[░░]2

* 0 [27892] CPU Av: 0.1% CPU Times: 5s Mem: 172.1MiB Phase: 28 Uptime: 11m29d23h Threads: 0[░]1 (Last checkin: 3s)
* 1 [27961] CPU Av: 0.1% CPU Times: 5s Mem: 171.4MiB Phase: 28 Uptime: 11m29d23h Threads: 0[░]1 (Last checkin: 2s)

Here is a first draft of something a bit more readable, largely inspired by the passenger output:

Version : 1.0.0
Date    : Mon, 06 May 2019 13:02:49 +0200

----------- General information -----------
App groups   : 2
Processes    : 6

----------- Application groups -----------
smcc_api (20190419144610):
  * PID: 35276    Queued: 0       Phase  : 42     Uptime      : 2h 11m 6s
    CPU: 2%       Memory: 315M    Threads: 3/5    Last checkin: 0s ago
  * PID: 44452    Queued: 0       Phase  : 42     Uptime      : 1h 2m 36s
    CPU: 0%       Memory: 260M    Threads: 1/5    Last checkin: 3s ago
  * PID: 37492    Queued: 0       Phase  : 42     Uptime      : 11m 38s
    CPU: 0%       Memory: 153M    Threads: 2/5    Last checkin: 1s ago

smcc (20190419144610):
  * PID: 20487    Queued: 0       Phase  : 42       Uptime      : 1h 57m 57s
    CPU: 24%      Memory: 456M    Threads: 12/20    Last checkin: 1s ago
  * PID: 32417    Queued: 0       Phase  : 42       Uptime      : 1h 42m 49s
    CPU: 27%      Memory: 463M    Threads: 5/20     Last checkin: 9s ago
  * PID: 34253    Queued: 0       Phase  : 42       Uptime      : 1h 40m 13s
    CPU: 25%      Memory: 497M    Threads: 19/20    Last checkin: 0s ago    

I know that the Go dependency you're using doesn't support the current CPU percentage, and to be honest portability is a welcome feature, but current CPU usage is a mandatory one IMHO; software cannot be restricted only because of portability, especially for a tiny project like this one.

This is a first draft, comments and improvements are welcome.

Notes:

- CPU percentage => use /proc/PID/stat
- Uptime => use /proc/PID/stat too
- Queued => corresponds to the backlog value per process in puma stats
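To illustrate the /proc/PID/stat idea, here is a minimal Go sketch of a parser for one stat line (the `cpuAndStart` helper is mine, not part of puma-helper). Per proc(5), utime, stime, and starttime are fields 14, 15, and 22; the comm field can contain spaces, hence splitting after the last `)`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuAndStart extracts utime+stime (clock ticks of CPU consumed) and
// starttime (ticks after boot) from one /proc/PID/stat line. The comm
// field (2) may contain spaces, so we split after the last ')'.
func cpuAndStart(statLine string) (cpuTicks, startTicks uint64, err error) {
	i := strings.LastIndexByte(statLine, ')')
	if i < 0 {
		return 0, 0, fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(statLine[i+1:])
	if len(fields) < 20 {
		return 0, 0, fmt.Errorf("too few fields in stat line")
	}
	// fields[0] is state (field 3); utime is field 14, stime field 15,
	// starttime field 22 -> indices 11, 12 and 19 here.
	utime, err := strconv.ParseUint(fields[11], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	stime, err := strconv.ParseUint(fields[12], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	start, err := strconv.ParseUint(fields[19], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	return utime + stime, start, nil
}

func main() {
	// Example line, truncated to field 22 for illustration.
	line := "27771 (puma) S 1 27771 27771 0 -1 4194560 1000 0 0 0 120 30 0 0 20 0 2 0 98765"
	cpu, start, _ := cpuAndStart(line)
	fmt.Println(cpu, start) // 150 98765
}
```

CPU percentage then follows from the tick delta between two samples, and uptime from the system uptime minus starttime divided by the clock rate.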

jarthod commented 5 years ago

Yep. I don't agree with the two lines per process though; I'm the one who asked for one line, because we have a lot of processes in production and it means you can see twice as much on one console screen.

Also @w3st3ry, you (still) didn't take into account half of the things I asked for in my last feedback (which I can't see any more because I sent it via PM on Slack). I gave you the benefit of the doubt, but I guess by now I can consider that you will never do them. This is not acceptable; you can't waste our time like that. You need to at least read what we say, and if you don't want to do something, it must be written somewhere, with arguments; you can't just ignore it.

So OK, I'll spend some time again to give you (mostly) the same feedback; please do it this time:

Example output (made quickly by hand):

$ puma-helper status
Version: 1.0.0 | Date: Mon, 06 May 2019 17:19:42 +0200

38423 answers_integration           Phase: 28  Active threads: 0/2 [░░]
└ 27771 CPU: 98.4%  Mem: 1230MiB  Uptime:       9d 3h Threads: 0/1 [░]
└ 27949 CPU:  0.2%  Mem:  230MiB  Uptime: 11m 29d 23h Threads: 0/1 [░] (Last checkin: 13s) (Phase: 27)

38424 answers_integration API       Phase: 28  Active threads: 0/2 [░░]
└ 27892 CPU:  0.1%  Mem:  172MiB  Uptime: 11m 29d 23h Threads: 0/1 [░]
└ 27961 CPU:  0.1%  Mem:  171MiB  Uptime: 11m 29d 23h Threads: 0/1 [░]

WDYT @spuyet @ylecuyer?

ylecuyer commented 5 years ago

This output looks better ;)

I would like to add something about errors too. Currently, errors are at the top of the output:

$ puma-helper status
WARN[May  6 17:19:42] [lds] configuration is invalid. Error: Get http://unix/stats?token=768935b7ed117cfac6c8c99da4db6bd: dial unix /tmp/puma-status-1545126439210-2886: connect: no such file or directory

---------- Global informations ----------
Version: 1.0.0 | Date: Mon, 06 May 2019 17:19:42 +0200

We should have them at the end, if any; otherwise they are difficult to notice:

$ puma-helper status
Version: 1.0.0 | Date: Mon, 06 May 2019 17:19:42 +0200

38423 answers_integration           Phase: 28  Active threads: 0/2 [░░]
└ 27771 CPU: 98.4%  Mem: 1230MiB  Uptime:       9d 3h Threads: 0/1 [░]
└ 27949 CPU:  0.2%  Mem:  230MiB  Uptime: 11m 29d 23h Threads: 0/1 [░] (Last checkin: 13s) (Phase: 27)

38424 answers_integration API       Phase: 28  Active threads: 0/2 [░░]
└ 27892 CPU:  0.1%  Mem:  172MiB  Uptime: 11m 29d 23h Threads: 0/1 [░]
└ 27961 CPU:  0.1%  Mem:  171MiB  Uptime: 11m 29d 23h Threads: 0/1 [░]

[lds] configuration is invalid. Error: Get http://unix/stats?token=768935b7ed117cfac6c8c99da4db6bd: dial unix /tmp/puma-status-1545126439210-2886: connect: no such file or directory
spuyet commented 5 years ago
$ puma-helper status
Version: 1.0.0 | Date: Mon, 06 May 2019 17:19:42 +0200

38423 answers_integration        Load: 0/2 [░░]
 └ 27949 CPU:  0.2%  Mem:  230M  Uptime: 11m 29d 23h  Load: [░] (Last checkin: 13s) (Phase 27 != Phase app 28)
 └ 27771 CPU: 98.4%  Mem: 1230M  Uptime:       9d 3h  Load: [░]

38424 answers_integration_api     Queuing: 42          Load: 2/2 [░░]
 └ 27892 CPU:  99.8%  Mem:  172M  Uptime: 11m 29d 23h  Load: [░]
 └ 27961 CPU: 100.0%  Mem:  171M  Uptime: 11m 29d 23h  Load: [░]

[lds] configuration is invalid. Error: Get http://unix/stats?token=768935b7ed117cfac6c8c99da4db6bd: dial unix /tmp/puma-status-1545126439210-2886: connect: no such file or directory

Here is a little update:

About the CPU percentage, I will try something with /proc/PID/stat to see if we can get something reliable.

jarthod commented 5 years ago

Queuing is at worker level, no? Not application level?

spuyet commented 5 years ago

It's at worker level in puma stats, yes, but it seems more useful to me to have a queuing overview, so aggregate the values and display them at application level. WDYT?

jarthod commented 5 years ago

Yep ok

w3st3ry commented 5 years ago

In order to summarize all these changes:

Questions before adding to the todo-list:

jarthod commented 5 years ago

About the CPU percentage, what's the relation with goroutines and locks? You can iterate on all apps to get the first number, then wait 1 second, get the second number for all apps, and then print. You don't have to over-engineer this.

w3st3ry commented 5 years ago

I just proposed that kind of solution at first because it's more compliant with the philosophy of the language. BTW, after some changes it should work like you said too, and it's easier to maintain for someone who is not a full-time Gopher, I agree.

Otherwise, do you agree with the summary? @jarthod @spuyet

jarthod commented 5 years ago

A couple of typos, but otherwise yes:

> Last checkin disabled if there's an issue (> 10 sec)

It's the opposite: visible if there's an issue.

> MiB to B

→ to M

About the queue, I think it's "backlog" in the status JSON, but I'll let @spuyet confirm.

About "Removed threads number per process": I think @spuyet said we should remove the number but keep the load bar graph per worker (because it's almost always 1), but I'm not sure it's a good idea, as we can have multiple threads, in which case it won't be 1. @spuyet WDYT?

spuyet commented 5 years ago

> "Added queuing" -> How should I get this information? I see nothing related to queuing in the worker JSON, could you explain?

I explained it in the body of this issue: yes, that's the backlog value that you have per process in the status JSON output.

> About "Removed threads number per process" I think @spuyet said we should remove the number but keep the load bar graph for worker (because it's almost always 1) but I'm not sure it's a good idea as we can have multiple threads in which case it'll not be 1. @spuyet WDYT?

Yep. I don't find the number really useful, as we already have the global value; the load graph seems to be enough to me.

jarthod commented 5 years ago

Ok!

w3st3ry commented 5 years ago

Please check out https://github.com/dimelo/puma-helper/pull/21 for the UI changes. Available on the staging env.