Open xxxserxxx opened 4 years ago
Tagging (hopefully) @ionlights, @talentlessguy, @gonza21gd
The next major release will be v3.4.0.
I am willing to help with NVIDIA and Intel GPUs testing.
One problem is that I have a rather old dGPU card, so I can only test temperature sensors and memory, not "GPU Processes".
My nvidia-smi output right now:
Tue Feb 25 21:35:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 740M Off | 00000000:01:00.0 N/A | N/A |
| N/A 53C P8 N/A / N/A | 10MiB / 2004MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
I'm definitely willing to test this. What information are you trying to provide via gotop?
Also, I can only test NVIDIA GPUs. Here's what I've got below:
$ nvidia-smi
Tue Feb 25 13:40:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:0A:00.0 Off | N/A |
| 32% 35C P5 14W / 280W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:43:00.0 Off | N/A |
| 0% 43C P5 25W / 280W | 0MiB / 11178MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I think it'd be useful to have a per-process and an aggregate layout (potentially toggle-able).
Per-process, I find the following most useful (ranked):
In aggregate, I find the following most useful (ranked):
I'm also willing to help on this. Got intel graphics (7700K) and AMD GPU (Vega 64) available.
v3.4.0 will (likely) include only NVIDIA support; Intel and AMD are going to be more work. The current build does not include the GPU code, but does have other features I'd like to make sure work for everybody. I'm linking to Linux, Windows, and Darwin AMD64 tarballs -- if anyone needs a different binary, let me know.
The downloads will be available until Saturday. You can also build it yourself, as usual.
The features are listed in the 3.4.0 milestone: metrics export and power gauge are the big ones. I'm interested in knowing if anybody sees significantly increased CPU use or any issues.
There are a couple of known issues listed in the milestone. Thanks!
The battery widget seems to work fine for my laptop, though I don't understand what the numbers on the right mean (screenshot). I've tried plugging and unplugging multiple times, as well as letting the battery discharge to 10% and charging it up again, and it's not having any trouble keeping track of it. I think reducing its max size would be a nice touch: if you have the terminal full screen this widget will take a lot of space, and you probably don't need to see such a big graph for a simple xx% and an increasing/decreasing line (this is costing space in the CPU usage graph, and that history looks way more interesting to me).
I'm not getting any weird CPU performance issues/reports in this version, nor any other kind of issue so far (apart from the already marked bugs).
Sadly, I wasn't able to test the export option; I'd have to modify some settings in my network, so I hope the other testers can check this out.
Also, the --percpu option doesn't seem to change anything for me. Isn't it the default and only option when launching the program already? I just noticed this now, but you probably knew this already.
PS: I'm really looking forward to the GPU widget. I saw you're having problems with it. I'd be glad to help you in any way I can :)
@Gonza21Gd, the battery widget hasn't changed in a number of versions and was available on Caleb's repo -- could you try the new "power" widget? I see you've found the layouts feature.
Would you mind filing a bug about the --percpu weirdness? That's probably a new bug, and there's no existing issue filed for it that I've seen.
Ok guys, here's the first drop. There are a number of caveats, so please read first.
[...] gotop program.
gotop is still a single executable (and I will not change that), but extensions need to be put in places and configurations need to be configured for non-dev users. This needs to be worked out before release.
[...] nvidia-smi tool. On Arch, this is provided by nvidia-utils.
Ok, here's what you're going to do:
[...] nvidia-smi installed
[...] gotop-test
echo "cpu\ntemp\nmem" | ./gotop -X nvidia -l -
Verify that you're getting CPU, memory, and temperature diagnostics.
If this looks good, go to the next phase.
[...] nvidia.so into $XDG_CONFIG_HOME/gotop
[...] $XDG_CONFIG_HOME/gotop/gotop.conf
extensions=nvidia
[...] gotop-test, run gotop. You'll have to move the exe, or call it by path, e.g.: cd .. ; ./gotop-test/gotop
Check that you have diagnostics.
Also in the directory is a dummy.so plugin. This is an example of a virtual device that generates random CPU, memory, and temp data. It just draws crazy lines; you can ignore it, or play with it. To enable it, use -X dummy. To enable both extensions, use -X nvidia,dummy (or use the config file: extensions=nvidia,dummy).
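For readers wondering what the extensions=nvidia,dummy line amounts to, here is a minimal Go sketch of reading such a key=value line into a list of extension names. The helper parseExtensions is hypothetical and is not gotop's actual config parser; it only illustrates the comma-separated format described above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseExtensions reads one "extensions=a,b" style line and returns the
// extension names. Hypothetical helper for illustration only; this is
// not the parser gotop itself uses.
func parseExtensions(line string) ([]string, error) {
	key, value, ok := strings.Cut(line, "=")
	if !ok || strings.TrimSpace(key) != "extensions" {
		return nil, fmt.Errorf("not an extensions line: %q", line)
	}
	var names []string
	for _, name := range strings.Split(value, ",") {
		// Ignore empty entries such as a trailing comma.
		if name = strings.TrimSpace(name); name != "" {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	names, err := parseExtensions("extensions=nvidia,dummy")
	fmt.Println(names, err)
}
```

The same splitting would apply to the value passed to -X, since both take a comma-separated list.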
The tarball is here.
Incidentally,
Also, because of the plugins and modules, I'm going to have to start properly versioning things so that plugins can be built and work properly.
@xxxserxxx going to test on my GPU asap!
also I think it is better to move this thread to discussions, to quickly exchange results and reference bugs that occurred before
@talentlessguy "discussions?"
@xxxserxxx nevermind, just checked, it is team only
@xxxserxxx
gotop crashes after launching with extensions:
./gotop -X nvidia
it hangs for a sec with an empty stdout and then exits with "false" signal
Using this config:
extensions=nvidia
it doesn't even launch
/tmp/gotop-test
➜ ls ~/.config/gotop
gotop.conf nvidia.so
/tmp/gotop-test
➜ ../
/tmp
➜ gotop
error parsing config file invalid config option extensions
Thank you! Can you post the contents of $XDG_STATE_HOME/gotop/errors.log?
Do I need to tag you for you to get emails @talentlessguy?
@xxxserxxx I don't have the $XDG_STATE_HOME variable set. Where else can the logs be stored?
Response to the edit: sure, why not
@talentlessguy do you have a ~/.local/state/gotop directory?
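The two locations mentioned here are the same thing under the XDG Base Directory rules: when XDG_STATE_HOME is unset or empty, the default is $HOME/.local/state. A small Go sketch of that lookup follows; stateDir is a hypothetical helper, and whether a given gotop build resolves its log path exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stateDir returns the per-application state directory following the XDG
// Base Directory rules. getenv is injected so the function can be tried
// without touching the real environment. Hypothetical helper, not gotop code.
func stateDir(getenv func(string) string, home, app string) string {
	base := getenv("XDG_STATE_HOME")
	if base == "" || !filepath.IsAbs(base) {
		// Unset, empty, or relative values fall back to the default.
		base = filepath.Join(home, ".local", "state")
	}
	return filepath.Join(base, app)
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(filepath.Join(stateDir(os.Getenv, home, "gotop"), "errors.log"))
}
```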
I assume gotop-test is from the tarball? The error you're getting about the config file indicates that you somehow have the wrong executable, so maybe what's in the tarball is bad. Just to check, can you please:
tar -xzf gotop-test.tgz
cd gotop-test
./gotop -X dummy
If this works (you'll see extra CPU, memory, and temperatures that bounce around randomly), then try
./gotop -X nvidia
@xxxserxxx yes, I have, here's the output:
17:56:17 cpu.go:30: GeForce GT 740M 0: strconv.Atoi: parsing "N/A": invalid syntax
panic: assignment to entry in nil map
goroutine 1 [running]:
github.com/xxxserxxx/gotop-nvidia.updateNvidiaMem(0xc0002da690, 0x0)
/home/ser/workspace/gotop-nvidia/nvidia.go:53 +0x239
github.com/xxxserxxx/gotop/devices.UpdateMem(0xc0002da690)
/home/ser/workspace/gotop/devices/mem.go:19 +0x84
github.com/xxxserxxx/gotop/widgets.NewMemWidget(0x3b9aca00, 0x7, 0xb79bdb)
/home/ser/workspace/gotop/widgets/mem.go:29 +0x1c6
github.com/xxxserxxx/gotop/layout.makeWidget(0xc000026ee0, 0x19, 0xc000026ea0, 0x1e, 0xb7c238, 0xa, 0x7, 0x0, 0x0, 0x0, ...)
/home/ser/workspace/gotop/layout/layout.go:170 +0xae
github.com/xxxserxxx/gotop/layout.processRow(0xc000026ee0, 0x19, 0xc000026ea0, 0x1e, 0xb7c238, 0xa, 0x7, 0x0, 0x0, 0x0, ...)
/home/ser/workspace/gotop/layout/layout.go:117 +0x40e
github.com/xxxserxxx/gotop/layout.Layout(0xc00024c2a0, 0x4, 0x4, 0xc000026ee0, 0x19, 0xc000026ea0, 0x1e, 0xb7c238, 0xa, 0x7, ...)
/home/ser/workspace/gotop/layout/layout.go:41 +0x184
main.main()
/home/ser/workspace/gotop/cmd/gotop/main.go:403 +0x5ff
Lol, so I see /home/ser. Paths are hardcoded? Shouldn't be like that
Dummy works
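The log above shows two separate failures: strconv.Atoi rejecting the string "N/A", and a write into a map that was never allocated. The Go sketch below reproduces the shape of both and one defensive way to handle them. The names parseMetric and recordMem are hypothetical, and this is not the code in nvidia.go, whose contents are not shown in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMetric converts a metric field to an int. ok is false for values
// such as "N/A" that older cards report for unsupported metrics.
func parseMetric(field string) (value int, ok bool) {
	n, err := strconv.Atoi(strings.TrimSpace(field))
	if err != nil {
		return 0, false
	}
	return n, true
}

// recordMem stores a memory reading for a device. It allocates the map
// when it is nil, because assigning into a nil map panics in Go.
func recordMem(mem map[string]int, device, usedMiB string) map[string]int {
	if mem == nil {
		mem = make(map[string]int)
	}
	if v, ok := parseMetric(usedMiB); ok {
		mem[device] = v
	}
	return mem
}

func main() {
	var mem map[string]int // nil, like the map in the panic above
	mem = recordMem(mem, "GeForce GT 740M 0", "10")
	mem = recordMem(mem, "GeForce GT 740M 0 util", "N/A") // skipped, no panic
	fmt.Println(mem)
}
```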
Lol, so I see /home/ser. Paths are hardcoded? Shouldn't be like that
Haha, no, there are no hard-coded paths in the source. It's an artifact of Go; that's the path I was in when I compiled the executable you're running.
Try this; you'll see that panic() includes the path in the trace.
Thanks for the trace, though; it gives me a starting point.
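The point about compile-time paths can be checked with a few lines of Go. The sketch below prints the source path the compiler recorded for the calling function, which is the same information a panic trace shows; sourceFile is a name made up for this example. Building with go build -trimpath strips the machine-specific prefix from these paths.

```go
package main

import (
	"fmt"
	"runtime"
)

// sourceFile reports the source file path that the compiler recorded for
// the function calling it. The path comes from the build machine, not
// from the machine the binary runs on.
func sourceFile() string {
	_, file, _, ok := runtime.Caller(1)
	if !ok {
		return ""
	}
	return file
}

func main() {
	fmt.Println("compiled from:", sourceFile())
}
```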
I can test on Raspberry Pi's, Raspbian/ARM or Ubuntu/ARM.
@masonj188 do you want binaries? Edit: let me rephrase that. For which ARM would you like binaries? 5, 6, 7, or aarch64? There's no nvidia in question, but would you test the plugin system if I gave you a dummy plugin?
@xxxserxxx Yep I can test the dummy plugin. I only have Pi 3Bs and 4s, so aarch64.
@masonj188 : https://gofile.io/?c=zQml3u
It's just the executable; I just discovered I don't have the tools installed to cross-compile shared libraries for aarch64. However, once I figure it out, I'll upload an .so and you can use the same gotop executable.
@masonj188 : https://gofile.io/?c=WDTtik
That's dummy-arm.so. Put it in the directory that you're running gotop from, and use -X dummy-arm to enable the plugin. You should see some extra CPU, temp devices, and memory that will fluctuate wildly. If you want to limit the display to just those widgets, call:
echo "mem\ncpu\ntemp" | ./gotop -X dummy-arm -l -
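A note on the command above: whether echo turns \n into a newline depends on the shell, so printf 'mem\ncpu\ntemp\n' is the more portable way to produce the three lines. On the receiving side, -l - reads the layout from standard input. The Go sketch below shows a simplified reading of such text, one row per line with widget names separated by whitespace; parseLayout is hypothetical and ignores anything beyond plain widget names, so it is not gotop's real layout parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLayout splits layout text into rows of widget names. Each
// non-empty line is one row; names on a line are separated by spaces.
// Simplified, hypothetical reading of the layout format.
func parseLayout(text string) [][]string {
	var rows [][]string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		if widgets := strings.Fields(scanner.Text()); len(widgets) > 0 {
			rows = append(rows, widgets)
		}
	}
	return rows
}

func main() {
	// In a real program this text would be read from os.Stdin.
	for i, row := range parseLayout("mem\ncpu\ntemp\n") {
		fmt.Printf("row %d: %v\n", i, row)
	}
}
```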
I didn't realize Raspbian is built for ARMv7, so I just cloned the repo and built it on a RPI4. No temperature sensors show up. The storage device and mounts show up correctly, but it doesn't seem like it's actually monitoring the real-time reads and writes; the values just stayed at 0B. Other than that the network usage seems good, processes seem good, and it showed all of the CPUs and the correct amount of RAM.
If you want to compile another .so for ARMv7 I'll test that as well.
After I test the ARMv7 .so I can install a distro that supports aarch64 and test it using the binary and .so you already linked.
Thanks @masonj188. Can you upload ~/.local/state/gotop/errors.log?
ARMv7 RPI4 errors.log - https://gofile.io/?c=KRMpZB
@talentlessguy , @ionlights
As mentioned in the update to the extensions wiki page, I'm taking an interim approach to supporting extensions for non-ubiquitous devices such as the NVidia card. Testers with NVidia, would you be so kind as to use the extension builder tool to build a version of gotop with the NVidia extension, and then test the extension? I'm interested in two things:
The instructions for the builder are sparse, but I believe sufficient. Please file tickets in that project for improvements, and issues with the NVidia extension here (for now).
Hey, @xxxserxxx sorry for the late reply/lack thereof. I believe I'll have some time in the coming week or so to give this a shot on an Nvidia system. Are you expecting it to work strictly with a single GPU or should it handle multiple?
If only a single, how do you parse the GPUs – as I have two present?
@ionlights It should work with multiple GPUs; it relies on the library to return the data, though, so if the library returns a single GPU it'll display a single GPU.
Honestly, without an NVidia card, I have to trust to the API, and to testers like yourself to provide feedback. Let me know if and how it runs.
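To make the multi-GPU point concrete, here is a Go sketch that turns one line per GPU into one record per GPU, and marks a temperature as unavailable when the field is not a number. The sample text is made up, shaped like the output of nvidia-smi --query-gpu=index,name,temperature.gpu --format=csv,noheader,nounits. The gpu type and parseGPUs are hypothetical; how gotop-nvidia actually obtains its data is not shown in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpu holds one parsed line. HasTemp is false when the card reports a
// non-numeric temperature.
type gpu struct {
	Index   int
	Name    string
	TempC   int
	HasTemp bool
}

// parseGPUs parses "index, name, temperature" lines, one per GPU, and
// skips lines that do not have exactly three fields or a numeric index.
func parseGPUs(csv string) []gpu {
	var gpus []gpu
	for _, line := range strings.Split(strings.TrimSpace(csv), "\n") {
		fields := strings.Split(line, ",")
		if len(fields) != 3 {
			continue
		}
		idx, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			continue
		}
		g := gpu{Index: idx, Name: strings.TrimSpace(fields[1])}
		if t, err := strconv.Atoi(strings.TrimSpace(fields[2])); err == nil {
			g.TempC, g.HasTemp = t, true
		}
		gpus = append(gpus, g)
	}
	return gpus
}

func main() {
	// Made-up sample data for two cards.
	sample := "0, Example GPU, 35\n1, Example GPU, 43\n"
	for _, g := range parseGPUs(sample) {
		fmt.Printf("GPU %d (%s): %d C\n", g.Index, g.Name, g.TempC)
	}
}
```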
Folks, the v3.6.0 release is in soak, so please have at it and let me know if there are any issues. The change log is fairly substantial on this one; one thing I'm considering is bumping the major version to 4. The reason is that the CLI arguments have evolved significantly. I'm interested in feedback.
I'm going to set a release date target on the milestone now; if you have a chance, please give it a spin.
@xxxserxxx hi, is there any fast way to build gotop + NVIDIA plugin? So I can instantly build it and send feedback, without having to download anything from a browser
@talentlessguy , assuming you're on Linux and already have Go installed, you can try this:
mkdir gotop-nvidia
cd gotop-nvidia
curl -OL https://raw.githubusercontent.com/xxxserxxx/gotop-builder/master/build.go
go run ./build.go -r v3.5.1 github.com/xxxserxxx/remote
go build -o gotop-nvidia -ldflags "-X main.Version=v3.5.1-nvidia" ./gotop.go
If everything went well, you now have an executable gotop-nvidia that you can test. In the v3.5 branch, the errors.log file should be in ~/.config/gotop/errors.log (if you're on a system that uses XDG).
Please let me know how this works for you! Part of this is also working out issues in the builder, which is there until improvements in Go's plugins are implemented.
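For anyone unfamiliar with the -ldflags "-X main.Version=..." part of the build command: the Go linker can overwrite a package-level string variable at build time, which is why the value is "just a label". A minimal sketch follows; the default value "dev" and the banner helper are assumptions for illustration, not gotop's source.

```go
package main

import "fmt"

// Version is a package-level string, so the linker can replace its value:
//
//	go build -ldflags "-X main.Version=v3.5.1-nvidia" .
//
// Without the flag it keeps the default below.
var Version = "dev"

// banner formats a program name together with the linked-in version.
func banner(name string) string {
	return fmt.Sprintf("%s %s", name, Version)
}

func main() {
	fmt.Println(banner("gotop-nvidia"))
}
```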
@xxxserxxx I get this error:
➜ go run ./build.go -r v4.0 github.com/xxxserxxx/remote
go: errors parsing go.mod:
/tmp/gotop-nvidia/go.mod:3: no matching versions for query "v4.0"
Aw crap. Sorry, I copied and pasted the wrong line. Change v4.0 to v3.5.0. Do that in the next line as well; change v4.0.0 to v3.5.0. That latter one doesn't matter as much (it's just a label) but it'll be less confusing when you report errors.
@talentlessguy Hey, try it with v3.5.1 first. It should work, and that has bug fixes.
Hey @xxxserxxx – sorry for the late responses. I just got this all running, though I ran into a weird error compiling with the -ldflags you specified.
Regardless, the output of gotop-nvidia is here:
Regular gotop:
Definitely appreciate the inclusion of GPUs! Now I can use this, mostly, without nvidia-smi. 😃 Thanks for all the effort.
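Hiya @ionlights -- I have questions --
- So, it works?
- Why does your non-NVidia gotop have no temps? Has it always had none?
- I've lost track -- was it yours that only gave you temps, and not GPU usage %? Or is that a missing metric in gotop-nvidia?
- It works? Like, really?
- Does nvidia-smi give you any other sensors you don't see in gotop-nvidia?
- What were the mistakes I made in the ldflags instructions?
Considering it was programmed blindly, I'm a little shocked that it actually works. @talentlessguy, are you able to test this? If I can get a second motion, I'll update the CI/CD to build gotop-nvidia binaries -- mainly for testing, but for folks who only want that one extension, it'll make upgrading a tiny bit easier.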
- So, it works?
Yes. 😁
- Why does your non-NVidia gotop have no temps? Has it always had none?
I believe it never has. However, I'm on an Arch Linux machine with an AMD 1920x (TR) – so I could be missing sensors as well. I wouldn't totally take my word for it, though, as I typically run gotop -psm because I didn't need to track my CPU temps as much as GPU temps.
([...] gotop from the AUR.)
- I've lost track -- was it yours that only gave you temps, and not GPU usage %? Or is that a missing metric in gotop-nvidia?
This is my first build of gotop-nvidia. So I would venture a guess that this doesn’t apply. However, I would definitely appreciate more data like the numeric VRAM and GPU % usage, though.
- It works? Like, really?
Yes, the temps are accurately reflected.
- Does nvidia-smi give you any other sensors you don't see in gotop-nvidia?
If, by sensors, you mean temperatures – no. If, by sensors, you mean additional data, then yes. For the latter, nvidia-smi states % GPU usage, numeric VRAM usage, processes and their VRAM usage, and power utilization – just to name a few. (Below, I've highlighted the most useful things.)
- What were the mistakes I made in the ldflags instructions?
So, I just tried rebuilding with the commands you gave and it all worked with ldflags. Honestly not sure what's changed in my machine's state. 😕 However, this could just be a fluke as my desktop has had some semi-inconsistent behavior with a variety of Go-based software (e.g. Docker).
Also, I wasn’t able to use the -X system before. I’m not sure if you intend to keep it set up like that, but I would definitely appreciate that, rather than having alternate programs for each configuration.
It might make sense to have additional modules/extensions in the AUR (e.g.) as gotop-nvidia, but then it moves the extension pack into the expected location for gotop – so as to avoid having people build or otherwise delineate between pure gotop and its plugin variants.
Ok, so there are two curiosities. This is the first I'm hearing of a Linux machine that doesn't report thermal sensors. I'm going to have to think about how I can check whether your machine just doesn't have them, or if it's a gotop issue.
The second issue is that gotop-nvidia should be providing at least some of the metrics that nvidia-smi does; in particular, it should be reporting thermal, memory, and GPU activity (% busy) -- there's code for those three. If it isn't working, it may be an issue in the code. I'm unable to test it, though, since none of the computers in this house have NVidia cards, so I may have to send you a program to see what gets dumped out.
As for Go plugins, we'll have to wait and see what the Go team does. The plugins interface is not particularly useful in its current state. However, the builder program can support multiple extensions at the same time -- you would just provide all of the extensions you want to have compiled on the command line. For example, to enable NVidia and the Remote devices, you'd say:
$ go run ./build.go -r v4.0.0 \
github.com/xxxserxxx/gotop-nvidia \
github.com/xxxserxxx/gotop-remote
So you don't need a per-extension binary. The main downside of the builder is that it does require people to have Go installed, and to build their own executables.
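One way to picture compiling several extensions into a single binary: a builder can generate a small main source file that pulls in each extension package with a blank import, so that each package's init function runs and registers its devices. Whether the gotop builder works exactly this way is an assumption here, and importBlock is a hypothetical helper. The sketch only shows how a list of import paths could become such an import block.

```go
package main

import (
	"fmt"
	"strings"
)

// importBlock renders a Go import block with one blank import per
// extension path. Hypothetical helper; it does not reproduce the real
// builder's output.
func importBlock(extensions []string) string {
	var b strings.Builder
	b.WriteString("import (\n")
	for _, path := range extensions {
		// %q adds the surrounding double quotes a Go import needs.
		fmt.Fprintf(&b, "\t_ %q\n", path)
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	fmt.Print(importBlock([]string{
		"github.com/xxxserxxx/gotop-nvidia",
		"github.com/xxxserxxx/gotop-remote",
	}))
}
```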
@talentlessguy Hey, try it with v3.5.1 first. It should work, and that has bug fixes.
Going to try soon and report the results :)
@talentlessguy it's a long thread, so I just want to make sure: you have to build it yourself with the builder. If you have any questions about that, you can find some instructions a bit further up, but I'll happily repeat them if it's not clear.
@xxxserxxx Yea, so, I've had lm_sensors (on Arch) installed for quite a while and also just redid the configuration. No changes to gotop's temp readout. 😕 What systems have this worked on? Also, do you know if the reports of gotop properly reporting were on consumer hardware vs prosumer? (Latter being, e.g. Threadripper 1920X.) If that's the dividing line, then you may have your culprit. 😅
As for nvidia-smi output, all I get is temps. 😕
Understood, on the plugins-end. Didn't realize it was a limitation of Go, first and foremost. 🙂
@xxxserxxx This is what I get:
zoomed to temp sensors:
temperature sensors work!! yaaay
it reports everything properly
I'm running gotop on Mac. It crashes all the time. I usually start gotop and htop side by side in a tmux window, and whenever I come back after some time, gotop has crashed. It doesn't leave any crashdumps behind, and the error log only repeats those 2 lines:
18:03:09 proc_other.go:23: failed to retrieve processes: failed to execute 'ps' command: signal: killed
18:03:09 net.go:54: failed to get network activity from gopsutil: signal: killed
I think I did find something that can help. Apparently gotop is killed by the system because it exceeds the maximum of 150 wakeups per second:
Date/Time: 2021-09-26 09:41:23 +0200
End time: 2021-09-26 09:41:58 +0200
OS Version: Mac OS X 10.15.7 (Build 19H1217)
Architecture: x86_64h
Report Version: 29
Incident Identifier: D3DD1408-F47F-45B8-BAA0-B1DA73171C02
Data Source: Microstackshots
Shared Cache: 0x697d000 55D03F0F-F7A8-3655-964D-17AF0242F780
Command: gotop
Path: /usr/local/Cellar/gotop/4.1.2/bin/gotop
Version: ??? (???)
PID: 94669
Event: wakeups
Action taken: none
Wakeups: 45001 wakeups over the last 35 seconds (1279 wakeups per second average), exceeding limit of 150 wakeups per second over 300 seconds
Wakeups limit: 45000
Limit duration: 300s
Wakeups caused: 45001
Wakeups duration: 35s
Duration: 35.20s
Duration Sampled: 7.02s
Steps: 3
Hardware model: MacBookPro16,1
Active cpus: 16
Boot args: chunklist-security-epoch=0 -chunklist-no-rev2-dev chunklist-security-epoch=0 -chunklist-no-rev2-dev
Heaviest stack for the target process:
3 runtime.park_m + 157 (gotop + 282301) [0x4044ebd]
3 runtime.schedule + 727 (gotop + 280887) [0x4044937]
1 runtime.findrunnable + 3954 (gotop + 278066) [0x4043e32]
1 runtime.netpoll + 619 (gotop + 226923) [0x403766b]
Powerstats for: gotop [94669] [unique pid 73991828]
UUID: 0C77283B-59AF-3936-89A4-814195C4942D
Path: /usr/local/Cellar/gotop/4.1.2/bin/gotop
Architecture: x86_64
Footprint: 6352 KB -> 6740 KB (+388 KB)
Pageins: 1 pages
Start time: 2021-09-26 09:41:46 +0200
End time: 2021-09-26 09:41:53 +0200
Num samples: 3 (100%)
Primary state: 2 samples Non-Frontmost App, Non-Suppressed, User mode, Effective Thread QoS Default, Requested Thread QoS Default, Override Thread QoS Unspecified
User Activity: 0 samples Idle, 3 samples Active
Power Source: 0 samples on Battery, 3 samples on AC
3 runtime.park_m + 157 (gotop + 282301) [0x4044ebd]
3 runtime.schedule + 727 (gotop + 280887) [0x4044937]
1 runtime.findrunnable + 3954 (gotop + 278066) [0x4043e32]
1 runtime.netpoll + 619 (gotop + 226923) [0x403766b]
1 runtime.findrunnable + 3242 (gotop + 277354) [0x4043b6a]
1 runtime.runqsteal + 82 (gotop + 309138) [0x404b792]
1 runtime.findrunnable + 1838 (gotop + 275950) [0x40435ee]
1 runtime.stopm + 146 (gotop + 270514) [0x40420b2]
1 runtime.mPark + 57 (gotop + 264729) [0x4040a19]
1 runtime.notesleep + 219 (gotop + 70107) [0x40111db]
1 runtime.semasleep + 141 (gotop + 227629) [0x403792d]
1 runtime.pthread_cond_wait + 57 (gotop + 375577) [0x405bb19]
1 runtime.asmcgocall + 173 (gotop + 463853) [0x40713ed]
1 runtime.pthread_cond_wait_trampoline + 16 (gotop + 472144) [0x4073450]
1 __psynch_cvwait + 10 (libsystem_kernel.dylib + 14466) [0x7fff6dd13882]
1 <Kernel mode>
Binary Images:
0x4000000 - 0x460ffff gotop (0) <0C77283B-59AF-3936-89A4-814195C4942D> /usr/local/Cellar/gotop/4.1.2/bin/gotop
0x7fff6dd10000 - 0x7fff6dd3cfff libsystem_kernel.dylib (6153.141.33) <174BBB20-B300-385A-A443-B1A4DD65748C> /usr/lib/system/libsystem_kernel.dylib
it exceeds maximum of 150 wakeups per second
Which version of gotop is this? Could you run gotop --version for me?
This appears to be common on OSX. I find a lot of references to this problem on OSX, across environments: Ruby programs, Safari extensions, Chrome, Unity, gomobile on iOS, and C. What's interesting is that it's always "45001 wakeups". In any case, I suspect this is related to (but not necessarily caused by) the termui library, and in particular the select loop gotop uses to watch for keyboard events (among other things).
I ask about the version of gotop, because I'd like to know if this is a regression. I've changed a lot of code in the headless branch, although the termui package hasn't been changed since last year.
gotop 4.1.2 (20210721T071531)
Looking for volunteers to beta-test features in the next release. I'll provide binaries. Please respond if you're willing to help in any capacity.
Edit
If gotop crashes or exits with any other value than 0, please either attach (if long) or copy/paste (if short) the contents of ~/.local/state/gotop/errors.log.
Also, pro tip: the text UI library can leave a console in a bad state if gotop crashes. In most terminals, you can fix this by typing reset, even if you can't see the cursor. Hit Enter a couple of times, and blind-type reset and you should get back to normal.