`gpud scan` might have failed, thus dumping all the output to your terminal.
Could you share the error message for your `gpud scan` command?
And regardless, https://github.com/leptonai/gpud/pull/33 will remove that verbose output from the scan command.
Thanks for the report.
The `gpud scan` command runs all the steps fine until this one:
⌛ scanning dmesg for 5000 lines
and then it fails there with:
gpud: failed to execute command: exit status 1 ([ 0.000000] kernel: Linux version blah blah and other kernel details
Unfortunately, it doesn't say why it fails or which command it was running.
On my local GPU machine I don't get this error.
@eicca Can you share the output of `gpud --version`?
We fixed this a few releases ago, and https://github.com/leptonai/gpud/releases/tag/v0.0.1-alpha7 has been released :)
Please give it a try and let us know if things are still breaking :)
Hi, I tried the new release and the kernel log doesn't fill the terminal anymore :+1:
However, the dmesg scan error is still there:
⌛ scanning dmesg for 5000 lines
{"level":"warn","ts":"2024-08-29T13:28:24Z","caller":"process/process.go:221","msg":"command exited with non-zero status","error":"exit status 1","cmd":"/usr/bin/bash /tmp/tmpbash3729988194.bash","exitCode":1}
{"level":"warn","ts":"2024-08-29T13:28:24Z","caller":"process/process.go:228","msg":"process exited with error","error":"exit status 1"}
exit status 1
Version:
gpud version v0.0.1-alpha7
@eicca Thanks for the confirmation.
I suspect this is an OS-specific error.
Could you share your output for the following command?
sudo dmesg --ctime --nopager --buffer-size 163920 --since '1 hour ago'
(This is what the `gpud scan` command runs.)
I get this error:
> sudo dmesg --ctime --nopager --buffer-size 163920 --since '1 hour ago'
dmesg: unrecognized option '--since'
I guess this flag is not implemented yet in this version of dmesg:
> dmesg --version
dmesg from util-linux 2.34
On my local GPU machine with a newer Ubuntu it works well. The dmesg version there is `dmesg from util-linux 2.39.3`.
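Because the failure tracks the util-linux version of `dmesg`, one way to decide up front whether `--since` can be used is to parse `dmesg --version` and compare it against a minimum version. This is a minimal bash sketch, assuming GNU `sort -V` is available; `dmesg_version` and `version_ge` are hypothetical helper names, not part of gpud or util-linux. The thread only establishes that 2.34 lacks `--since` and 2.39.3 has it, so the exact minimum version is left to the caller.

```shell
# Hypothetical helpers (not part of gpud or util-linux).

# Print the version number from `dmesg --version` output,
# e.g. "dmesg from util-linux 2.34" -> "2.34" (last whitespace-separated field).
dmesg_version() {
  dmesg --version | awk '{print $NF}'
}

# Succeed if version $1 >= version $2.
# GNU `sort -V` orders version strings; if $2 sorts first (or ties), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

For example, `version_ge "$(dmesg_version)" 2.39.3 && echo "dmesg --since should work"` treats 2.39.3 as the threshold only because it is the one version reported working here; an earlier release may already support the flag.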
I think one option would be to use journalctl as a fallback, something like this:
sudo journalctl -k --since "$(date --date='1 hour ago' '+%Y-%m-%d %H:%M:%S')" --no-pager
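The fallback proposed above could look roughly like the following minimal bash sketch. It is not gpud's actual fix (the thread doesn't show it), and `read_kernel_log_since` is a hypothetical name. It runs `dmesg --since` once and, if dmesg exits non-zero (as util-linux 2.34 does with `unrecognized option '--since'`), falls back to `journalctl -k`, converting the relative time with the same GNU `date` call as the command above. Because dmesg's stderr is discarded, a permission failure also triggers the fallback.

```shell
# Hypothetical helper (not part of gpud): print kernel messages newer than a
# relative time such as '1 hour ago'.
read_kernel_log_since() {
  local since="$1"
  local out

  # Run dmesg once and capture its output; the assignment's exit status is
  # dmesg's exit status, so an unsupported --since makes this branch fail.
  if out=$(dmesg --ctime --nopager --since "$since" 2>/dev/null); then
    printf '%s\n' "$out"
    return 0
  fi

  # Fallback: read the kernel ring buffer from the journal instead.
  # GNU date turns the relative time into an absolute timestamp, mirroring
  # the journalctl command proposed above.
  journalctl -k --no-pager --since "$(date --date="$since" '+%Y-%m-%d %H:%M:%S')"
}
```

Usage: `read_kernel_log_since '1 hour ago'`, run from a root shell on systems where dmesg or the journal is restricted to privileged users.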
Oh, that makes sense. Will look into this, thanks for the confirmation!
@eicca We've just released https://github.com/leptonai/gpud/releases/tag/v0.0.1-alpha8.
Please let us know if you find any more issues :)
Closing. Please try https://github.com/leptonai/gpud/releases/tag/v0.0.1-alpha9 and feel free to reopen if there's still an issue.
Thanks @gyuho for the fix, and sorry for the late reply! I'll try the new version at some point!
Hi again :)
I'm running
And it works well, except that after displaying the results of the scan, it shows the kernel log from dmesg. This makes the scan report hard to read, so for now I just use this:
It would be amazing if there were an option to not display the dmesg output by default.
Thanks in advance :heart: