Open simonvanderveldt opened 3 years ago
Just found https://github.com/monome/norns/pull/1028 which if I understand it correctly seems to suggest this was a deliberate choice?
recall the Great Disk Full Catastrophe
recall the Great Disk Full Catastrophe
Well, that has/had to do with incorrectly configured logging/persistence/etc. I don't think we shouldn't be sending logs because of that.
yeah this was a bad less-than-ideal solution to "log rotation isn't working"
lol i just wanted to Coin An Era
agreed it's a bad solution. being able to request a user crash log would be extremely helpful
it is a simple enough change to revert.
the change wasn’t made to address log rotation problems.
the motivator was to provide the ability to do certain systems level debugging from maiden without the need for people to ssh and to reduce the additional cpu load due to additional blocking i/o being performed.
my impression based on past statements was that anyone heavily developing/debugging matron and crone was doing so by directly running those executables directly so at the time this change was made it seemed like it would not adversely affect existing workflows. sounds like that assumption is no longer true.
my suggestion would be to just revert the change.
thanks. sorry for being glib - the reasoning in the change makes sense.
tbh, i doubt that logging our processes would help much with diagnosing crashes. (beyond being able to say, yep that's a crash all right)
so personally, i'm actually OK with the way it is... maybe we should duplicate stderr
but not stdout
or something, and use stdout in matron more frequently? (because @simon we do acyually redirect both. i am not sure how the bonjour messages are getting in there, i assume from some other process.)
had some weird thoughts yesterday that seem relevant: https://github.com/monome/norns/issues/1266
there is a very high chance i've mistakenly mixed up my uses of stederr
and stdout
--- i can audit this if given some guidance.
AFAICT, every print in matron uses fprintf(stderr, ...)
i'm sure there was a silly reason for this, like not having to flush the stream for the bytes to show up in the ws wrapper
the change wasn’t made to address log rotation problems.
the motivator was to provide the ability to do certain systems level debugging from maiden without the need for people to ssh and to reduce the additional cpu load due to additional blocking i/o being performed.
Ah, OK.
How does the change in #1028 help with system level debugging? If we instruct users to get system logs by running journalctl
via the matron REPL then the logs from matron itself would be missing. That seems like the opposite of what we'd want, right?
as i said above i’d recommend just reverting the change. the reigning advice has always been to have people ssh into the device and run commands so the original change is counter productive.
OK. I'll create a PR
i'm adding it already, just testing on device..
the problem is with weird recursive/duplicated output when you try to use journalctl
within os.execute
in lua.
for example, from os.execute('journalctl -u norns-matron > ~/dust/matron_log.txt')
:
anyway we should just figure out a better method for doing things like that i guess
@catfact Are you seeing the output show up in the journal after adding the printf("%s", buf);
line back in?
When running /home/we/norns/build/ws-wrapper/ws-wrapper ws://*:5555 /home/we/norns/build/matron/matron
from the command-line I am seeing the matron logs but when running this via the systemd service I still don't see them :confused:
[edit] Just downloaded the log you posted and you are getting it in the journal, odd Does it keep looping like this indefinitely?
it gets truncated at some point for some reason. but this is why it seemed (to me) that the change was helpful in addressing our Great Log Disaster
The max journal size is capped at 100MB, so it won't ever take up more space than 100MB (it seem like it could actually take up less with some more config), so I don't think we need to worry about that part.
But of course getting the same log x times is not very helpful, need to think about how to handle that case. Simplest one is of course to ask users to use SSH for stuff like this.
requesting a log via ssh is a complex ask when trying to diagnose an issue from an end-user
can we simply symlink the log to the dust folder? so it could be viewed in maiden?
despite the log configuration, i feel it'd be due diligence to test the system and make sure logs aren't growing when the PR is merged
@tehn I think we can make the jounal config slightly better and safer, shall I make a PR for that?
that would be fantastic
i guess best to put any changes here: https://github.com/monome/norns-image/tree/master/config
Ok, will do. Will also think about a solution for this recursion problem
tbh i think the recursive problem can be solved by people not being forced to use os.execute
within maiden to get logs. (hence the symlink suggestion)
the way maiden loads files in full and caches them in the browser would prevent using maiden to browse for logs... updates to the file would not be viewable without refreshing the browser page. that is something which could be changed but it might be a deeper change.
tbh i think the recursive problem can be solved by people not being forced to use
os.execute
within maiden to get logs. (hence the symlink suggestion)
i don't think just linking to files is useful enough, i at least need tools like journalctl
and dmesg
to extract meaningful information from places like /run/log/journal/<long hex string>
. and yeah i wouldn't ask maiden to include log browsing capabilities.
anyways yeah, i think going back to the duplicated output seems best. (as it is, there's literally no way to capture output from matron process from early in launch, it is lost on the socket.) we should try be conscious of limiting and removing debug prints (pointing at myself.) and maybe add some shell scripts to dump helpful error messages from journalctl / dmesg / whatever.
(sat down attempting to try some things like that, but am having some kind of PEBCAK problem still, see comment on PR #1267).
@catfact is right, because the journal is made up of binary files there's no way to access it as a plain text file. I'm also not aware of any project that offers this functionality.
If we really want to streamline this maybe we should add a new socket activated service that gets all the logs and forwards them? Doesn't have to be complex, basically just get a full dump of the logs using something like journalctl -f -b --no-pager
and make the output from that it available over a websocket.
We don't get any output/logs from applications run via ws-wrapper, it seems like it's eating all the output.
For example for matron:
Now compare vs what we see in the journal:
It seems like stderr is still passing through but the rest is all missing. This makes troubleshooting pretty difficult.