monome / norns

norns is many sound instruments.
http://monome.org
GNU General Public License v3.0
630 stars 145 forks source link

No logs in journal from applications run via ws-wrapper #1265

Open simonvanderveldt opened 3 years ago

simonvanderveldt commented 3 years ago

We don't get any output/logs from applications run via ws-wrapper, it seems like it's eating all the output.

For example for matron:

$ systemctl stop norns-matron.service
$ /home/we/norns/build/matron/matron
MATRON
norns version: 0.0.0
git hash: 07cec1b6

fonts: 
... lots of output
*** WARNING *** The program 'matron' uses the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=matron>
... some more output
available engines (7): 
  Decimatec
  MollyThePoly
  R
  PolyPerc
  YggdrasilSynth
  Timber_Takt
  None

Now compare vs what we see in the journal:

$ systemctl start norns-matron.service
$ journalctl -u norns-matron
-- Logs begin at Fri 2016-11-04 00:16:42 +07, end at Thu 2020-12-10 02:14:41 +07. --
...
Dec 10 02:06:48 norns systemd[1]: Started norns-matron.service.
Dec 10 02:06:49 norns matron[14674]: *** WARNING *** The program 'matron' uses the Apple Bonjour compatibility layer of Avahi.
Dec 10 02:06:49 norns matron[14674]: *** WARNING *** Please fix your application to use the native API of Avahi!
Dec 10 02:06:49 norns matron[14674]: *** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=matron>

It seems like stderr is still passing through but the rest is all missing. This makes troubleshooting pretty difficult.

simonvanderveldt commented 3 years ago

Just found https://github.com/monome/norns/pull/1028 which if I understand it correctly seems to suggest this was a deliberate choice?

tehn commented 3 years ago

recall the Great Disk Full Catastrophe

simonvanderveldt commented 3 years ago

recall the Great Disk Full Catastrophe

Well, that has/had to do with incorrectly configured logging/persistence/etc. I don't think we shouldn't be sending logs because of that.

catfact commented 3 years ago

yeah this was a bad less-than-ideal solution to "log rotation isn't working"

tehn commented 3 years ago

lol i just wanted to Coin An Era

agreed it's a bad solution. being able to request a user crash log would be extremely helpful

ngwese commented 3 years ago

it is a simple enough change to revert.

the change wasn’t made to address log rotation problems.

the motivator was to provide the ability to do certain systems level debugging from maiden without the need for people to ssh and to reduce the additional cpu load due to additional blocking i/o being performed.

my impression based on past statements was that anyone heavily developing/debugging matron and crone was doing so by directly running those executables directly so at the time this change was made it seemed like it would not adversely affect existing workflows. sounds like that assumption is no longer true.

my suggestion would be to just revert the change.

catfact commented 3 years ago

thanks. sorry for being glib - the reasoning in the change makes sense.

tbh, i doubt that logging our processes would help much with diagnosing crashes. (beyond being able to say, yep that's a crash all right)

so personally, i'm actually OK with the way it is... maybe we should duplicate stderr but not stdout or something, and use stdout in matron more frequently? (because @simon we do acyually redirect both. i am not sure how the bonjour messages are getting in there, i assume from some other process.)

had some weird thoughts yesterday that seem relevant: https://github.com/monome/norns/issues/1266

tehn commented 3 years ago

there is a very high chance i've mistakenly mixed up my uses of stederr and stdout--- i can audit this if given some guidance.

catfact commented 3 years ago

AFAICT, every print in matron uses fprintf(stderr, ...)

i'm sure there was a silly reason for this, like not having to flush the stream for the bytes to show up in the ws wrapper

simonvanderveldt commented 3 years ago

the change wasn’t made to address log rotation problems.

the motivator was to provide the ability to do certain systems level debugging from maiden without the need for people to ssh and to reduce the additional cpu load due to additional blocking i/o being performed.

Ah, OK. How does the change in #1028 help with system level debugging? If we instruct users to get system logs by running journalctl via the matron REPL then the logs from matron itself would be missing. That seems like the opposite of what we'd want, right?

ngwese commented 3 years ago

as i said above i’d recommend just reverting the change. the reigning advice has always been to have people ssh into the device and run commands so the original change is counter productive.

simonvanderveldt commented 3 years ago

OK. I'll create a PR

catfact commented 3 years ago

i'm adding it already, just testing on device..

catfact commented 3 years ago

the problem is with weird recursive/duplicated output when you try to use journalctl within os.execute in lua.

for example, from os.execute('journalctl -u norns-matron > ~/dust/matron_log.txt'):

matron_log.txt

catfact commented 3 years ago

anyway we should just figure out a better method for doing things like that i guess

simonvanderveldt commented 3 years ago

@catfact Are you seeing the output show up in the journal after adding the printf("%s", buf); line back in? When running /home/we/norns/build/ws-wrapper/ws-wrapper ws://*:5555 /home/we/norns/build/matron/matron from the command-line I am seeing the matron logs but when running this via the systemd service I still don't see them :confused:

[edit] Just downloaded the log you posted and you are getting it in the journal, odd Does it keep looping like this indefinitely?

catfact commented 3 years ago

it gets truncated at some point for some reason. but this is why it seemed (to me) that the change was helpful in addressing our Great Log Disaster

simonvanderveldt commented 3 years ago

The max journal size is capped at 100MB, so it won't ever take up more space than 100MB (it seem like it could actually take up less with some more config), so I don't think we need to worry about that part.

But of course getting the same log x times is not very helpful, need to think about how to handle that case. Simplest one is of course to ask users to use SSH for stuff like this.

tehn commented 3 years ago

requesting a log via ssh is a complex ask when trying to diagnose an issue from an end-user

can we simply symlink the log to the dust folder? so it could be viewed in maiden?

despite the log configuration, i feel it'd be due diligence to test the system and make sure logs aren't growing when the PR is merged

simonvanderveldt commented 3 years ago

@tehn I think we can make the jounal config slightly better and safer, shall I make a PR for that?

tehn commented 3 years ago

that would be fantastic

i guess best to put any changes here: https://github.com/monome/norns-image/tree/master/config

simonvanderveldt commented 3 years ago

Ok, will do. Will also think about a solution for this recursion problem

tehn commented 3 years ago

tbh i think the recursive problem can be solved by people not being forced to use os.execute within maiden to get logs. (hence the symlink suggestion)

ngwese commented 3 years ago

the way maiden loads files in full and caches them in the browser would prevent using maiden to browse for logs... updates to the file would not be viewable without refreshing the browser page. that is something which could be changed but it might be a deeper change.

catfact commented 3 years ago

tbh i think the recursive problem can be solved by people not being forced to use os.execute within maiden to get logs. (hence the symlink suggestion)

i don't think just linking to files is useful enough, i at least need tools like journalctl and dmesg to extract meaningful information from places like /run/log/journal/<long hex string>. and yeah i wouldn't ask maiden to include log browsing capabilities.


anyways yeah, i think going back to the duplicated output seems best. (as it is, there's literally no way to capture output from matron process from early in launch, it is lost on the socket.) we should try be conscious of limiting and removing debug prints (pointing at myself.) and maybe add some shell scripts to dump helpful error messages from journalctl / dmesg / whatever.

(sat down attempting to try some things like that, but am having some kind of PEBCAK problem still, see comment on PR #1267).

simonvanderveldt commented 3 years ago

@catfact is right, because the journal is made up of binary files there's no way to access it as a plain text file. I'm also not aware of any project that offers this functionality.

If we really want to streamline this maybe we should add a new socket activated service that gets all the logs and forwards them? Doesn't have to be complex, basically just get a full dump of the logs using something like journalctl -f -b --no-pager and make the output from that it available over a websocket.