monome / norns

norns is many sound instruments.
http://monome.org
GNU General Public License v3.0
629 stars 145 forks source link

audio engine fail after update, sometimes #799

Closed catfact closed 4 years ago

catfact commented 5 years ago

quoting @tehn:

note: i’ve sometimes seen a fresh startup with “audio engine fail” after update. either reboot or RESET through the menu and everything will be happy. trying to locate the reason for the first-up error (that then goes away)

catfact commented 5 years ago

i have a vague suspicion that this is somehow related to the way we add SC include paths (with norns_config.sc.) if you wipe out the .yaml or remove and recreate the config class (??) then sclang it has to go through a cycle before the .yaml is created again.

that is:

catfact commented 5 years ago

if thus is indeed what's happening, we could a) run sclang once as part of the update script, b) retire norns_config.sc and go back to symlinks

antonhornquist commented 5 years ago

how about removing norns_config.sc and adding the generated ´yaml` file in the installation process?

simonvanderveldt commented 5 years ago

I guess alternatively we could just have a fixed sclang_conf.yaml and not do the changes to the includePaths during init?

catfact commented 5 years ago

that seems dangerous, since the .yaml can be blasted by sclang any time

okyeron commented 5 years ago

My experience over the last day or two - reboot I get the screen showing error: audio engine - sometimes it will play the startup warble, sometimes not (which is very odd to me).

A start/stop or RESET and I'm usually back in business.

FWIW - I had killed the .yaml file the other day, but all the paths seem valid right now

antonhornquist commented 5 years ago

@catfact only if LanguageConfig.store is run, no? and then, i was under the impression the yaml file actually is actually read to the LanguageConfig on boot, so storing would, uh, restore things.

catfact commented 5 years ago

i have not dug deep into whats up w/ the .yaml lifecycle, but yea - paths should be valid after running sclang a couple times, no matter what you do.

is there any actual objection to symlinks?

(i know i made an objection to them at some long-ago time, but i was being an idiot - my problem is that when i use emacs to edit Foo.sc in the path, i sometimes end up with #Foo.sc#, which, stupidly, sclang attempts to compile and throws duplicate class. that's why my original SC install script actually copied .sc files explicitly. but both symlinking and changing the path have the same "problem.")

antonhornquist commented 5 years ago

i don't like symlinking every sc file, but symlinking to the directories i'm totally fine with

catfact commented 5 years ago

@antonhornquist i'm sure you're right. my understanding is pretty vague.

antonhornquist commented 5 years ago

(it was a bit of a hassle to run ./install.sh after adding new scripts, is all)

okyeron commented 5 years ago

Further data point - this error happens for me on every boot so far today. So I don't think its related to updating.

EDIT - Not sure what logs would be helpful to look at here. Any pointers would be appreciated.

catfact commented 5 years ago

symlinking every sc file

@antonhornquist yea we don't need to do that at all. sorry about that.

symlinking to the directories i'm totally fine with

ok

@okyeron ok that's.. very weird. that's not happening to me or (AFAIK) anyone in the dev group.

antonhornquist commented 5 years ago

🤔

@okyeron what's the output of sclang in those occurences?

i have had zero errors like these recently

okyeron commented 5 years ago

How can I get the sclang output on reboot?

(I am on RasPi so maybe it's something not showing on norns hardware)

antonhornquist commented 5 years ago

@okyeron i believe journalctl -u norns-sclang might help

okyeron commented 5 years ago

nada

-- Logs begin at Thu 2019-04-04 15:04:51 CDT, end at Thu 2019-04-04 15:09:44 CDT. --
Apr 04 15:04:59 norns systemd[1]: Started norns-sclang.service.
simonvanderveldt commented 5 years ago

@catfact only if LanguageConfig.store is run, no? and then, i was under the impression the yaml file actually is actually read to the LanguageConfig on boot, so storing would, uh, restore things.

Just checked the sources and seems to indeed only happen when calling LanguageConfig.store. This seems to be the only write function for the config and it only gets used by LanguageConfig.store https://github.com/supercollider/supercollider/blob/63783aeca007c1e1d2a7cf73b3f3da056913f077/lang/LangPrimSource/PyrPrimitive.cpp#L3626

antonhornquist commented 5 years ago

@okyeron 🤔 weird, i can view regular sclang output using that command. my experience with journalctl is limited, though, so i'm not sure what may happen

simonvanderveldt commented 5 years ago

@okyeron If you get this reliably on every boot you could disable/mask norns-sclang using systemctl mask norns-sclang, then reboot and then start sclang manually.

I'm afraid it's output is getting eaten by the ws-wrapper/interactive terminal interaction

[edit] Sorry masking won't work, you'll have to move the unit

cd ~
sudo mv /etc/systemd/system/norns-sclang.service .

And then restart.

Once done move it back using

cd ~
sudo mv norns-sclang.service /etc/systemd/system/
okyeron commented 5 years ago
10.0.1.19 ~/norns $ sudo systemctl mask norns-sclang
Failed to mask unit: File /etc/systemd/system/norns-sclang.service already exists.

Now I'm getting NONE on reboot, but when I try to load a script - no audio and lots of these:

FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found
FAILURE IN SERVER /s_new Group 1012 not found

I'm going to get my second device updated to today's and see if I can reproduce

okyeron commented 5 years ago

FWIW - Pi-norns number 2 is not reproducing the error.

Not sure what's happening. So I probably need to do more research/digging. Apologies if that was a false alarm

antonhornquist commented 5 years ago

btw the FAILURE IN SERVER /s_new Group xyz not found are not severe in any way, i get these too

okyeron commented 5 years ago

One more weird anecdote - i just tried removing all my 3rd party scripts - leaving only current versions for awake, mlr, ash and we (pulled from git within the last hour) - and reboot went just fine (a couple times in a row)

Going to try re-installing these from github and see if any one causes my error (fwiw - none of those include an engine, so... ¯_(ツ)_/¯ ).

Here's the scripts i had installed: Screen Shot 2019-04-04 at 3 35 32 PM.

simonvanderveldt commented 5 years ago

@okyeron Sorry, my initial instructions were incorrect, updated the comment https://github.com/monome/norns/issues/799#issuecomment-480046252

catfact commented 5 years ago

(fwiw - none of those include an engine

awake does - PolyPerc lives there now :man_shrugging:

okyeron commented 5 years ago

Followup... I found a difference between my setups:

in norns-jack.service

problematic PI has this: ExecStartPost=/usr/bin/jack_wait -w -t 10

non-problematic Pi has older version of the jack service with: ExecStartPost=/bin/sleep 2

okyeron commented 5 years ago

This may be entirely a RasPi problem, but I thought I should share this possible solution:

older pi still /etc/systemd/system/norns-init.service installed and being required by norns.target

This had beed edited for RasPi and the only thing it was doing is this: ExecStart=/bin/sh -c "/bin/echo -n performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

The newer install did not have norns-init.service.

I added this back in and my boot problems have gone away.

simonvanderveldt commented 5 years ago

Hmm, strange. That would seems to indicate a problem with the order in which the components are being started. Still in that case sclang should fail and you should see that when looking at its logs or status. If you want you can make a comparison before/after the change using systemd-analyze. See for info how to do so https://wiki.archlinux.org/index.php/Improving_performance/Boot_process#Analyzing_the_boot_process

Regarding the governor: If you look at cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor before/after the change you can check if it actually makes a difference.

catfact commented 5 years ago

can we close this?

simonvanderveldt commented 5 years ago

@catfact I'm not sure about the initial problem you reported. Is that still relevant?

I'd like to know the actual cause for the issue @okyeron is having but that could also be a different issue.

catfact commented 5 years ago

that quote from @tehn was the entirety of my information. @okyeron report seemed related. my theory was about the classpaths / yaml / &c because tehn's report was in the beta while we were still messing with this - i don't even know which update scripts were referred to.

by all means, let's keep it open if mysterious jack failures on boot are still a thing.

simonvanderveldt commented 5 years ago

@okyeron Are you still able to reproduce this issue? If so and you also have a working device as well could you execute the steps mentioned in this comment https://github.com/monome/norns/issues/799#issuecomment-480179816 ?

okyeron commented 5 years ago

@simonvanderveldt I've not had it happen again since adding that norns-init file back into my startup. I can try reversing that process and see what happens.

I'm wondering if this is something that's has been set on most people's devices over time, but happens with a completely fresh build?

So - I took away norns-init.service and change scaling_governor to "powersave" which I believe is the default with the stock stretch install.

Reboot and I get the error: Audio Engine. If I then SYSTEM>RESET, things startup just fine.

Here's some systemctl status dumps after boot (with the error on screen)

10.0.1.19 ~ $ systemctl status norns-jack.service
● norns-jack.service
   Loaded: loaded (/etc/systemd/system/norns-jack.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-04-07 17:36:06 CDT; 6min ago
  Process: 353 ExecStartPost=/usr/bin/jack_wait -w -t 10 (code=exited, status=0/SUCCESS)
 Main PID: 352 (jackd)
   CGroup: /system.slice/norns-jack.service
           └─352 /usr/bin/jackd -R -P 95 -d alsa -d hw:0 -r 48000 -n 3 -p 128 -S -s

Apr 07 17:36:04 norns jack_wait[353]: jack server is not running or cannot be started
Apr 07 17:36:04 norns jack_wait[353]: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for 
Apr 07 17:36:05 norns jackd[352]: creating alsa driver ... hw:0|hw:0|128|3|48000|0|0|nomon|swmeter|s
Apr 07 17:36:05 norns jackd[352]: configuring for 48000Hz, period = 128 frames (2.7 ms), buffer = 3 
Apr 07 17:36:05 norns jackd[352]: ALSA: final selected sample format for capture: 16bit little-endia
Apr 07 17:36:05 norns jackd[352]: ALSA: use 3 periods for capture
Apr 07 17:36:05 norns jackd[352]: ALSA: final selected sample format for playback: 16bit little-endi
Apr 07 17:36:05 norns jackd[352]: ALSA: use 3 periods for playback
Apr 07 17:36:06 norns jack_wait[353]: server is available
Apr 07 17:36:06 norns systemd[1]: Started norns-jack.service.
10.0.1.19 ~ $ systemctl status norns-matron.service
● norns-matron.service
   Loaded: loaded (/etc/systemd/system/norns-matron.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-04-07 17:36:03 CDT; 7min ago
 Main PID: 350 (ws-wrapper)
   CGroup: /system.slice/norns-matron.service
           ├─350 /home/we/norns/build/ws-wrapper/ws-wrapper ws://*:5555 /home/we/norns/build/matron/
           └─363 /home/we/norns/build/matron/matron -f/dev/fb1

Apr 07 17:36:03 norns systemd[1]: Started norns-matron.service.
Apr 07 17:36:07 norns matron[363]: *** WARNING *** The program 'matron' uses the Apple Bonjour compa
Apr 07 17:36:07 norns matron[363]: *** WARNING *** Please fix your application to use the native API
Apr 07 17:36:07 norns matron[363]: *** WARNING *** For more information see <http://0pointer.de/avah
10.0.1.19 ~ $ systemctl status norns-crone.service
● norns-crone.service
   Loaded: loaded (/etc/systemd/system/norns-crone.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-04-07 17:36:06 CDT; 7min ago
 Main PID: 457 (crone)
   CGroup: /system.slice/norns-crone.service
           └─457 /home/we/norns/build/crone/crone

Apr 07 17:36:26 norns crone[457]: softcut: setting voice output level 3: 0
Apr 07 17:36:26 norns crone[457]: softcut: setting enabled: voice 3: 0
Apr 07 17:36:26 norns crone[457]: softcut: setting voice output level 4: 0
Apr 07 17:36:26 norns crone[457]: osc rx: /set/enabled/cut
Apr 07 17:36:26 norns crone[457]: osc rx: /set/level/cut
Apr 07 17:36:26 norns crone[457]: osc rx: /set/enabled/cut
Apr 07 17:36:26 norns crone[457]: osc rx: /softcut/buffer/clear
Apr 07 17:36:26 norns crone[457]: softcut: setting enabled: voice 4: 0
Apr 07 17:36:26 norns crone[457]: softcut: setting voice output level 5: 0
Apr 07 17:36:26 norns crone[457]: softcut: setting enabled: voice 5: 0
okyeron commented 5 years ago

FWIW - adding the norns-init.service back (with scaling_governor set to performance) and things work Ok again - But... the jack startup looks a little bit different.

Here's a comparison of the two

First this is journalctl -u norns-jack for the Error: Audio Engine boot

-- Logs begin at Thu 2016-11-03 12:16:43 CDT, end at Sun 2019-04-07 18:20:00 CDT. --
Apr 07 18:19:20 norns systemd[1]: Starting norns-jack.service...
Apr 07 18:19:20 norns jack_wait[359]: Cannot connect to server socket err = No such file or directory
Apr 07 18:19:20 norns jack_wait[359]: Cannot connect to server request channel
Apr 07 18:19:20 norns jack_wait[359]: jack server is not running or cannot be started
Apr 07 18:19:20 norns jack_wait[359]: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Apr 07 18:19:21 norns jackd[358]: jackdmp 1.9.12
Apr 07 18:19:21 norns jackd[358]: Copyright 2001-2005 Paul Davis and others.
Apr 07 18:19:21 norns jackd[358]: Copyright 2004-2016 Grame.
Apr 07 18:19:21 norns jackd[358]: Copyright 2016-2017 Filipe Coelho.
Apr 07 18:19:21 norns jackd[358]: jackdmp comes with ABSOLUTELY NO WARRANTY
Apr 07 18:19:21 norns jackd[358]: This is free software, and you are welcome to redistribute it
Apr 07 18:19:21 norns jackd[358]: under certain conditions; see the file COPYING for details
Apr 07 18:19:21 norns jackd[358]: JACK server starting in realtime mode with priority 95
Apr 07 18:19:21 norns jackd[358]: self-connect-mode is "Don't restrict self connect requests"
Apr 07 18:19:21 norns jack_wait[359]: Cannot connect to server socket err = No such file or directory
Apr 07 18:19:21 norns jack_wait[359]: Cannot connect to server request channel
Apr 07 18:19:21 norns jack_wait[359]: jack server is not running or cannot be started
Apr 07 18:19:21 norns jack_wait[359]: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Apr 07 18:19:22 norns jackd[358]: creating alsa driver ... hw:0|hw:0|128|3|48000|0|0|nomon|swmeter|soft-mode|16bit
Apr 07 18:19:22 norns jackd[358]: configuring for 48000Hz, period = 128 frames (2.7 ms), buffer = 3 periods
Apr 07 18:19:22 norns jackd[358]: ALSA: final selected sample format for capture: 16bit little-endian
Apr 07 18:19:22 norns jackd[358]: ALSA: use 3 periods for capture
Apr 07 18:19:22 norns jackd[358]: ALSA: final selected sample format for playback: 16bit little-endian
Apr 07 18:19:22 norns jackd[358]: ALSA: use 3 periods for playback
Apr 07 18:19:23 norns systemd[1]: Started norns-jack.service.

Second is journalctl -u norns-jack for a normal working boot:

10.0.1.19 ~ $ journalctl -u norns-jack
-- Logs begin at Sun 2019-04-07 18:21:47 CDT, end at Sun 2019-04-07 18:22:36 CDT. --
Apr 07 18:21:53 norns systemd[1]: Starting norns-jack.service...
Apr 07 18:21:53 norns jack_wait[349]: Cannot connect to server socket err = No such file or directory
Apr 07 18:21:53 norns jack_wait[349]: Cannot connect to server request channel
Apr 07 18:21:53 norns jack_wait[349]: jack server is not running or cannot be started
Apr 07 18:21:53 norns jack_wait[349]: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Apr 07 18:21:54 norns jackd[348]: jackdmp 1.9.12
Apr 07 18:21:54 norns jackd[348]: Copyright 2001-2005 Paul Davis and others.
Apr 07 18:21:54 norns jackd[348]: Copyright 2004-2016 Grame.
Apr 07 18:21:54 norns jackd[348]: Copyright 2016-2017 Filipe Coelho.
Apr 07 18:21:54 norns jackd[348]: jackdmp comes with ABSOLUTELY NO WARRANTY
Apr 07 18:21:54 norns jackd[348]: This is free software, and you are welcome to redistribute it
Apr 07 18:21:54 norns jackd[348]: under certain conditions; see the file COPYING for details
Apr 07 18:21:54 norns jackd[348]: JACK server starting in realtime mode with priority 95
Apr 07 18:21:54 norns jackd[348]: self-connect-mode is "Don't restrict self connect requests"
Apr 07 18:21:54 norns jack_wait[349]: Cannot connect to server socket err = No such file or directory
Apr 07 18:21:54 norns jack_wait[349]: Cannot connect to server request channel
Apr 07 18:21:54 norns jack_wait[349]: jack server is not running or cannot be started
Apr 07 18:21:54 norns jack_wait[349]: JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Apr 07 18:21:54 norns jackd[348]: creating alsa driver ... hw:0|hw:0|128|3|48000|0|0|nomon|swmeter|soft-mode|16bit
Apr 07 18:21:54 norns jackd[348]: configuring for 48000Hz, period = 128 frames (2.7 ms), buffer = 3 periods
Apr 07 18:21:54 norns jackd[348]: ALSA: final selected sample format for capture: 16bit little-endian
Apr 07 18:21:54 norns jackd[348]: ALSA: use 3 periods for capture
Apr 07 18:21:54 norns jackd[348]: ALSA: final selected sample format for playback: 16bit little-endian
Apr 07 18:21:54 norns jackd[348]: ALSA: use 3 periods for playback
Apr 07 18:21:55 norns jack_wait[349]: server is available
Apr 07 18:21:55 norns systemd[1]: Started norns-jack.service.
Apr 07 18:22:05 norns jackd[348]: JackEngine::XRun: client = SuperCollider was not finished, state = Running
Apr 07 18:22:05 norns jackd[348]: JackAudioDriver::ProcessGraphAsyncMaster: Process error
Apr 07 18:22:05 norns jackd[348]: JackEngine::XRun: client = SuperCollider was not finished, state = Triggered
Apr 07 18:22:05 norns jackd[348]: JackAudioDriver::ProcessGraphAsyncMaster: Process error
simonvanderveldt commented 5 years ago

@okyeron regarding the governor: This is different for the norns kernel vs the default raspberry kernel, you probably want to make sure you're using the performance governor.

Regarding the issue with JACK, I don't really understand it tbh. It doesn't really make sense that it's caused by the difference in CPU performance because of a different governor since jack_wait will wait for 10 seconds which should be more than enough for any level of performance. Initially it seems to suggest that either jack_wait doesn't actually block the unit, which from local testing doesn't seem to be the case or that jack_wait is somehow broken. Otherwise I have no explanation for this at the moment.

Any chance you could produce the charts I linked above?

okyeron commented 5 years ago

@simonvanderveldt Since I'm building a kernel for the Raspi install, I'd like to get it as close to the norns kernel as possible. I did some comparisons of the norns_defconfig to the defconfig I'm using and I think it's the same in terms of the performance governor settings, but that is still pretty confusing to me.

I tried making the systemd-analyze chart but it really didn't make any sense to me. I can try this again, but won't really have time to do this until Tuesday/Wednesday.

FWIW - I don't think jack_wait is involved here - when I looked at the systemd-analyze and jack was finishing pretty quickly.

While trying to get a a few other people installed we're seeing some issues with the systemd units causing an audio engine error at boot - I think because crone is not properly getting a local network?*

*we need to skip using network manager for wifi until the very end of the install process - and with network/interfaces set to use wpa_supplicant we get that error. Still working on that process to hopefully make it better/smoother.

simonvanderveldt commented 5 years ago

@simonvanderveldt Since I'm building a kernel for the Raspi install, I'd like to get it as close to the norns kernel as possible. I did some comparisons of the norns_defconfig to the defconfig I'm using and I think it's the same in terms of the performance governor settings, but that is still pretty confusing to me.

We don't really do anything specific with the kernel at the moment apart from removing stuff we're never going to need and add the drivers for norns hardware. It's your time but I wouldn't waste my time on building a kernel specifically for the RPi for this ;)

I tried making the systemd-analyze chart but it really didn't make any sense to me. I can try this again, but won't really have time to do this until Tuesday/Wednesday.

Sorry, I should've probably been slightly more specific. systemd-analyze plot > bootchart.svg should get you an svg that shows a graphical representation of the boot process showing both duration an dependencies.

While trying to get a a few other people installed we're seeing some issues with the systemd units causing an audio engine error at boot - I think because crone is not properly getting a local network?*

*we need to skip using network manager for wifi until the very end of the install process - and with network/interfaces set to use wpa_supplicant we get that error. Still working on that process to hopefully make it better/smoother.

Tbh I don't know if that would matter. Even without NetworkManager or wpa_supplicant running you should be able to just run the norns components and they should just work. Unless I'm misunderstanding you.

okyeron commented 5 years ago

We don't really do anything specific with the kernel at the moment apart from removing stuff we're never going to need and add the drivers for norns hardware. It's your time but I wouldn't waste my time on building a kernel specifically for the RPi for this ;)

Depending on what stock kernel we start with on Raspi, we need to compile in the ssd1322 display driver module. There's also a setting in the config to default to the performance governor. (I've also been testing with the realtime kernel that PatchboxOS is using)

FWIW - I need to re-test with the full release version of the norns kernel. In previous betas it would not boot on a regular pi3b+. Perhaps the tweaks for the CM3+ have fixed that for pi3b+ as well.

Tbh I don't know if that would matter. Even without NetworkManager or wpa_supplicant running you should be able to just run the norns components and they should just work. Unless I'm misunderstanding you.

I think there might be something going on where localhost is not yet available to crone and that might be causing the audio engine error.

as an example - if you added auto lo to /etc/network/interfaces this will cause network manager to ignore lo and the norns will boot up with the audio engine error

okyeron commented 5 years ago

@simonvanderveldt @catfact Hey ya know what... I think I know what the cause is for the original issue here...

In building a install script for the raspi, we need to run sclang once before compiling norns and everything - I believe is to get the initial SC class libraries and directories all set up.

So if a user has a brand new install, it's likely they will have this error.

The trick we came up with in our install script was to run echo | sclang before compiling norns. I think if this was added to the SC install.sh or the update.sh script it would eliminate the error for anyone building a brand new install.

catfact commented 4 years ago

can we close this? i'm saying yes, provisionally. it seems like it would only matter if an update changes include paths for sclang. this is not a common occurence. if we need to do that, then there are a couple ways for the update script to perform a workaround.

even if we did nothing, it's not much of a problem since a soft RESET will also fix it.

if it needs more attention, maybe we can open a more focused issue explaining why, with a clean slate.