Questions for generic understanding

CurlyMoo commented 9 years ago

After building my offsite ZFS backup with XBian and investigating the XBMC reboot loop, I have the following two questions:

1. Why do we let SSH wait for XBMC loading or failsafe-boot? I removed the ssh-nid.override and SSH runs just fine. I would let telnet run alongside SSH and start SSH as early as possible as well. If SSH fails, users can fall back on telnet.
2. Why do we check whether XBMC is actually running and have all kinds of failsafe scripts? Is there any special logic that depends on it? Can't we just monitor whether the xbmc.bin process is still running and, if not, let it respawn (except when the user stopped XBMC himself)?

Just curious. The XBMC part especially puzzles me, because it's already quite a hassle to get it working properly.

mk01 commented 9 years ago

Historical reasons only. When we migrated to upstart (with XBMC as the first service), most (all) services were still started via SysV runlevels. By default upstart runs rcS and rc2 in parallel with its own jobs, which (together with the impossibility of loading rcX under precise control) put significant stress on the RPi hardware and caused very long XBMC load times of 40-60 s. So rc2 (currently considered a compatibility mode only) was made dependent on the XBMC started event; only after that did all the other services put load on the CPU and MMC. Later the "important" services (like ssh) were migrated to upstart jobs, but we kept the original approach of delaying them until after XBMC, and it has stayed that way until now. In other words, it is purely a configuration choice, not a technical problem or dependency.

(Just btw, openssh-nih is provided as a template, not active by default, for users who need to start sshd as a daemon process. Otherwise XBian by default manages sshd via the inetd wrapper, i.e. the openbsd-inetd upstart job.)

Let's change the start on stanza from

start on started xbmc-loaded or failsafe-boot or started xbmc-done

to

start on started network-interface or started networking

If all is fine, we can make that the default.

Yes, that is a theoretical assumption (that an existing xbmc.bin means a running XBMC), for two reasons: 1) all the known / unknown / bigger / smaller bugs in XBMC; 2) XBMC is threaded from the very first line of code. Any thread started by the xbmc.bin process is by default displayed AS xbmc.bin, and that tells nothing about whether XBMC is available to the user (correct output on screen, controllable by input devices). With all the issues, like the windowing thread not starting if guisettings is messed up (which almost always happens during version changes), and MANY other ways of XBMC not being fully loaded, we need confirmation from the WindowManager (in the case of start) that the SKIN successfully loaded the home window.

If that confirmation were missing, we would face these problems (and they are not made up right now; they are only a few I remember from the past 1.5 years). XBMC is running, but there is no home window: the splash is blanked, the screen is black, no controls. Because the console is in GFX mode, the terminal is locked (no way to switch to another one and access a console). If by any chance the network is down, the USER is stuck with no easy way to access the RPi.

Another: the xbmc process crashes, but its locks on /dev/fb stay. The splash has no way to exit and kernel I/O calls are blocked (the RPi has a single CPU; this applies if the kernel is not preemptible). With correct xbmc-fail jobs, xbmc and splash can be killed, the lock released, and we get the RPi under control again.

Then, for instance, XBMC exit problems (there are many internal ways in the XBMC code that can lead to them): XBMC will block the reboot/shutdown process indefinitely. Even if it does not block the start of reboot/shutdown itself, a dead process can still prevent a proper filesystem unmount (remount to RO), which means data loss. Or a simple restart of XBMC will make the machine unusable. As long as XBMC lets its threads access kernel functions directly, this can't be mitigated by a "fix" on the XBMC side.
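The point that a live xbmc.bin PID is not proof of a usable XBMC can be illustrated with a small Python sketch. This is not XBian's actual upstart implementation: it assumes the application signals readiness by creating a marker file (a hypothetical stand-in for the WindowManager confirmation that the skin loaded the home window), and it uses a Python child process in place of xbmc.bin. A child that stays alive but never signals readiness is classified as `timeout`, which is exactly the case a plain "is the process running?" check would miss.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_until_ready(proc, marker_path, timeout, poll=0.05):
    """Classify a start attempt as 'ready', 'crashed' or 'timeout'.

    A live PID alone is not treated as success: the process must also
    create marker_path (a hypothetical stand-in for the "home window
    loaded" confirmation) before the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(marker_path):
            return "ready"
        if proc.poll() is not None:
            # Exited without ever signalling readiness.
            return "crashed"
        time.sleep(poll)
    # No readiness confirmation before the deadline: treat as hung.
    return "timeout"


def spawn(code):
    """Start a Python child running `code` (stands in for the real binary)."""
    return subprocess.Popen([sys.executable, "-c", code])


if __name__ == "__main__":
    marker = os.path.join(tempfile.mkdtemp(), "home-window-loaded")
    # The child stays alive but never signals readiness: the PID exists,
    # yet from the user's point of view the application is not usable.
    hung = spawn("import time; time.sleep(30)")
    print(wait_until_ready(hung, marker, timeout=0.5))
    hung.kill()
    hung.wait()
```

On `timeout`, a supervisor would then kill the process and run whatever recovery applies (the role the xbmc-fail jobs play in the comment above); only `crashed` is a candidate for respawn.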

What I agree on: IF the xbmc job is killed on TIMEOUT (startup timeout), respawn should not kick in. Respawn should be active only on 'random' XBMC crashes. This should actually be fixed already, and indeed it is, in xbian-package-xbmc-scripts v1.1.5.
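The respawn rule described here (no respawn after a startup-timeout kill or a deliberate user stop, respawn only on unexpected crashes) fits in a small decision function. This is a hypothetical Python sketch, not the code in xbian-package-xbmc-scripts; the `ExitReason` names and the `count`/`interval` limit are assumptions, with the limit loosely modelled on upstart's `respawn limit COUNT INTERVAL` stanza so that a crash loop eventually stops instead of respawning forever.

```python
from collections import deque
from enum import Enum


class ExitReason(Enum):
    # Hypothetical classification of why the application went away.
    USER_STOP = "user_stop"          # user quit or stopped the service on purpose
    START_TIMEOUT = "start_timeout"  # supervisor killed it: no readiness before the deadline
    CRASH = "crash"                  # process died on its own ("random" crash)


class RespawnPolicy:
    """Respawn only after unexpected crashes, at most `count` times per `interval` seconds."""

    def __init__(self, count=3, interval=60.0):
        self.count = count
        self.interval = interval
        self._history = deque()  # timestamps of recent respawns

    def should_respawn(self, reason, now):
        if reason is not ExitReason.CRASH:
            # A deliberate stop or a startup-timeout kill must not loop back into a restart.
            return False
        # Forget respawns that fell outside the sliding window.
        while self._history and now - self._history[0] > self.interval:
            self._history.popleft()
        if len(self._history) >= self.count:
            return False  # crash loop: give up and leave recovery to failsafe handling
        self._history.append(now)
        return True
```

Keeping the timeout case out of respawn matters because a startup that hangs will usually hang again, so respawning it would only reproduce the reboot/restart loop.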

CurlyMoo commented 9 years ago

1. I noticed we can just use the default ssh-nid.conf and remove the ssh-nid.override, so let's do that.
2. It's not about a theoretical assumption. I only mean: is there actually any use in knowing how well XBMC started, service-wise? Does any of our logic depend on it?

mk01 commented 9 years ago

Your 2. would be right IF it came from a valid premise, but it does not. For you it is a theoretical assumption; for me it was hell & 4 months of work. You can't remember that because it was the time you took a break from the project. But I still remember how many times the workflow & its upstart implementation changed around that, and for each change I remember why. I'm telling you, it did not come from reading never-used disaster recovery manuals of some anonymous company.

For you it sounds so far, far away... I'll tell you why: because we never got issues reported around that. We were never flooded with mass reports of failed boots or shutdowns, xbmc conf files going missing over and over, destroyed partitions, lost data. Never ever with a number of individual cases greater than the standard error.

Why do you think it is like that? Because we are lucky? Because we have lucky users? Or because users report a 1 ms rebuffer but always forget to report lost data? Nope. It is because we took all possible precautions to minimise the impact of those "theoretic assumptions".

So if you ask whether there is actually any use in knowing how well XBMC (the core application) is doing at start or at quit/reboot/shutdown, the answer is yes, there is: the USER, the only one who really matters. And yes, the user is not interested in the details, as long as everything magically works. But he will tell you over and over again when the unimportant details suddenly don't play together.

CurlyMoo commented 9 years ago

Of course I assume there is a reason to know it, but what is it? What do we do when XBMC is actually fully started? Which services depend on it?

mk01 commented 9 years ago

Then we know we don't need to do anything.

The other case is when we need to take action (I already listed those before).

xbianonpi / xbian

Questions for generic understanding #654