micro-nova / AmpliPi

Whole House Audio System 🔊
https://amplipi.com
GNU General Public License v3.0
285 stars 23 forks source link

Internet radio streams dying after a few days - logs reporting "read-only file system" #352

Open sumnerboy12 opened 2 years ago

sumnerboy12 commented 2 years ago

I have seen this twice in the last month. All internet radio streams stop playing (they are left running 24/7).

I have a Sonos Connect amplifier connected to input 4 and that is still playing fine.

Web UI is still functioning fine, and it shows each radio stream as playing. Stopping and starting the stream thru the UI has no effect, nor does change the input source - i.e. moving radio station A from input 1 to input 2.

Only thing that resolves the issue is a hard reset.

Found this in the logs;

Jul 18 04:17:24 amplipi authbind[745]: stealing Radio Hauraki from source Sonos
Jul 18 04:17:24 amplipi authbind[745]: Radio Hauraki disconnected
Jul 18 04:17:24 amplipi authbind[745]: connecting Radio Hauraki to 0...
Jul 18 04:17:24 amplipi authbind[745]: running: ['/home/pi/amplipi-dev/venv/bin/python', '/home/pi/amplipi-dev/streams/runvlc.py', 'https://ais-nzme.streamguys1.com/nz_009_aac', 'ch0', '--song-info', '/home/pi/amplipi-dev/config/srcs/0/c
Jul 18 04:17:24 amplipi authbind[745]: Radio Hauraki (stream: https://ais-nzme.streamguys1.com/nz_009_aac) connected to 0 via ch0
Jul 18 04:17:24 amplipi authbind[745]: INFO:     192.168.10.27:64152 - "PATCH /api/sources/0 HTTP/1.1" 200 OK
Jul 18 04:17:24 amplipi authbind[745]: Traceback (most recent call last):
Jul 18 04:17:24 amplipi authbind[745]:   File "/home/pi/amplipi-dev/streams/runvlc.py", line 60, in <module>
Jul 18 04:17:24 amplipi authbind[745]:     log_file = open(args.log, 'a', encoding='utf-8')
Jul 18 04:17:24 amplipi authbind[745]: OSError: [Errno 30] Read-only file system: '/home/pi/amplipi-dev/config/srcs/0/log'
Lohrer commented 2 years ago

It's possible with the internet radio thrashing that happened on your system, the eMMC on the Pi is failing. Could you run a quick test on the Pi to help determine if that's the issue?

Install the mmc-utils package

sudo apt install mmc-utils

Grab the EXT_CSD_DEVICE_LIFE_TIME_ESTTYP* and EXT_CSD_PRE_EOL_INFO.

sudo mmc extcsd read /dev/mmcblk0 | grep 'LIFE_TIME\|EOL'

The LIFE_TIME_EST estimates the amount of device life time (writes) used so far, in increments of 10%. 0x01 means 0-10% used, 0x02 means 10-20%, etc. PRE_EOL gives the amount of reserved blocks used. 0x01: <80%. 0x02: 80%-90%. 0x03: >90%.

sumnerboy12 commented 2 years ago

I am unable to install anything as the file system is read-only (see the logs above). Should it be? I haven't restarted it yet, as I wanted to check if there was anything to retrieve beforehand.

linknum23 commented 2 years ago

I think at this point you will have to restart to run that command.

On Mon, Jul 18, 2022, 17:59 Ben Jones @.***> wrote:

I am unable to install anything as the file system is read-only (see the logs above). Should it be? I haven't restarted it yet, as I wanted to check if there was anything to retrieve beforehand.

— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1188363934, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZPOYPSJSXEHB4VMY5YFDVUXHV3ANCNFSM533MAKGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

sumnerboy12 commented 2 years ago

Any idea what has put my file system in read-only mode?

...rebooting now

sumnerboy12 commented 2 years ago
pi@amplipi:~ $ sudo mmc extcsd read /dev/mmcblk0 | grep 'LIFE_TIME\|EOL'
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
sumnerboy12 commented 2 years ago

One thing to note, I am querying the /api/zones endpoint every 5s as part of my home automation integration. I also have the Home Assistant component running, which appears to be querying /api 2-3 times every 10s.

I have just upgraded all packages on my device (sudo apt dist-upgrade) and have configured syslog to push out to my logging server (graylog) so I can more easily check what is happening.

Lohrer commented 2 years ago

Thanks for checking the eMMC wear. The system should not be read-only, but the eMMC is reporting that it's ok so now I'm not sure what would be causing the filesystem to fallback to read-only mode. Querying the API around once every other second shouldn't be an issue, we regularly do more than that here.

Just as a sanity check for debugging purposes, have you installed anything else on the AmpliPi's Raspberry Pi? Is the filesystem getting full (df -h / /boot)?

You could check if fsck found any issues with journalctl --no-pager | grep systemd-fsck. That will take a while on the Pi, so if you want to just check a single boot use -b for the latest boot or add a number from journalctl --list-boots after the -b to show results from an older boot. Unfortunately even if we see fsck is fixing errors, it won't really point to what's causing those errors, but at least we can narrow down the problem.

One other thought is that the userland packages were still ~2 years old until you ran apt dist-upgrade (full-upgrade is the official shiny new command name FYI but they are aliased for muscle memory reasons). AmpliPi v0.1.8's updater now upgrades the kernel but It's possible there was some older packages that also were causing problems.

Finally, if the issue keeps happening, we can try the nuclear option of re-imaging the Pi with the latest image we ship on new AmpliPis. If it gets that far I'll have to get you a link to download the large couple gigabyte image file.

sumnerboy12 commented 2 years ago

Have run my ansible bootstrap role, which installs an ansible and ben user, and adds SSH keys, but that is about it.

You can see the partitions are not full;

ben@amplipi:~ $ df -h / /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  4.7G   23G  18% /
/dev/mmcblk0p1  253M   49M  204M  20% /boot

I ran the fsck check but got no results;

ben@amplipi:~ $ sudo journalctl --list-boots
 0 6586b3ad5dcb4064aca62d17e055d061 Tue 2022-07-19 23:30:23 NZST—Wed 2022-07-20 10:01:30 NZST
ben@amplipi:~ $ sudo journalctl --no-pager | grep systemd-fsck
ben@amplipi:~ $

I will keep an eye on things and report back if/when it happens again.

sumnerboy12 commented 2 years ago

Ok - things appear to be getting worse - only two days between failures this time. And this time the web UI is returning HTTP500, and the app is not working. But the REST API is still running ok, I can still control via my home automation system. It is even still playing radio streams.

I have managed to capture the logs on my remote logging system (Graylog). You can see something is up with mmc0 but I will leave with you to have a look;

amplipi-read-only.csv

sumnerboy12 commented 2 years ago

I think I have a hardware problem. Rebooting the device is no longer solving the problem and it is flicking over to read-only straight away.

linknum23 commented 2 years ago

Our support has contacted is contacting you over email about a replacement module. We will continue to diagnose the module once we get it in our hands.

EDIT: We will contact you today

sumnerboy12 commented 2 years ago

New module received and system back up and running. Will close this for now as it seems like it was a h/w fault.

sumnerboy12 commented 2 years ago

It has happened again only a few days after installing the new CM3 module..

amplipi-read-only-#2.csv

Do these logs show any indication of what is going wrong?

sumnerboy12 commented 2 years ago

Have rebooted and everything is back to normal...for now!

linknum23 commented 2 years ago

Hmm that doesn't make any sense. We are hoping to get more info about this when we get the first module back.

In the meantime are you still using a similar config as before? If it changed significantly please email me your latest one.

sumnerboy12 commented 2 years ago

It is identical - I reloaded a config backup from before and have been running that since installing the new module.

sumnerboy12 commented 2 years ago

I am still seeing my radio station streams locking up after a few days.

I am also seeing these alsa audio output error: cannot open ALSA device ""ch0"" errors in the log...

amplipi-cannot-open-alsa-device-ch0.csv

sumnerboy12 commented 2 years ago

I have just updated my automation flows (in NodeRED) to automatically stop/start streams on-demand. So if there are no zones using a particular source/stream it will automatically stop that stream. I am hoping this will mean no more lock ups as I won't have any more long running internet radio streams.

sumnerboy12 commented 1 year ago

FYI - this just happened again, for the first time in about 6 months. SSH and web UI unresponsive and my log monitoring system has reported the file system has gone into read-only mode.

sumnerboy12 commented 1 year ago

Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot and api/announce and they both return 500.

linknum23 commented 1 year ago

Odd. When the filesystem goes read-only (which we still don't fully understand) the background save process will eventually fail. Was there a message was being reported with the 500 error?

On Tue, May 30, 2023, 19:30 Ben Jones @.***> wrote:

Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot and api/announce and they both return 500.

— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569272263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZPO3YFTZ5UCAGEGKGEPLXIZ7JXANCNFSM533MAKGA . You are receiving this because you commented.Message ID: @.***>

sumnerboy12 commented 1 year ago

None that I could see - I was attempting the POST requests via Postman.

On 31/05/2023 2:47 pm, Lincoln Lorenz wrote:

Odd. When the filesystem goes read-only (which we still don't fully understand) the background save process will eventually fail. Was there a message was being reported with the 500 error?

On Tue, May 30, 2023, 19:30 Ben Jones @.***> wrote:

Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot and api/announce and they both return 500.

— Reply to this email directly, view it on GitHub

https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569272263, or unsubscribe

https://github.com/notifications/unsubscribe-auth/AAEZPO3YFTZ5UCAGEGKGEPLXIZ7JXANCNFSM533MAKGA . You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569418116, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGTR2UL3JPXROCLY43FGUTXI2WMTANCNFSM533MAKGA. You are receiving this because you modified the open/close state.Message ID: @.***>

sumnerboy12 commented 1 year ago

Hmm, happened again today.

linknum23 commented 1 year ago

Hmm, anything change recently?

sumnerboy12 commented 1 year ago

I have moved the device into our new home, where the internet connection is a little spotty until I get it sorted. That is the only difference I can think of. It has been stable for the last week however.