Open sumnerboy12 opened 2 years ago
It's possible with the internet radio thrashing that happened on your system, the eMMC on the Pi is failing. Could you run a quick test on the Pi to help determine if that's the issue?
Install the mmc-utils
package
sudo apt install mmc-utils
Grab the EXT_CSD_DEVICE_LIFE_TIME_ESTTYP* and EXT_CSD_PRE_EOL_INFO.
sudo mmc extcsd read /dev/mmcblk0 | grep 'LIFE_TIME\|EOL'
The LIFE_TIME_EST estimates the amount of device life time (writes) used so far, in increments of 10%. 0x01 means 0-10% used, 0x02 means 10-20%, etc. PRE_EOL gives the amount of reserved blocks used. 0x01: <80%. 0x02: 80%-90%. 0x03: >90%.
I am unable to install anything as the file system is read-only (see the logs above). Should it be? I haven't restarted it yet, as I wanted to check if there was anything to retrieve beforehand.
I think at this point you will have to restart to run that command.
On Mon, Jul 18, 2022, 17:59 Ben Jones @.***> wrote:
I am unable to install anything as the file system is read-only (see the logs above). Should it be? I haven't restarted it yet, as I wanted to check if there was anything to retrieve beforehand.
— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1188363934, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZPOYPSJSXEHB4VMY5YFDVUXHV3ANCNFSM533MAKGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Any idea what has put my file system in read-only mode?
...rebooting now
pi@amplipi:~ $ sudo mmc extcsd read /dev/mmcblk0 | grep 'LIFE_TIME\|EOL'
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
One thing to note, I am querying the /api/zones
endpoint every 5s as part of my home automation integration. I also have the Home Assistant component running, which appears to be querying /api
2-3 times every 10s.
I have just upgraded all packages on my device (sudo apt dist-upgrade
) and have configured syslog to push out to my logging server (graylog) so I can more easily check what is happening.
Thanks for checking the eMMC wear. The system should not be read-only, but the eMMC is reporting that it's ok so now I'm not sure what would be causing the filesystem to fallback to read-only mode. Querying the API around once every other second shouldn't be an issue, we regularly do more than that here.
Just as a sanity check for debugging purposes, have you installed anything else on the AmpliPi's Raspberry Pi? Is the filesystem getting full (df -h / /boot
)?
You could check if fsck
found any issues with journalctl --no-pager | grep systemd-fsck
. That will take a while on the Pi, so if you want to just check a single boot use -b
for the latest boot or add a number from journalctl --list-boots
after the -b
to show results from an older boot. Unfortunately even if we see fsck
is fixing errors, it won't really point to what's causing those errors, but at least we can narrow down the problem.
One other thought is that the userland packages were still ~2 years old until you ran apt dist-upgrade
(full-upgrade
is the official shiny new command name FYI but they are aliased for muscle memory reasons). AmpliPi v0.1.8's updater now upgrades the kernel but It's possible there was some older packages that also were causing problems.
Finally, if the issue keeps happening, we can try the nuclear option of re-imaging the Pi with the latest image we ship on new AmpliPis. If it gets that far I'll have to get you a link to download the large couple gigabyte image file.
Have run my ansible bootstrap role, which installs an ansible and ben user, and adds SSH keys, but that is about it.
You can see the partitions are not full;
ben@amplipi:~ $ df -h / /boot
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 4.7G 23G 18% /
/dev/mmcblk0p1 253M 49M 204M 20% /boot
I ran the fsck
check but got no results;
ben@amplipi:~ $ sudo journalctl --list-boots
0 6586b3ad5dcb4064aca62d17e055d061 Tue 2022-07-19 23:30:23 NZST—Wed 2022-07-20 10:01:30 NZST
ben@amplipi:~ $ sudo journalctl --no-pager | grep systemd-fsck
ben@amplipi:~ $
I will keep an eye on things and report back if/when it happens again.
Ok - things appear to be getting worse - only two days between failures this time. And this time the web UI is returning HTTP500, and the app is not working. But the REST API is still running ok, I can still control via my home automation system. It is even still playing radio streams.
I have managed to capture the logs on my remote logging system (Graylog). You can see something is up with mmc0 but I will leave with you to have a look;
I think I have a hardware problem. Rebooting the device is no longer solving the problem and it is flicking over to read-only straight away.
Our support has contacted is contacting you over email about a replacement module. We will continue to diagnose the module once we get it in our hands.
EDIT: We will contact you today
New module received and system back up and running. Will close this for now as it seems like it was a h/w fault.
It has happened again only a few days after installing the new CM3 module..
Do these logs show any indication of what is going wrong?
Have rebooted and everything is back to normal...for now!
Hmm that doesn't make any sense. We are hoping to get more info about this when we get the first module back.
In the meantime are you still using a similar config as before? If it changed significantly please email me your latest one.
It is identical - I reloaded a config backup from before and have been running that since installing the new module.
I am still seeing my radio station streams locking up after a few days.
I am also seeing these alsa audio output error: cannot open ALSA device ""ch0""
errors in the log...
I have just updated my automation flows (in NodeRED) to automatically stop/start streams on-demand. So if there are no zones using a particular source/stream it will automatically stop
that stream. I am hoping this will mean no more lock ups as I won't have any more long running internet radio streams.
FYI - this just happened again, for the first time in about 6 months. SSH and web UI unresponsive and my log monitoring system has reported the file system has gone into read-only mode.
Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot
and api/announce
and they both return 500.
Odd. When the filesystem goes read-only (which we still don't fully understand) the background save process will eventually fail. Was there a message was being reported with the 500 error?
On Tue, May 30, 2023, 19:30 Ben Jones @.***> wrote:
Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot and api/announce and they both return 500.
— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569272263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZPO3YFTZ5UCAGEGKGEPLXIZ7JXANCNFSM533MAKGA . You are receiving this because you commented.Message ID: @.***>
None that I could see - I was attempting the POST requests via Postman.
On 31/05/2023 2:47 pm, Lincoln Lorenz wrote:
Odd. When the filesystem goes read-only (which we still don't fully understand) the background save process will eventually fail. Was there a message was being reported with the 500 error?
On Tue, May 30, 2023, 19:30 Ben Jones @.***> wrote:
Interesting observation, the REST API is still responding, but only to GET requests. I have tried POST to api/reboot and api/announce and they both return 500.
— Reply to this email directly, view it on GitHub
https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569272263, or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAEZPO3YFTZ5UCAGEGKGEPLXIZ7JXANCNFSM533MAKGA . You are receiving this because you commented.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/micro-nova/AmpliPi/issues/352#issuecomment-1569418116, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGTR2UL3JPXROCLY43FGUTXI2WMTANCNFSM533MAKGA. You are receiving this because you modified the open/close state.Message ID: @.***>
Hmm, happened again today.
Hmm, anything change recently?
I have moved the device into our new home, where the internet connection is a little spotty until I get it sorted. That is the only difference I can think of. It has been stable for the last week however.
I have seen this twice in the last month. All internet radio streams stop playing (they are left running 24/7).
I have a Sonos Connect amplifier connected to input 4 and that is still playing fine.
Web UI is still functioning fine, and it shows each radio stream as playing. Stopping and starting the stream thru the UI has no effect, nor does change the input source - i.e. moving radio station A from input 1 to input 2.
Only thing that resolves the issue is a hard reset.
Found this in the logs;