This code causes my device to crash without error.

synaption commented 7 months ago

#!/usr/bin/env python
#python record21.py -d dmic_sv /home/pi/record/test.raw

from datetime import datetime
from datetime import timezone
from datetime import timedelta
LOCAL_TIMEZONE = datetime.now(timezone.utc).astimezone().tzinfo
import os

import sys
import time
import getopt
import alsaaudio
import random
import subprocess
import shutil
import traceback
import glob

def usage():
    print('usage: recordtest.py [-d <device>] <file>', file=sys.stderr)
    sys.exit(2)

def read_data_from_device(inp, f):
    # Read data from device
    l, data = inp.read()
    if l:
        f.write(data)
    time.sleep(.001)

if __name__ == '__main__':
    os.chdir("/dev/shm")

    device = 'default'

    opts, args = getopt.getopt(sys.argv[1:], 'd:')
    for o, a in opts:
        if o == '-d':
            device = a

    if not args:
        usage()

    f = open(args[0], 'wb')
    inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
        channels=2, rate=32000, format=alsaaudio.PCM_FORMAT_S32_LE, 
        periodsize=640, device=device)

    now = datetime.now()
    next_time = (now + timedelta(seconds=1)).strftime("%H_%M_%S")
    current_time = now.strftime("%H_%M_%S")
    detect_time=current_time 

    while True:
        try:
            now = datetime.now()
            if now.strftime("%H_%M_%S")>next_time:
                now = datetime.now()
                current_time = now.strftime("%H_%M_%S")
                next_time = (now + timedelta(seconds=1)).strftime("%H_%M_%S")
                current_second = now.strftime("%S")[1]
                #print("Current Time =", current_time)
                if os.path.getsize(args[0]) > 3000:
                    #print(os.path.getsize(args[0]))
                    shutil.move("test.raw", "test1.raw")
                    f = open(args[0], 'wb')
                    detect_time=current_time
            read_data_from_device(inp, f)

        except Exception as e:  # Catch the exception and print the traceback
            traceback.print_exc()

This is a slightly modified version of the recordtest.py example. Instead of recording for a time and then stopping, it records continuously, and every second it chunks data into a new file to be processed. Crashes occur whether there is further processing or not.

Crashes can occur after any time, but most frequently they happen within 2 to 4 hours. They seem to occur regardless of what else is or is not running. This is unconfirmed, but running CPU and memory stress tests seemed to make crashes less likely.
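
One detail in the posted loop: comparing `%H_%M_%S` strings with `>` against a deadline one second ahead only fires once the clock string is strictly past that deadline, so rotation happens roughly every two seconds rather than every second, and the comparison misbehaves around midnight. A minimal sketch of a deadline based on `time.monotonic()` instead. `SecondTicker` is a hypothetical helper name, not part of pyalsaaudio, and this is not claimed to be related to the crash.

```python
import time

class SecondTicker:
    """Report when a fixed interval has elapsed, using a monotonic clock."""

    def __init__(self, interval=1.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.deadline = clock() + interval

    def due(self):
        """Return True once per elapsed interval, then advance the deadline."""
        now = self.clock()
        if now < self.deadline:
            return False
        # Advance in whole intervals so a late check does not cause drift
        # or a burst of back-to-back rotations.
        missed = int((now - self.deadline) // self.interval) + 1
        self.deadline += missed * self.interval
        return True
```

In the recording loop, `if ticker.due():` would replace the string comparison and the `next_time` bookkeeping.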

I start the code with this command: sudo nice -n -19 sudo -u pi python /home/pi/record/record.py -d dmic_sv /dev/shm/test.raw

I've tried different nice values and no nice command at all. I have tried different period sizes. I have tried PCM_NORMAL and PCM_NONBLOCK. I have tried different delays. I have tried restarting the program every 30 minutes. The entire device crashes and reboots. There are no errors ANYWHERE. There is no traceback in python, there is nothing in dbus, kmesg, or journalctl. But it only happens when I run this program.

I have been able to run this code without crashing by using null as the device instead of dmic_sv: sudo nice -n -19 sudo -u pi python /home/pi/record/record.py -d null /dev/shm/test.raw

This might point to an issue with the driver or the microphone itself, but if I use a different method of chunking data out, it also prevents crashing: sudo nice -n -19 sudo -u pi arecord -D dmic_sv -c2 -r 32000 -f S32_LE -t raw -v -F 1000 | while :; do dd bs=128000 count=1 iflag=fullblock 2>/dev/null >>/dev/shm/test.raw; done

The problem with this method is that it comes nowhere near the latency I need, and I cannot get accurate timing: I do not have access to the size of the data in the buffer, as I do with l, data = inp.read(), and I do not know the exact time data was removed from the buffer.

I have had this issue for years. It occurred with previous and current versions of pyalsaaudio. It has taken me this long to narrow the problem down this far. Currently the information I have leads me to believe there is some issue with the interaction between the I2S microphone and pyalsaaudio, because I do not have issues with arecord and I do not have issues with a dummy microphone. If I could just force a crash to happen, that would be extremely helpful for debugging.
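
To make the timing requirement concrete, here is a small sketch that pairs each `read()` with a timestamp and the returned frame count. `timed_read` is a hypothetical helper; it only assumes an object whose `read()` returns `(length, data)`, as `alsaaudio.PCM.read()` does.

```python
import time

def timed_read(pcm, clock=time.monotonic):
    """Read once and return (timestamp, length, data).

    `pcm` is any object with a read() method returning (length, data),
    where length is a frame count.
    """
    length, data = pcm.read()
    # Timestamp taken right after read() returns: it bounds when the data
    # left the ALSA buffer; it is not the capture time of the samples.
    stamp = clock()
    return stamp, length, data
```

With 2 channels of S32_LE, each frame is 8 bytes, so `len(data)` should equal `length * 8` for a successful read.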

RonaldAJ commented 7 months ago

Have you tried closing the files?

synaption commented 7 months ago

@RonaldAJ I have tried moving and opening the files a lot more to try to force a crash, but it had no effect. I am about to try closing the files. The overarching issue is that I can only iterate so quickly, because I won't know whether a change worked until it has run for ~24hrs+ without crashing. I suspect that if that were the issue, it would also occur with ALSA's null device.

larsimmisch commented 7 months ago

My first thought is: if it's rebooting, you're probably running into a hard kernel limit (or even a misbehaving driver).

Then, what @RonaldAJ says: you are moving an open file on /dev/shm - I'd definitely close it
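
A minimal sketch of rotating the output file with an explicit close before the move. `rotate` is a hypothetical helper name, and it assumes the source and destination are on the same filesystem (for example both under /dev/shm).

```python
import os

def rotate(f, path, dest):
    """Close `f`, move `path` to `dest`, and return a fresh file at `path`."""
    f.flush()
    f.close()
    # os.replace is an atomic rename when both paths are on the same
    # filesystem; it overwrites `dest` if it already exists.
    os.replace(path, dest)
    return open(path, 'wb')
```

In the posted loop this would be called as `f = rotate(f, args[0], "test1.raw")` in place of the `shutil.move` and `open` pair.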

ossilator commented 7 months ago

> The entire device crashes and reboots.

you have a kernel (driver) problem. period.

> I do not have access to the size of the data in the buffer

there is an avail() function in the yet unreleased main branch. happy workarounding! :stuck_out_tongue_winking_eye:

synaption commented 7 months ago

I can confirm f.close() accomplishes nothing. I am now running an even more minimal version of the code.

import alsaaudio

device = 'dmic_sv'

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
    channels=2, rate=32000, format=alsaaudio.PCM_FORMAT_S32_LE, 
    periodsize=640, device=device)

while True:
    # Read data from device
    inp.read()

synaption commented 7 months ago

I can confirm the minimal code crashes, and possibly crashes faster: 2 crashes in under 2 hours with 3 devices. I assume nobody here wants to help tackle an issue with the janky driver I ported from raspberry pi to imx, but help forcing a crash or figuring out a workaround would be appreciated.

RonaldAJ commented 7 months ago

I know the feeling. I once had bugs showing up only after days of running code.

RonaldAJ commented 7 months ago

safe_read in arecord has separate handling for the case where no samples are returned. But the definition of the read function it calls is hard to track down, so it is difficult to judge how this differs from what pyalsaaudio does.
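
For comparison, a sketch of handling the three kinds of result `read()` can return in non-blocking mode. It assumes the behavior described in the pyalsaaudio documentation: a positive frame count with data, a length of 0 when no period is ready, and a negative value (such as -EPIPE) on overrun. `handle_read` and the `stats` keys are hypothetical names.

```python
import time

def handle_read(pcm, sink, stats, idle_sleep=0.001, sleep=time.sleep):
    """Read once from `pcm` and handle data, no-data, and error results."""
    length, data = pcm.read()
    if length > 0:
        # Normal case: a period of captured frames.
        sink.write(data)
        stats['frames'] += length
    elif length == 0:
        # Nothing ready yet: back off briefly instead of spinning.
        stats['empty'] += 1
        sleep(idle_sleep)
    else:
        # Negative length: assumed to signal an overrun or other error.
        stats['overruns'] += 1
    return length
```

Counting empty reads and overruns separately would at least show whether the loop is mostly spinning on an empty buffer before a crash.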

To speed up crashes you could try increasing the sampling rate and lowering period size. That way you operate closer to hardware limits, and you get more interaction moments with the driver.
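
As a rough way to compare settings, the arithmetic below estimates how often a period completes, assuming the driver is serviced about once per period. The 48000/160 pair is only an illustrative value; whether the hardware accepts it is not known here.

```python
def periods_per_second(rate, periodsize):
    """Number of periods completed per second at the given settings."""
    return rate / periodsize

def period_duration_ms(rate, periodsize):
    """Length of one period in milliseconds."""
    return 1000.0 * periodsize / rate

# Settings from the posted code: 32000 Hz, periodsize 640.
print(periods_per_second(32000, 640))   # 50.0
print(period_duration_ms(32000, 640))   # 20.0

# Hypothetical more aggressive settings for stress testing.
print(periods_per_second(48000, 160))   # 300.0
```

Under that assumption, the second configuration would give about six times as many driver interactions per second as the original.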

Other than that I am out of ideas.

ossilator commented 7 months ago

> I assume nobody here want[]s to help tackle an issue with the jenky driver I ported from raspberry pi to imx, but help forcing a crash or figuring out a work around would be appreciated.

with that information, i find your approach to this problem baffling.

it's rather obvious that this is a driver problem (user space cannot crash the whole system), and it's your driver. why didn't you start by looking there, and why are you considering workarounds in user space?

if you need help with the driver, post it to the alsa development list.

let's close this here, as it's obviously out of scope.

larsimmisch / pyalsaaudio

This code causes my device to crash without error. #147