thesofproject / linux

Linux kernel source tree
Other
89 stars 128 forks source link

ADL-N DSP boot failure #5104

Open brentlu opened 1 month ago

brentlu commented 1 month ago

We received a broken ALD-N Chromebook device which audio DSP always fails to boot. The failure rate is 100% and so far we couldn't find another device with same symptoms.

Linux kernel log suggests there is error returns from SOF_IPC_DAI_CONFIG(0x80010000) command: linux.txt

[ 6.995494] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x30100000: GLB_TPLG_MSG: PIPE_NEW [ 6.996259] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x30010000: GLB_TPLG_MSG: COMP_NEW [ 6.996622] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG [ 6.996971] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx error for 0x80010000 (msg/reply size: 216/12): -19 [ 6.996977] sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to set dai config for SSP1-Codec

FW log shows that there is problem in ipc_comp_dai_config() function: fw-org.txt

[ 672435.546196] ( 521.822876) c0 ipc src/ipc/ipc3/handler.c:1605 INFO ipc: new cmd 0x80010000 [ 672461.900362] ( 26.354166) c0 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 1.1 -> config [ 672481.900361] ( 20.000000) c0 dai src/lib/dai.c:166 INFO dai_get type 1 index 1 new sref 2 [ 672502.681610] ( 20.781250) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:302 INFO ssp_set_config(), config->format = 0x4001 [ 672529.764943] ( 27.083332) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:729 INFO ssp_set_config(), sscr0 = 0xc1d00037, sscr1 = 0xd0400000, ssto = 0x00000000, sspsp = 0xa200000 [ 672548.671192] ( 18.906250) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:731 INFO ssp_set_config(), sscr2 = 0x00004002, sspsp2 = 0x00000000, sscr3 = 0x07070000, ssioc = 0x00000020 [ 672567.525358] ( 18.854166) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:733 INFO ssp_set_config(), ssrsa = 0x00000003, sstsa = 0x00000003 [ 672589.817024] ( 22.291666) c0 mn ./drivers/intel/ssp/mn.c:259 INFO mclk_rate 24576000, mclk_source_clock 0 [ 672609.035773] ( 19.218750) c0 mn ./drivers/intel/ssp/mn.c:220 INFO mclk_id 0 mdivr_val 1 [ 672628.775355] ( 19.739582) c0 ssp-dai 1.1 /drivers/intel/ssp/ssp.c:747 INFO ssp_set_config(): enable MCLK for SSP1 [ 672649.296188] ( 20.520832) c0 dai src/lib/dai.c:190 INFO dai_put type 1 index 1 new sref 1 [ 672669.087854] ( 19.791666) c0 ipc src/ipc/ipc3/dai.c:185 INFO ipc_comp_dai_config() dai type = 1 index = 1 [ 672697.212853] ( 28.124998) c1 idc ......./intel/cavs/idc.c:58 INFO idc_irq_handler(), IPC_IDCTFC_BUSY [ 672721.275352] ( 24.062498) c1 idc ......./intel/cavs/idc.c:186 INFO idc_do_cmd() [ 672748.098267] ( 26.822916) c1 ipc src/ipc/ipc3/handler.c:595 INFO ipc: dai 1.1 -> config [ 672766.639933] ( 18.541666) c1 ipc src/ipc/ipc3/dai.c:185 INFO ipc_comp_dai_config() dai type = 1 index = 1 [ 672791.327432] ( 24.687498) c1 ipc src/ipc/ipc3/dai.c:210 ERROR ipc_comp_dai_config(): comp_dai_config() failed

brentlu commented 1 month ago

Memory corruption is observed in dai_new() function.

[ 799520.280730] ( 27.031248) c1 dai src/audio/dai.c:162 ERROR dai_new() [ 799536.478646] ( 16.197916) c1 dai src/audio/dai.c:165 INFO config: core 1, id 0x40, pid 0x1d, type 2 => the comp_ipc_config structure which will be used to initialize comp_dev structure [ 799563.301562] ( 26.822916) c1 dai src/audio/dai.c:172 INFO comp_dev(0x9e0fe080) [ 799581.686977] ( 18.385416) c1 dai src/audio/dai.c:175 INFO comp_ipc_config: core 1, id 0x0, pid 0x0, type 2 => same structure dumped from comp_dev structure, id and pid fields are corrupted

static struct comp_dev *dai_new(const struct comp_driver *drv,
                struct comp_ipc_config *config,
                void *spec)
{
    struct comp_dev *dev;
    struct ipc_config_dai *dai = spec;
    struct dai_data *dd;
    uint32_t dir, caps, dma_dev;

    comp_cl_err(&comp_dai, "dai_new()");
    comp_cl_info(&comp_dai, "  config: core %d, id 0x%x, pid 0x%x, type %d",
        config->core, config->id,
        config->pipeline_id, config->type);

    dev = comp_alloc(drv, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->ipc_config = *config;

    comp_cl_info(&comp_dai, "  comp_dev(0x%x)", dev);
    comp_cl_info(&comp_dai, "  comp_ipc_config: core %d, id 0x%x, pid 0x%x, type %d",
        dev->ipc_config.core, dev->ipc_config.id,
        dev->ipc_config.pipeline_id, dev->ipc_config.type);
brentlu commented 1 month ago

Add more debut output to confirm it's memory corruption: sof-0715.txt

[ 465615.033581] ( 26.822916) c1 dai src/audio/dai.c:162 ERROR dai_new() [ 465631.335664] ( 16.302082) c1 dai src/audio/dai.c:165 INFO config: core 1, id 0x40, pid 0x1d, type 2 [ 465658.002330] ( 26.666666) c1 dai src/audio/dai.c:172 INFO comp_dev(0x9e0fe080) [ 465676.596079] ( 18.593750) c1 dai src/audio/dai.c:175 INFO comp_ipc_config: core 1, id 0x0, pid 0x0, type 2 [ 465694.825245] ( 18.229166) c1 dai src/audio/dai.c:178 INFO origin: core 1, id 0x40, pid 0x1d, type 2

static struct comp_dev *dai_new(const struct comp_driver *drv,
                struct comp_ipc_config *config,
                void *spec)
{
    struct comp_dev *dev;
    struct ipc_config_dai *dai = spec;
    struct dai_data *dd;
    uint32_t dir, caps, dma_dev;

    comp_cl_err(&comp_dai, "dai_new()");
    comp_cl_info(&comp_dai, "  config: core %d, id 0x%x, pid 0x%x, type %d",
        config->core, config->id,
        config->pipeline_id, config->type);

    dev = comp_alloc(drv, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->ipc_config = *config;

    comp_cl_info(&comp_dai, "  comp_dev(0x%x)", dev);
    comp_cl_info(&comp_dai, "  comp_ipc_config: core %d, id 0x%x, pid 0x%x, type %d",
        dev->ipc_config.core, dev->ipc_config.id,
        dev->ipc_config.pipeline_id, dev->ipc_config.type);
    comp_cl_info(&comp_dai, "  origin: core %d, id 0x%x, pid 0x%x, type %d",
        config->core, config->id,
        config->pipeline_id, config->type);
plbossart commented 1 month ago

Can you double-check the topology settings @brentlu ? Wondering why what the changes are for this platform?

brentlu commented 1 month ago

There is nothing special for this topology. The only difference is that MCLK rate is 24.576MHz but this setting has been long used in max98373 topology.

"sof-tgl-max98357a-rt5682\;sof-adl-rt5650\;-DUSE_RT5650\;-DFMT=s24le\;-DPLATFORM=adl\;-DLINUX_MACHINE_DRIVER=sof_rt5682\;-DAMP_SSP=1\;-DSPK_MIC_PERIOD_US=10000\;-DBT_OFFLOAD\;-DNOHOTWORD\;-DNO16KDMIC"
brentlu commented 1 month ago

DSP panic could be observed with same FW image. The exception cause LoadStorePIFDataErrorCause also suggests that there is problem with the memory. So my question would be "is there anything we can do / need to do about it?". If not, I would suggest ODM to go through CQE process for further analysis.

[ 6.986281] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------ [ 6.986285] sof-audio-pci-intel-tgl 0000:00:1f.3: DSP panic! [ 6.986287] sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (6) [ 6.986298] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running [ 6.986305] sof-audio-pci-intel-tgl 0000:00:1f.3: status code: 0xdead006 (unknown) [ 6.986395] sof-audio-pci-intel-tgl 0000:00:1f.3: reason: runtime exception (0x6) [ 6.986397] sof-audio-pci-intel-tgl 0000:00:1f.3: trace point: 0x00004000 [ 6.986399] sof-audio-pci-intel-tgl 0000:00:1f.3: panic at :0 [ 6.986401] sof-audio-pci-intel-tgl 0000:00:1f.3: error: DSP Firmware Oops [ 6.986403] sof-audio-pci-intel-tgl 0000:00:1f.3: error: Exception Cause: LoadStorePIFDataErrorCause, Synchronous PIF data error during LoadStore access [ 6.986405] sof-audio-pci-intel-tgl 0000:00:1f.3: EXCCAUSE 0x0000000d EXCVADDR 0x0000001c PS 0x00060025 SAR 0x00000000 [ 6.986408] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC1 0xbe031c5d EPC2 0xbe051b22 EPC3 0x00000000 EPC4 0x00000000 [ 6.986410] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC5 0x00000000 EPC6 0x00000000 EPC7 0x00000000 DEPC 0x00000000 [ 6.986412] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS2 0x00060b20 EPS3 0x00000000 EPS4 0x00000000 EPS5 0x00000000 [ 6.986414] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS6 0x00000000 EPS7 0x00000000 INTENABL 0x00000000 INTERRU 0x00000222 [ 6.986417] sof-audio-pci-intel-tgl 0000:00:1f.3: stack dump from 0xbe271aa0 [ 6.986419] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271aa0: 00000000 00000000 00000003 be05a308 [ 6.986422] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271ab0: fe0605a8 be271af0 be271b20 0000000d [ 6.986424] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271ac0: be271aa0 00000000 00000000 00000000 [ 6.986426] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271ad0: 00000000 00000000 00000000 00000000 [ 6.986428] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271ae0: 7e031c5d be271b20 9e1bed60 be05a494 [ 6.986430] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271af0: be031c5d 00000000 00000000 00000000 [ 6.986432] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271b00: be0c4468 00000003 ffffffff ffffffff [ 6.986434] sof-audio-pci-intel-tgl 0000:00:1f.3: 0xbe271b10: be02f47b be271d20 ffffffff fe031c5d [ 6.986435] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------

plbossart commented 1 month ago

My guess is that you have a conflict between SSPs, specifically I am not sure if you can have BT_OFFLOAD along with the MCLK at 24.576?

Can you try and force the nocodec mode to see if you are able to use the SSPs in default mode? that would point to a problem in the topology or SSP setup.

#define SOF_DBG_FORCE_NOCODEC           BIT(10) /* ignore all codec-related  configurations */
brentlu commented 1 month ago

Memory corruption could be observed even on nocodec mode. I print the variables of same structure and found the value would change. It seems to me that there is something wrong with the memory.

[ 5437.812284] ( 31.770832) c2 ipc src/ipc/ipc3/helper.c:329 INFO comp_dev(0x9e0fe080): [ 5455.260200] ( 17.447916) c2 ipc src/ipc/ipc3/helper.c:332 INFO comp_ipc_config: core 6, id 0x0, pid 0x0, type 1 => data printed in comp_new() function [ 5486.249782] ( 30.989582) c2 ipc src/ipc/ipc3/helper.c:688 ERROR ipc_comp_new(): add ipc_comp_dev(0x9e0de640) to comp_list [ 5502.812281] ( 16.562500) c2 ipc src/ipc/ipc3/helper.c:690 INFO comp_dev(0x9e0fe080): [ 5521.249781] ( 18.437500) c2 ipc src/ipc/ipc3/helper.c:693 INFO comp_ipc_config: core 0, id 0x0, pid 0x0, type 1 => same data printed in the caller, ipc_comp_new() function, the structure member 'core' is corrupted