kernelci / kernelci-project

KernelCI Linux Foundation project documentation

deferred-probe-empty test failures on Intel-based Chromebooks #218

Closed by r-c-n 3 months ago

r-c-n commented 1 year ago

@gctucker @nuclearcat Not a kernel bug but something to fix in the KernelCI kernel builds instead.

A lot of bootrr deferred-probe-empty tests have been failing on almost all Intel-based Chromebooks since forever. Example: https://linux.kernelci.org/test/case/id/64810f3a4b727ea9fd30618b/

Depending on the board there might be different deferred drivers:

hatch and volteer:

i2c-10EC5682:00

sona and octopus:

MX98357A:00
platform MX98357A:00: deferred probe pending

zork and dalboz:

AMDI5682:00        cz-da7219-max98357a: devm_snd_soc_register_card(acpr5682m98357) failed

I haven't seen these on grunt or sarien.
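
For context, entries like the ones above are what the kernel reports for devices whose probe is still deferred. A minimal Python sketch of a check in that spirit follows. It assumes the list is read from debugfs at `/sys/kernel/debug/devices_deferred` and that each line holds a device name, optionally followed by a reason. The function names are made up for illustration, and this is not bootrr's actual implementation.

```python
from pathlib import Path

# Assumed location of the deferred device list (requires debugfs to be mounted).
DEFERRED_PATH = Path("/sys/kernel/debug/devices_deferred")


def parse_deferred(text):
    """Split the deferred list into (device, reason) pairs.

    The reason is an empty string when the line only names the device.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        device = parts[0]
        reason = parts[1] if len(parts) > 1 else ""
        entries.append((device, reason))
    return entries


def deferred_probe_empty(text):
    """Return True when no device is left waiting on a deferred probe."""
    return not parse_deferred(text)


def read_deferred(path=DEFERRED_PATH):
    """Read the list from a running system (typically needs root)."""
    return path.read_text()
```

On a target board, `deferred_probe_empty(read_deferred())` would be the pass/fail condition under these assumptions.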

This seems related to the kernel config. I could reproduce the problem easily with the kernel configuration from one of the failing tests: https://lava.collabora.dev/scheduler/job/10685806 (config = x86_64_defconfig+x86-chromebook)

I then checked that it doesn't fail with a kernel built from a custom config that enables most soundcard drivers and codecs: https://lava.collabora.dev/scheduler/job/11007480

Fixing this would clean up the test result logs and reports considerably.

crazoes commented 1 year ago

@hardboprobot I was testing this to check whether it still fails, since someone pointed it out on buganizer as well. I see that the tests are passing now: https://lava.collabora.dev/scheduler/job/11122689

r-c-n commented 1 year ago

@crazoes which kernel config did you use for that test run?

gctucker commented 1 year ago

Is it expected kernel behaviour to be pending on a deferred probe when the driver is not even registered?

There are always some implicit kernel config requirements to boot on any given type of hardware, but in this case it seems to be a grey area, as one could argue the kernel should be able to deal with this more gracefully. Maybe the parent driver waiting for the deferred probe shouldn't have probed itself if it's missing a dependency.

@hardboprobot Could you please list the extra configs you had to enable to make the deferred probe issue go away?

r-c-n commented 1 year ago

@gctucker About the deferred probe semantics and the kernel handling this more gracefully, I agree. AFAIK the deferred probe mechanism doesn't really enforce anything and each driver is free to use it in whatever way it finds most appropriate, so grey areas are expected, I think.

From a testing point of view, a more elaborate solution could be implemented, in theory, by checking the kernel config and discarding a given deferred probe failure if the key config option is missing. That wouldn't be practical at all, though: it's not trivial to match the error messages to the config options and it'd be incredibly painful to maintain. So fixing the configs, even if it's overkill to add more options than necessary, could be the most reasonable workaround here.
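
To make the "check the kernel config and discard" idea concrete, here is a rough Python sketch. The `KEY_OPTION` table is hypothetical: the pairing of device and option in it is an unverified placeholder, and building and maintaining such a table for every board is exactly the part described above as impractical.

```python
def parse_config(text):
    """Return {option: value} for a kernel .config, without the CONFIG_ prefix.

    Lines such as "# CONFIG_FOO is not set" are skipped, so a disabled
    option is simply absent from the result.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            config[name[len("CONFIG_"):]] = value
    return config


# Hypothetical table: deferred device -> option assumed to provide its driver.
# This pairing is an illustration only and has not been verified.
KEY_OPTION = {
    "i2c-10EC5682:00": "SND_SOC_RT5682S",
}


def classify(deferred_devices, config, key_option=KEY_OPTION):
    """Split deferred devices into (expected, unexpected).

    A deferral counts as expected when the device's key option is known
    and is not enabled in the config. Everything else stays a failure.
    """
    expected, unexpected = [], []
    for dev in deferred_devices:
        option = key_option.get(dev)
        if option is not None and config.get(option, "n") == "n":
            expected.append(dev)
        else:
            unexpected.append(dev)
    return expected, unexpected
```

Devices without an entry in the table are always reported, which keeps the check conservative but also means the table has to grow with every new board.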

The full config file was linked above: https://people.collabora.com/~rcn/kernel_config_snd_modules.config. Most of the differences have to do with soundcard support:

 SND_AMD_ACP_CONFIG n -> m
 SND_ATMEL_SOC n -> m
 SND_BCM63XX_I2S_WHISTLER n -> m
 SND_SOC_ADI n -> m
 SND_SOC_ALC5623 n -> m
 SND_SOC_AMD_ACP3x n -> m
 SND_SOC_AMD_ACP5x n -> m
 SND_SOC_AMD_ACP6x n -> m
 SND_SOC_AMD_ACP_COMMON n -> m
 SND_SOC_AMD_CZ_RT5645_MACH n -> m
 SND_SOC_AMD_PS n -> m
 SND_SOC_AMD_RENOIR n -> m
 SND_SOC_AMD_RPL_ACP6x n -> m
 SND_SOC_AMD_ST_ES8336_MACH n -> m
 SND_SOC_CHV3_I2S n -> m
 SND_SOC_CS35L41_I2C n -> m
 SND_SOC_CS35L41_SPI n -> m
 SND_SOC_CS42L42 n -> m
 SND_SOC_CX2072X n -> m
 SND_SOC_DA7213 n -> m
 SND_SOC_DMIC n -> m
 SND_SOC_ES8316 n -> m
 SND_SOC_ES8326 n -> m
 SND_SOC_HDA n -> m
 SND_SOC_IMG n -> y
 SND_SOC_INTEL_APL n -> m
 SND_SOC_INTEL_AVS n -> m
 SND_SOC_INTEL_BYTCR_RT5640_MACH n -> m
 SND_SOC_INTEL_BYTCR_RT5651_MACH n -> m
 SND_SOC_INTEL_BYT_CHT_CX2072X_MACH n -> m
 SND_SOC_INTEL_BYT_CHT_DA7213_MACH n -> m
 SND_SOC_INTEL_BYT_CHT_ES8316_MACH n -> m
 SND_SOC_INTEL_BYT_CHT_NOCODEC_MACH n -> m
 SND_SOC_INTEL_CATPT n -> m
 SND_SOC_INTEL_CFL n -> m
 SND_SOC_INTEL_CHT_BSW_MAX98090_TI_MACH n -> m
 SND_SOC_INTEL_CHT_BSW_NAU8824_MACH n -> m
 SND_SOC_INTEL_CHT_BSW_RT5645_MACH n -> m
 SND_SOC_INTEL_CHT_BSW_RT5672_MACH n -> m
 SND_SOC_INTEL_CML_H n -> m
 SND_SOC_INTEL_CML_LP n -> m
 SND_SOC_INTEL_CNL n -> m
 SND_SOC_INTEL_GLK n -> m
 SND_SOC_INTEL_KBL n -> m
 SND_SOC_INTEL_SKL n -> m
 SND_SOC_INTEL_SKYLAKE n -> m
 SND_SOC_MAX9759 n -> m
 SND_SOC_MAX98090 n -> m
 SND_SOC_MAX98373_I2C n -> m
 SND_SOC_MAX98390 n -> m
 SND_SOC_MAX98927 n -> m
 SND_SOC_MT6351 n -> m
 SND_SOC_MT6358 n -> m
 SND_SOC_MT6660 n -> m
 SND_SOC_NAU8315 n -> m
 SND_SOC_NAU8821 n -> m
 SND_SOC_NAU8824 n -> m
 SND_SOC_PCM512x_I2C n -> m
 SND_SOC_RT5640 n -> m
 SND_SOC_SOF_TOPLEVEL n -> y
 SND_SOC_SSM4567 n -> m
 SND_SOC_TS3A227E n -> m
 SND_SOC_WM8804_I2C n -> m
 SND_SST_ATOM_HIFI2_PLATFORM_PCI n -> m
+CRYPTO_GENIV y
+CRYPTO_JITTERENTROPY_TESTINTERFACE n
+CRYPTO_SIG2 y
+DEV_COREDUMP y
+FW_CS_DSP m
+LEDS_AW200XX n
+MDIO_MSCC_MIIM n
+MFD_MAX77541 n
+MPRLS0025PA n
+OPT4001 n
+REGMAP_IRQ y
+REGMAP_MMIO m
+ROHM_BU27008 n
+SND_AMD_ASOC_REMBRANDT m
+SND_AMD_ASOC_RENOIR m
+SND_HDA_DSP_LOADER y
+SND_HDA_EXT_CORE m
+SND_INTEL_BYT_PREFER_SOF n
+SND_PCM_ELD y
+SND_SOC_ADI_AXI_I2S m
+SND_SOC_ADI_AXI_SPDIF m
+SND_SOC_AMD_ACP_I2S m
+SND_SOC_AMD_ACP_PCI m
+SND_SOC_AMD_ACP_PCM m
+SND_SOC_AMD_ACP_PDM m
+SND_SOC_AMD_LEGACY_MACH m
+SND_SOC_AMD_MACH_COMMON m
+SND_SOC_AMD_PS_MACH m
+SND_SOC_AMD_RENOIR_MACH m
+SND_SOC_AMD_RV_RT5682_MACH m
+SND_SOC_AMD_SOF_MACH m
+SND_SOC_AMD_VANGOGH_MACH m
+SND_SOC_AMD_YC_MACH m
+SND_SOC_CS35L41 m
+SND_SOC_CS35L41_LIB m
+SND_SOC_CS42L42_CORE m
+SND_SOC_HDAC_HDA m
+SND_SOC_HDAC_HDMI m
+SND_SOC_IMG_I2S_IN n
+SND_SOC_IMG_I2S_OUT n
+SND_SOC_IMG_PARALLEL_OUT n
+SND_SOC_IMG_PISTACHIO_INTERNAL_DAC n
+SND_SOC_IMG_SPDIF_IN n
+SND_SOC_IMG_SPDIF_OUT n
+SND_SOC_INTEL_AVS_MACH_DA7219 m
+SND_SOC_INTEL_AVS_MACH_DMIC m
+SND_SOC_INTEL_AVS_MACH_HDAUDIO m
+SND_SOC_INTEL_AVS_MACH_I2S_TEST m
+SND_SOC_INTEL_AVS_MACH_MAX98357A m
+SND_SOC_INTEL_AVS_MACH_MAX98373 m
+SND_SOC_INTEL_AVS_MACH_MAX98927 m
+SND_SOC_INTEL_AVS_MACH_NAU8825 m
+SND_SOC_INTEL_AVS_MACH_PROBE m
+SND_SOC_INTEL_AVS_MACH_RT274 m
+SND_SOC_INTEL_AVS_MACH_RT286 m
+SND_SOC_INTEL_AVS_MACH_RT298 m
+SND_SOC_INTEL_AVS_MACH_RT5682 m
+SND_SOC_INTEL_AVS_MACH_SSM4567 m
+SND_SOC_INTEL_BDW_RT5650_MACH m
+SND_SOC_INTEL_BDW_RT5677_MACH m
+SND_SOC_INTEL_BROADWELL_MACH m
+SND_SOC_INTEL_BXT_DA7219_MAX98357A_COMMON m
+SND_SOC_INTEL_BXT_DA7219_MAX98357A_MACH m
+SND_SOC_INTEL_BXT_RT298_MACH m
+SND_SOC_INTEL_CML_LP_DA7219_MAX98357A_MACH m
+SND_SOC_INTEL_DA7219_MAX98357A_GENERIC m
+SND_SOC_INTEL_EHL_RT5660_MACH m
+SND_SOC_INTEL_GLK_DA7219_MAX98357A_MACH m
+SND_SOC_INTEL_GLK_RT5682_MAX98357A_MACH m
+SND_SOC_INTEL_HASWELL_MACH m
+SND_SOC_INTEL_HDA_DSP_COMMON m
+SND_SOC_INTEL_KBL_DA7219_MAX98357A_MACH m
+SND_SOC_INTEL_KBL_DA7219_MAX98927_MACH m
+SND_SOC_INTEL_KBL_RT5660_MACH m
+SND_SOC_INTEL_KBL_RT5663_MAX98927_MACH m
+SND_SOC_INTEL_KBL_RT5663_RT5514_MAX98927_MACH m
+SND_SOC_INTEL_SKL_HDA_DSP_GENERIC_MACH m
+SND_SOC_INTEL_SKL_NAU88L25_MAX98357A_MACH m
+SND_SOC_INTEL_SKL_NAU88L25_SSM4567_MACH m
+SND_SOC_INTEL_SKL_RT286_MACH m
+SND_SOC_INTEL_SKYLAKE_COMMON m
+SND_SOC_INTEL_SKYLAKE_FAMILY m
+SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC y
+SND_SOC_INTEL_SKYLAKE_SSP_CLK m
+SND_SOC_INTEL_SOF_CIRRUS_COMMON m
+SND_SOC_INTEL_SOF_CML_RT1011_RT5682_MACH m
+SND_SOC_INTEL_SOF_CS42L42_MACH m
+SND_SOC_INTEL_SOF_DA7219_MAX98373_MACH m
+SND_SOC_INTEL_SOF_ES8336_MACH m
+SND_SOC_INTEL_SOF_MAXIM_COMMON m
+SND_SOC_INTEL_SOF_NAU8825_MACH m
+SND_SOC_INTEL_SOF_PCM512x_MACH m
+SND_SOC_INTEL_SOF_REALTEK_COMMON m
+SND_SOC_INTEL_SOF_RT5682_MACH m
+SND_SOC_INTEL_SOF_SSP_AMP_MACH m
+SND_SOC_INTEL_SOF_WM8804_MACH m
+SND_SOC_INTEL_SST m
+SND_SOC_MAX98373 m
+SND_SOC_NAU8825 m
+SND_SOC_PCM512x m
+SND_SOC_RL6347A m
+SND_SOC_RT1011 m
+SND_SOC_RT1015 m
+SND_SOC_RT1015P m
+SND_SOC_RT1019 m
+SND_SOC_RT1308 m
+SND_SOC_RT274 m
+SND_SOC_RT286 m
+SND_SOC_RT298 m
+SND_SOC_RT5514 m
+SND_SOC_RT5514_SPI m
+SND_SOC_RT5645 m
+SND_SOC_RT5651 m
+SND_SOC_RT5660 m
+SND_SOC_RT5663 m
+SND_SOC_RT5670 m
+SND_SOC_RT5677 m
+SND_SOC_RT5677_SPI m
+SND_SOC_RT5682S m
+SND_SOC_SOF m
+SND_SOC_SOF_ACPI m
+SND_SOC_SOF_ACPI_DEV m
+SND_SOC_SOF_ALDERLAKE m
+SND_SOC_SOF_AMD_COMMON m
+SND_SOC_SOF_AMD_REMBRANDT m
+SND_SOC_SOF_AMD_RENOIR m
+SND_SOC_SOF_AMD_TOPLEVEL m
+SND_SOC_SOF_APOLLOLAKE m
+SND_SOC_SOF_BAYTRAIL m
+SND_SOC_SOF_BROADWELL m
+SND_SOC_SOF_CANNONLAKE m
+SND_SOC_SOF_CLIENT m
+SND_SOC_SOF_COFFEELAKE m
+SND_SOC_SOF_COMETLAKE m
+SND_SOC_SOF_DEBUG_PROBES m
+SND_SOC_SOF_ELKHARTLAKE m
+SND_SOC_SOF_GEMINILAKE m
+SND_SOC_SOF_HDA m
+SND_SOC_SOF_HDA_AUDIO_CODEC y
+SND_SOC_SOF_HDA_COMMON m
+SND_SOC_SOF_HDA_LINK y
+SND_SOC_SOF_HDA_LINK_BASELINE m
+SND_SOC_SOF_HDA_MLINK m
+SND_SOC_SOF_HDA_PROBES m
+SND_SOC_SOF_ICELAKE m
+SND_SOC_SOF_INTEL_APL m
+SND_SOC_SOF_INTEL_ATOM_HIFI_EP m
+SND_SOC_SOF_INTEL_CNL m
+SND_SOC_SOF_INTEL_COMMON m
+SND_SOC_SOF_INTEL_HIFI_EP_IPC m
+SND_SOC_SOF_INTEL_ICL m
+SND_SOC_SOF_INTEL_IPC4 y
+SND_SOC_SOF_INTEL_MTL m
+SND_SOC_SOF_INTEL_SKL m
+SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE m
+SND_SOC_SOF_INTEL_TGL m
+SND_SOC_SOF_INTEL_TOPLEVEL y
+SND_SOC_SOF_IPC3 y
+SND_SOC_SOF_JASPERLAKE m
+SND_SOC_SOF_KABYLAKE m
+SND_SOC_SOF_MERRIFIELD m
+SND_SOC_SOF_METEORLAKE m
+SND_SOC_SOF_PCI m
+SND_SOC_SOF_PCI_DEV m
+SND_SOC_SOF_PROBE_WORK_QUEUE y
+SND_SOC_SOF_SKYLAKE m
+SND_SOC_SOF_TIGERLAKE m
+SND_SOC_SOF_XTENSA m
+SND_SOC_TOPOLOGY y
+SND_SOC_WM8804 m
+SND_SOC_WM_ADSP m

gctucker commented 1 year ago

OK thanks @hardboprobot, so we can treat this as "suboptimal design" in the kernel and compensate by adding config options in KernelCI to suppress the deferred probe warnings.

Maybe a longer-term improvement on the kernel side of things would be to resolve dependencies between drivers and probed devices, as I think that's always been a weakness, but I don't think it's realistic. Or maybe some small parts could be improved: for example, only for ACPI or only for device tree platforms, we could catch things earlier if we can tell some devices will never probe at runtime, or add more checks at build time. Well, that's just me brainstorming here :p

crazoes commented 1 year ago

@hardboprobot here is the config file for your reference. Also, I tested this on regressions happening in chrome-platform/for-kernelci. I just followed the usual way through kci_build to generate the x86_64_defconfig+x86-chromebook config.

Here are the lava jobs for it. https://lava.collabora.dev/scheduler/job/11124258 https://lava.collabora.dev/scheduler/job/11124257 https://lava.collabora.dev/scheduler/job/11122842

r-c-n commented 1 year ago

@crazoes What changed between the builds from your tests and this one for instance? The kernel versions are supposed to be the same: chrome-platform/for-kernelci v6.5-rc1-1-ga6edd5f5d9cc and the config is supposed to be the same as well: x86_64_defconfig+x86-chromebook

The last test runs for this using KernelCI builds still fail.
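
One way to answer the "what changed between the builds" question is to compare the two generated `.config` files directly. The kernel tree ships `scripts/diffconfig` for this. The Python sketch below is only a simplified stand-in for illustration, with hypothetical function names, and it treats "is not set" lines as the value `n`.

```python
def parse_config(text):
    """Return {option: value} for a .config, mapping unset options to "n"."""
    prefix, suffix = "# CONFIG_", " is not set"
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            config[name[len("CONFIG_"):]] = value
        elif line.startswith(prefix) and line.endswith(suffix):
            config[line[len(prefix):-len(suffix)]] = "n"
    return config


def diff_configs(old_text, new_text):
    """List (option, old_value, new_value) for every option that differs.

    None stands for an option that does not appear in that config at all.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes
```

An empty result for two builds that behave differently would point away from the config and towards something else in the build or test setup.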

crazoes commented 1 year ago

@hardboprobot that's the weird thing: I cannot figure out why it started to work now. I checked the x86-chromebook config options and there have been no recent additions to it related to sound. There were only some config options added by Laura which were related to Video Codec.

Also, @nfraprado added the deferred probe timeout = 60s, but I think it was already being used by these devices, as I could see it in their logs. I am waiting for a new KernelCI build so we can confirm whether it still fails or not.

crazoes commented 1 year ago

@hardboprobot can you verify this on your side as well, to confirm that I am not doing anything wrong here while building the kernel?

crazoes commented 1 year ago

@gctucker can we add the config options in KernelCI for now to resolve the warnings? If yes, will this go in as fragments in the build-configs.yaml file?

gctucker commented 1 year ago

> @gctucker can we add the config options in KernelCI for now to resolve the warnings? If yes, will this go in as fragments in the build-configs.yaml file?

One issue with adding lots of config options is that it makes the build further away from the base defconfig. So the kernel may start showing different results than the same tests run on other platforms without the fragment, which makes comparisons a bit harder to make. Other than that, if this is to be added to the x86-chromebook fragment then sure that should be fine. Also it's probably worth looking into having a fragment file rather than an inline list of configs in YAML as this is starting to get rather large.
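
As a rough illustration of the fragment file idea, the option list posted earlier in this thread (lines of the form ` NAME old -> new` for changed options and `+NAME value` for added ones) could be converted mechanically into `.config` fragment syntax. The Python sketch below assumes that input format. The function name is hypothetical, and how KernelCI would reference the resulting file from build-configs.yaml is not covered here.

```python
def to_fragment(diff_text):
    """Convert a diffconfig-style option list into config fragment text.

    Changed options (" NAME old -> new") and added options ("+NAME value")
    both end with the wanted value, so only the first and last fields matter.
    Removed options (lines starting with "-") are ignored.
    """
    lines = []
    for raw in diff_text.splitlines():
        if raw.startswith("-"):
            continue
        fields = raw.lstrip("+ ").split()
        if len(fields) < 2:
            continue
        name, value = fields[0], fields[-1]
        if value == "n":
            lines.append(f"# CONFIG_{name} is not set")
        else:
            lines.append(f"CONFIG_{name}={value}")
    return "\n".join(lines) + "\n"
```

Keeping the generated text in its own file would leave the YAML short while making the sound-related additions easy to review or drop as a group.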

crazoes commented 1 year ago

I understand the problem and that is why I was trying to narrow down the config options but it is very complicated to find the exact config options needed by these devices, especially when there are multiple devices failing due to missing codec configs.

I was only able to identify the config options for zork:

SND_SOC_AMD_RV_RT5682_MACH
SND_SOC_INTEL_AVS_MACH_RT5682
SND_SOC_INTEL_GLK_RT5682_MAX98357A_MACH
SND_SOC_INTEL_SOF_CML_RT1011_RT5682_MACH
SND_SOC_INTEL_SOF_RT5682_MACH
SND_SOC_RT5682S
SND_SOC_CS35L41_I2C
SND_SOC_MAX98373_I2C
SND_SOC_PCM512x_I2C
SND_SOC_WM8804_I2C

But they don't work for other HP and ASUS devices.
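
The narrowing-down process could in principle be automated with a greedy reduction: drop one option at a time and keep it dropped if the test still passes. The Python sketch below shows only that loop. `passes` is a hypothetical callback that, in practice, would have to build a kernel with the given options and run the deferred-probe-empty test on the board, so each call would be expensive. Kconfig dependencies between options are also ignored here.

```python
def minimize_options(options, passes):
    """Greedily remove options that are not needed for the test to pass.

    `passes(option_list)` must return True when a kernel built with exactly
    those extra options passes the test. The result is one minimal subset
    with respect to single removals, not necessarily the smallest possible.
    """
    needed = list(options)
    for opt in list(options):
        trial = [o for o in needed if o != opt]
        if passes(trial):
            needed = trial
    return needed
```

For N candidate options this needs N builds and boots per board, which may explain why doing it by hand across several device types is so tedious.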

padovan commented 3 months ago

Old issue. (also we are not tracking kernel test failure/issues through GitHub anymore)