zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.86k stars 6.62k forks source link

process: require sample configurations to be complete #28091

Open pabigot opened 4 years ago

pabigot commented 4 years ago

tl;dr: by policy Zephyr sample configurations should be self-contained and not rely on board-specific default selects. Failure to do this should be considered a bug in the sample.

As we move towards a more rigorous development process I propose the following, with context:

I keep running into problems where Zephyr samples don't clearly identify the Kconfig options required to make them work.

An example is samples/bluetooth/beacon. The prj.conf enables only CONFIG_BT=y and works on Nordic development boards. When used on an out-of-tree board it can build successfully, but will fail at runtime with:

Bluetooth init failed (err -19)
[00:00:00.006,591] <err> bt_hci_core: No HCI driver registered

This happens because the in-tree Nordic development boards all happen have in Kconfig.defconfig:

config BT_CTLR
        default BT

This sort of dependency is helpful for people using those boards, but not for people using out of tree boards where it isn't present.

The error above at least tells me I need to find some sort of HCI thing, and eventually I might come to figure out CONFIG_BT_CTLR=y is also required.

Leaving these dependencies implicit keeps developers ignorant of what's actually going on: I assumed all I needed to convert that sample to extended advertising were CONFIG_BT_EXT_ADV* options along with the code. When I do that, the code builds, but starting advertising fails:

Advertising failed to start (err -5)
[00:00:00.007,568] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,568] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,568] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.3 Build 99
[00:00:00.008,117] <inf> bt_hci_core: Identity: c0:75:41:40:ac:e3 (random)
[00:00:00.008,117] <inf> bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1
[00:00:00.008,117] <inf> bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff
[00:00:00.011,932] <wrn> bt_hci_core: opcode 0x2036 status 0x01

This is not a useful error, i.e. one a developer can use to figure out what's gone wrong. I've now been informed that I also need CONFIG_BT_CTLR_ADV_EXT=y, which advances me to #28090, which is progress. Had I known that CONFIG_BT_CTLR=y was also required I might have been able to find this out myself by looking for a relevant CONFIG_BT_CTLR* option.

New users of Zephyr (or its subsystems, or new features of its subsystems) will not magically know things that appear obvious to Zephyr maintainers. Let's please make it easier for them.

So, stepping away from the bluetooth samples it would be very nice if Zephyr had this policy:

All samples should completely describe their minimal requirements in their prj.conf. Do not assume options are default-selected in board files, because for external board definitions they probably won't be. Relying on depends on and selects in subsystem Kconfigs is fine. But samples should not be dependent on default-selects from specific boards.

Further, I propose that failure to do this should be considered a bug, so that fixing it gets enough priority to limit the amount of time new users waste rediscovering things they could have been told.

pabigot commented 4 years ago

@ioannisg @carlescufi this relates generally to process, and specifically to Bluetooth. As Zephyr gains visibility with more people who will try to do new things I feel effort directed towards making their experience more friendly is worthwhile.

cvinayak commented 4 years ago

The context here is specific to experimental feature and it is best that users explicitly select such features. I am not aware of any build warnings to let users know of experimental features if these features get added in upstream samples without satisfactory validations.

Yes, when a public sample is added that markets such new feature the project configuration shall be complete.

pabigot commented 4 years ago

The implicit selection of CONFIG_BT_CTLR=y is the context: it's in-tree for the beacon example, and unknown others. The last time I hit it was #27105. I'd really like to stop running into the same problem in different clothes.

It's that absence that misled me into misconfiguring for the related experimental feature, which is an example of how this impacts users.

carlescufi commented 4 years ago

I fundamentally agree that this is not ideal as an experience for the user, but I really don't know how to solve this easily in a general manner. In the particular case of Bluetooth it should be doable, but I think we'd have to do this case by case.

keith-zephyr commented 4 months ago

@nashif states that this issue may not be good topic for the Process WG. Samples do have requirements that are documented, but requirements haven't always been enforced. @kartben is looking at cleaning up tech debt on the samples.

Documentation is here: https://docs.zephyrproject.org/latest/samples/sample_definition_and_criteria.html

@dleach02 any gaps where samples don't work on some boards is a bug in the sample that hasn't properly specified the constraints. @kartben For users it will be ideal if users only had to enable one high level config (CONFIG_BT) and the sample should just work. @nashif - The samples need to explicitly specify which platforms the sample is known to work on. This does cause CI problems down the road when a sample is run against a board that doesn't include required features. @dleach02 the yaml file currently provides hints as to what boards should work, but the documentation should enumerate the boards explicitly. @jfischer-no samples need to work against all driver instances implemented in the tree. It can be cumbersome to try to identify all the boards that should be able to run a test. @nashif Current filters aren't sufficient. @dleach02 Instead of documenting the boards the sample has been validated, samples should clearly document the requirements of the sample. e.g. platform needs a display, flash, etc.. @jfischer-no There are gaps, for instance not all boards that support flash work with all flash samples. The sample description needs to more consistently identify requirements. @nashif - it's clear that there is a problem to be solved here. @kartben to work on next steps for cleaning up samples.