zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.34k stars 6.33k forks source link

The contents of .config should not be able to prevent users from running menuconfig. #9573

Open cvinayak opened 6 years ago

cvinayak commented 6 years ago

When enabling nRF5 SPI drivers, invalid values are permitted, but next time opening menuconfig fails.

make menuconfig -- Selected BOARD nrf52840_pca10056 Zephyr version: 1.12.99 Parsing Kconfig tree in /Users/hemal/workspace/zephyr/Kconfig Using /Users/hemal/workspace/zephyr/build/peripheral/zephyr/.config as base warning: the value '' is invalid for SPI_0_NRF_SCK_PIN (defined at drivers/spi/Kconfig.nrfx:55), which has type int -- assignment ignored warning: the value '' is invalid for SPI_0_NRF_MOSI_PIN (defined at drivers/spi/Kconfig.nrfx:62), which has type int -- assignment ignored warning: the value '' is invalid for SPI_0_NRF_MISO_PIN (defined at drivers/spi/Kconfig.nrfx:69), which has type int -- assignment ignored Error: Aborting due to non-whitelisted Kconfig warning 'warning: the value '' is invalid for SPI_0_NRF_SCK_PIN (defined at drivers/spi/Kconfig.nrfx:55), which has type int -- assignment ignored'. Note: If this warning doesn't point to an actual problem, you can add it to the whitelist at the top of /Users/hemal/workspace/zephyr/scripts/kconfig/kconfig.py. CMake Error at /Users/hemal/workspace/zephyr/cmake/kconfig.cmake:149 (message): command failed with return code: 1 Call Stack (most recent call first): /Users/hemal/workspace/zephyr/cmake/app/boilerplate.cmake:251 (include) CMakeLists.txt:2 (include)

-- Configuring incomplete, errors occurred! See also "/Users/hemal/workspace/zephyr/build/peripheral/CMakeFiles/CMakeOutput.log". See also "/Users/hemal/workspace/zephyr/build/peripheral/CMakeFiles/CMakeError.log". make: *** [cmake_check_build_system] Error 1

SebastianBoe commented 6 years ago

"When enabling nRF5 SPI drivers, invalid values are permitted"

I'm not sure if permitted is the right word here, Kconfig is rightly disallowing the invalid value '' to be assigned to "SPI_0_NRF_SCK_PIN".

Perhaps "invalid values are defaulted to" would be more correct to say?

I see three options here; default the pins to a nonsensical, but Kconfig-valid value, e.g. all pins are set to pin 0 when the SPI instance is enabled.

Or do nothing, and continue to let Kconfig error out on the user with this error message:

warning: the value '' is invalid for SPI_0_NRF_SCK_PIN (defined at drivers/spi/Kconfig.nrfx:55), which has type int -- assignment ignored warning: the value '' is invalid for SPI_0_NRF_MOSI_PIN (defined at drivers/spi/Kconfig.nrfx:62), which has type int -- assignment ignored warning: the value '' is invalid for SPI_0_NRF_MISO_PIN (defined at drivers/spi/Kconfig.nrfx:69), which has type int -- assignment ignored Error: Aborting due to non-whitelisted Kconfig warning 'warning: the value '' is invalid for SPI_0_NRF_SCK_PIN (defined at drivers/spi/Kconfig.nrfx:55), which has type int -- assignment ignored'.

which pretty clearly IMHO explains that there is more configuration that needs to be done by the user.

Or finally, the best solution is to either resolve or apply one of the workarounds of #7302. E.g. have the boards duplicate their DT information about default pin values in Kconfig.

cvinayak commented 6 years ago

@SebastianBoe @ulfalizer the problem is, I can not get back into menuconfig to correct the values.

ulfalizer commented 6 years ago

Yeah... int and hex symbols are a bit confusing. They default to the empty string, but the empty string can't be assigned to them. Kconfiglib imitates that behavior for compatibility with the C tools.

A quickfix is to manually edit the values in <project-build-dir>/zephyr/.config. That isn't super user friendly though.

The simplest quickfix at the Kconfig level is to just give all int and hex symbols defaults.

This is really "just" a warning, but scripts/kconfig/kconfig.py promotes warnings to errors, and it always runs before menuconfig gets run. (@SebastianBoe Is this necessary btw, even when there are no changes to board defconfig files and prj.conf?)

Could have made int/hex symbols default to e.g. 0 by default instead, but there's no way to know whether that value makes any sense for a particular symbol. Another option would be to allow empty strings to be assigned to int/hex symbols, but that seems a bit weird.

If needed, I could tweak the warning whitelist to not turn that warning into an error. That warning might be nice to have as an error in other cases though.

ulfalizer commented 6 years ago

menuconfig itself would still generate those warnings btw, even if scripts/kconfig/kconfig.py wasn't run. They wouldn't be turned into errors though.

SebastianBoe commented 6 years ago

They default to the empty string, but the empty string can't be assigned to them.

I would say that this is the root cause, and what must be fixed. Fair enough that 0 might not make sense, but at least it will be valid. Which is what is being reported here.

Producing defaults that make sense is another issue.

The simplest quickfix at the Kconfig level is to just give all int and hex symbols defaults.

+1

ulfalizer commented 5 years ago

I would say that this is the root cause, and what must be fixed. Fair enough that 0 might not make sense, but at least it will be valid. Which is what is being reported here.

One (dubious) upside is that you get notified if you forget to set the value of an int/hex symbol (either through a default or in a .config). Not sure how helpful that is in practice. Maybe people already assume the implicit default is 0.

And yeah, it's a bit wonky that you can write a .config that will produce warnings when read back in.

I'm always a bit reluctant to break compatibility with the C tools though, because it's often handy to have the output match 100% (e.g., during testing). It could still be a custom patch though, if we can't live with the behavior.

I could try to get it changed in the C tools as well, but as it "just" turns into an easily-fixed warning there (nothing promotes it to an error), that might provide useful information in rare cases, people might be reluctant.

SebastianBoe commented 5 years ago

I want to turn this into a feature request.

The reporter wants this issue to be about the fact that if you have a corrupted .config, for whatever reason, then it is not possible to do "ninja menuconfig". This is not possible because CMake uses the corrupted config and behaves in a corrupted/undefined way which could include doing anything, including crashing before Generation time, and thereby not creating a build system that can invoke menuconfig to rectify the corrupted config.

This seems like a critical issue, but in practice, this has proven to not be a big problem.

This is because of several things:

@cvinayak : Is it OK by you to change the description to:

"The contents of .config should not be able to prevent users from running menuconfig."

ulfalizer commented 5 years ago

@SebastianBoe Does CMake parse the .config even before running kconfig.py?

I thought the problem was that kconfig.py error'd out because of the bad .config file and warnings-turned-errors, but if CMake parses .config first, then that's a problem too.

If it's just kconfig.py that's the problem, then a flag could be added to suppress turning warnings into errors, and then that flag could be set from CMake when running the menuconfig is the final target.

SebastianBoe commented 5 years ago

Does CMake parse the .config even before running kconfig.py?

No. But it does parse the .config before running menuconfig.py. So if menuconfig.py creates a corrupted .config file, this .config file will prevent menuconfig from being run again.

turning off -Werror is not sufficient to resolve this issue.

ulfalizer commented 5 years ago

@SebastianBoe Should be able to work around it by passing a flag to kconfig.py that tells it whether to error out for warnings, and disable that flag if the final target is to run the menuconfig.

Don't know how hard that is to do in CMake though.

cvinayak commented 4 years ago

@SebastianBoe and @ulfalizer This issue is old now and "I" dont seem to have lately faced any issues related to this issue. Can we agree to close this issue as "won't fix"?

SebastianBoe commented 4 years ago

This is still valid.

24601 commented 3 years ago

I am running into this in 2.4.99 still.

@SebastianBoe and @ulfalizer This issue is old now and "I" dont seem to have lately faced any issues related to this issue. Can we agree to close this issue as "won't fix"?

tejlmand commented 3 years ago

I am running into this in 2.4.99 still.

@24601 Which is very unfortunately. This is definitely something I have plans to solve, however it is a tricky thing to solve.

One thing is clear, it should definitely be possible to run ninja menuconfig. But what to do with the invalid settings ? We have nowhere to place them if they no longer have a corresponding config entry. In case the config entry has been renamed, then it would be nice to be able to point users to the new config, but when the error happens it's hard to now the correct old name.

If a value has changed the definition of what a valid value means (and current value is outside that range), then it might be possible to direct the user to the setting, but we need to prefill Kconfig with a valid value (which might be different from what the user wants). And from there the user can adjust.

One idea could be to introduce a Kconfig.deprecated.cmake, similar to: https://github.com/zephyrproject-rtos/zephyr/commit/1bc640b97255819a0c4f022a5bc08ef2670e7e1d that could be used in such occasion when configs are removed, renamed or changed valid ranges. Of course it requires developers to updated it, but could be a start.

I have some more ideas, just lack the time to work on them.

Milestoned 2.5.0 to put a light pressure on myself.

24601 commented 3 years ago

@tejlmand - Thank you for the full reply and consideration. I understand the difficult situation you're in, and think you have a good approach to the first set of things to address this.

I think after reading your discussion of it, the only thing I can add (you are way better versed in this than I am no doubt), is considering how menuconfig was even allowed to write an invalid config to begin with.

I see the entry to this bug/problem as possible to view in two different lights:

  1. A file created by many people/processes/external touches that is invalid that menuconfig has to deal with.
  2. A file created exclusively by menuconfig that is invalid.

In the case of 1, if the user or an external tool left that file in an invalid state through an external edit, I still think "failing gracefully" is a good way to try to get to, even if it involves a "we noticed you had an invalid config, we've traversed that tree and disabled the given options and are starting menuconfig in safe mode to let you resolve them", but even this is a luxury/kindness to the user, because they brought the bad data/input.

But, I think what we're seeing (or at least I am) is a situation of 2, where we are seeing a file that was actually written by menuconfig that, without doing anything but saving and opening menuconfig again, we're seeing this failure. Is it possible or advisable, even as a stop-gap until we have a broader fix strategy here, to "dry run" before a save where this validation error would be caught and the invalid file would never be written period? That could be a good way to stop at least some of the ways of invalid .config even happening to begin with, and the "nice to have" "menuconfig is more robust at eating/handling bad data" can come later, but I think we could start with "let's not write out knowingly bad data from menuconfig that menuconfig itself cannot open" and it maybe it is as easy enough as before save writing the config out to a temp file and attempting to parse it, and if it fails, don't even allow saving or make the user force such a save?

I am running into this in 2.4.99 still.

@24601 Which is very unfortunately. This is definitely something I have plans to solve, however it is not a tricky thing to solve.

One thing is clear, it should definitely be possible to run ninja menuconfig. But what to do with the invalid settings ? We have nowhere to place them if they no longer have a corresponding config entry. In case the config entry has been renamed, then it would be nice to be able to point users to the new config, but when the error happens it's hard to now the correct old name.

If a value has changed the definition of what a valid value means (and current value is outside that range), then it might be possible to direct the user to the setting, but we need to prefill Kconfig with a valid value (which might be different from what the user wants). And from there the user can adjust.

One idea could be to introduce a Kconfig.deprecated.cmake, similar to: 1bc640b that could be used in such occasion when configs are removed, renamed or changed valid ranges. Of course it requires developers to updated it, but could be a start.

I have some more ideas, just lack the time to work on them.

Milestoned 2.5.0 to put a light pressure on myself.

tejlmand commented 3 years ago

@24601 Thank you for the follow up. The more input, the better we can address this issue.

I see the entry to this bug/problem as possible to view in two different lights:

A file created by many people/processes/external touches that is invalid that menuconfig has to deal with.

Yes, this is a user error, but we should still try to help and assist people

A file created exclusively by menuconfig that is invalid.

If this happens, that's a plain bug, and feel free to report that as such. We should definitely ensure that menuconfig doesn't write invalid config files, or be able to gracefully handle it, if/when it happens. The dependencies can be a bit tricky to get right, and there are a lot of contributors to Zephyr, so it does happen that certain dependencies are capable of canceling out each other :disappointed: , but please file issues if there are specific config options that may cause this to happen.

I have a third experience, that you haven't described. And that's where users are actually are having a valid configuration, then doing a git fetch zephyr, followed by git checkout zephyr/master, and finally ninja --> Error with invalid .config.

That is also a case that must be considered, cause that is a bad user experience imho.

24601 commented 3 years ago

@tejlmand - my pleasure, as we become more engaged with Zephyr we look forward to contributing more and becoming more productive members.

I believe we have a situation with the MODEM_UBLOX_SARA that creates the situation I mentioned above: enabling MODEM_UBLOX_SARA on an otherwise fine config results in a .config that is broken such that menuconfig cannot open it even though it wrote the file. I will work on a demonstration repo and file an appropriate issue with a properly documented issue and reference herein.

tejlmand commented 1 year ago

and with sysbuild introduced, this even prevents running sysbuild_menuconfig if CMake re-runs on the sysbuild level and an image, for example mcuboot fails because .config content.

zephyrbot commented 6 months ago

Hi @tejlmand,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. Please confirm the issue is correctly assigned and re-assign it otherwise.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@cvinayak you are also encouraged to help moving this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!