Open eugmes opened 4 years ago
Thanks for your report.
We'd like to solve these types of problems in general by divorcing zephyr base setting from west itself, now that @tejlmand implemented a Zephyr CMake package which makes the environment variable obsolete. But the transition is tricky and we haven't done it yet, and I'm sorry this is causing you trouble.
I'm a bit surprised that this is a continual problem. The code should set zephyr.base once in the config file and then not have to set it again, so as long as your .west/config has a [zephyr] base = ...
everything should work in subsequent runs. Is that not what you're seeing?
Is this in a CI environment where the workspace is being set up fresh each time? If so, perhaps you can work around this issue by having the CI setup scripts run west config zephyr.base "$ZEPHYR_BASE"
or so, since you say the environment variable is set?
@mbolivar I believe @eugmes problem could be that two west
processes writes the file simultaneously, and thus the file get corrupted.
Which also explains why running west boards
first, before running n-west build
commands in parallel work, cause then the config is set, and will not be updated by the parallel runs afterwards.
(to save some time used by cmake).
What time you try to save ?
west build
invokes CMake, just as you would do by hand, and then ninja
afterwards.
It is simply a convenience instead of doing:
cmake -DBOARD=<board_name> -Bbuild .; ninja -Cbuild
@tejlmand Yes, the problem only occurs when one runs multiple west builds in parallel, when they started almost at the same time. There is a chance of config file corruption then, but only when no other extension command was run before.
I have a test repo that builds multiple simple applications for multiple configurations. All together there are more than 100 binaries to build, and the number is going to increase. For those applications the configuration step using cmake takes almost the same time as building the applications. Almost nothing can be done to speed up building, it is already done in parallel. But the time used to run cmake can be reduced by running west processes in parallel. That's why I'm trying to use ninja for that. That not optimal, but it still results in significant speedup.
But the time used to run cmake can be reduced by running west processes in parallel.
I'm just being curios, why not simply run n
-CMake instances in parallel ?
After all, west build
invokes CMake, so you would save the overhead of west
.
@tejlmand mainly because west
seems to be an official way to build zephyr apps.
@mbolivar I believe @eugmes problem could be that two
west
processes writes the file simultaneously, and thus the file get corrupted.
I understand that, but west should only write the file if there's not a value already there, hence my follow up questions. If you could provide some answers for those, @eugmes, that would be helpful. Thanks.
@mbolivar-nordic The problem is that the process of checking and setting the config variable is not atomic. So if multiple west build
processes are started in quick succession then several of them could see that the value is not set. Additionally writing to the config file is also not atomic, so when multiple process try to write to this file, it may end up being corrupted.
Consider this sequence for example:
process 1: is the value set? - no
process 1: compute the default value
process 2: is the value set? - no
process 2: compute the default value
process 1: start writing the default value
process 2: start writing the default value
process 1: write complete
process 2: write complete
In this case the config file will contain both data written by process 1 and 2, in arbitrary order, depending on exact timing of events.
I understand the race. Are you able to apply any of the suggested workarounds?
@mbolivar-nordic yes, running west boards
before doing anything else works fine.
@tejlmand mainly because
west
seems to be an official way to build zephyr apps.
Thanks for the reason.
It is correct that Zephyr docs are describing the build process using west
and it is an easy way to get started with Zephyr.
But actually west
is an optional (but very convenient tool), so there is no problem in using cmake
directly, especially if it solves your problems.
That being said, west
should still be fixed so that this issue can be fixed.
(we would like to remove zephyr.base from Cmake, but currently we need west to be backward compatible.)
That being said,
west
should still be fixed so that this issue can be fixed. (we would like to remove zephyr.base from Cmake, but currently we need west to be backward compatible.)
I agree with this.
@mbolivar-nordic yes, running west boards before doing anything else works fine.
Ok, glad to know you have a workaround, since the fix (getting rid of zephyr.base entirely) is something we don't have scheduled.
@mbolivar-nordic with Zephyr 2.7.0 LTS out, it now the time to remove this in west
?
Users on older Zephyr (for example 1.14 LTS
) can manually set zephyr.base
if they need it.
I believe we don't need to set this value automatically anymore.
I'm attempting to run multiple
west build
commands in parallel (to save some time used by cmake). This breaks randomly with corrupted.west/config
file. I tracked this down to this line:https://github.com/zephyrproject-rtos/west/blob/618020d1979d07003d6df40ce59b413785ce3015/src/west/app/main.py#L692
Here multiple processes may attempt to write to the same file at the same time in my scenario. It would be good to get rid of this. Right now one have to run a zephyr-provided command with valid arguments before doing the build.
west boards
seems to work.Note that I have
ZEPHYR_BASE
environment variable set during the build, but that does not preventwest
from mangling the config file.