Closed (nandojve closed this issue 3 years ago)
Another option would be to use the groups feature of west projects to clone only certain projects by default.
@nandojve I think using groups is better if we want to avoid issues with large optional repositories in general. Shallow fetches don't actually save that much time in cases we have measured, and I believe @marc-hb discovered that repeatedly doing a shallow fetch in the same repository gets you into weird states.
git clone --depth 1 on the valid hash is more than enough for any user to use and develop around that module.
(emphasis mine)
To be clear: you're only referring to optional or very stable modules where development is not happening, correct? I can't imagine developers not using version control in a repo under development.
BTW this issue seems to miss a very important difference between developers and CI: developers rarely clone from scratch, whereas CI does so many times a day.
Shallow fetches don't actually save that much time in cases we have measured,
On small repos, a shallow clone makes barely any difference. There are some numbers in https://github.com/zephyrproject-rtos/west/issues/319 and in the links from there. @nandojve what do your numbers look like?
I believe @marc-hb discovered that repeatedly doing a shallow fetch in the same repository gets you into weird states.
git fetch --depth constant asks git to forget stuff it already downloaded ("deepen or shorten"). That sort of bizarre request can create a "history patchwork" with semi-random holes, which seems to be challenging not just for the user trying to look at the history but sometimes for the tool itself as well.
clone/fetch --depth is an overrated afterthought/hack that is basically the opposite of how git was designed originally. It can cause many surprising strange situations. This one took me days to figure out: https://github.com/thesofproject/linux/issues/2556
I think using groups is better if we want to avoid issues with large optional repositories in general.
Agreed. In fact I just "optimized" some new Zephyr build in CI massively by merely replacing west update with west update module1 module2 ... so for CI you can trivially implement "groups" without even using the new feature.
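As a rough illustration of that replacement, here is a minimal Python sketch that builds the restricted command line for a CI script. The helper name west_update_subset is hypothetical, and module1/module2 are just the placeholder names from the comment above; the only thing relied on is that west update accepts project names as positional arguments.

```python
import shlex


def west_update_subset(projects):
    """Return the argv for `west update` limited to the named projects.

    With an empty list this is a plain `west update` of every active project.
    """
    return ["west", "update", *projects]


# "module1" and "module2" are placeholders; substitute the project names
# from your own manifest.
needed = ["module1", "module2"]
print(shlex.join(west_update_subset(needed)))
```

In a real job the returned list would be handed to subprocess.run(argv, check=True) from inside the west workspace.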
The new narrow feature can also help with some repos without breaking git in many different subtle ways.
For CI, west can also clone repos from git mirrors/caches now. This requires some preparation, but you simply can't get any faster than that.
For the "next optimization frontier" I would recommend fetching multiple repos in parallel; that's what git-repo has been doing for years.
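To make the parallel-fetch idea concrete, here is a small Python sketch. It is not how west or git-repo implement it; it simply assumes a list of directories that are already git clones with a remote named origin, and runs git fetch in each of them from a thread pool. The function names and the jobs parameter are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def fetch_one(repo_dir, run=subprocess.run):
    """Run `git fetch origin` inside one already-cloned repository."""
    run(["git", "-C", repo_dir, "fetch", "origin"], check=True)
    return repo_dir


def fetch_all(repo_dirs, jobs=4, run=subprocess.run):
    """Fetch several repositories concurrently.

    Returns the directories in input order; the first failing fetch
    raises when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda d: fetch_one(d, run), repo_dirs))


# Example (paths are placeholders):
# fetch_all(["modules/hal/a", "modules/lib/b"], jobs=8)
```

Threads are enough here because the work happens in the git child processes; the run parameter exists only so the function can be exercised without touching the network.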
I opened this because I believe there are better paths than the current one, and I suggested one option. If there are better ways, I'm OK with that. I think ordinary use cases will not touch modules; they will simply use them, or the modules are only dependencies.
The goal of this enhancement is to improve this scenario and avoid useless downloads and wasted time for the ordinary use case: the user who clones Zephyr and only wants to build their apps. That use case may not require full module history. For instance, not sure if it is relevant, but a command like west update minimal could be enough.
This is an enhancement suggestion that I think is important. If there is already a topic covering it and this suggestion is irrelevant, anyone should feel free to close it.
Hi @nandojve - yep, I totally understand why you want to look for ways to optimize. You are not alone, and west 0.11 has several features which are meant for that. The west 0.11.0 release is actually already available on PyPI; I just haven't sent the email yet because release documentation is not merged yet: https://github.com/zephyrproject-rtos/zephyr/pull/34797
I encourage you to give west update --narrow a shot. If that isn't enough, I recommend using a cache, especially if it is a CI issue. The link to west issue 319 that Marc gave contains a lot of measurements showing that the best thing to do in CI is generally just to use a cache. If it's about developer experience, then there are other options that depend on your use case.
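For readers scripting this in CI, the following Python sketch assembles a west update command line with the options discussed here. It assumes the --narrow, --name-cache and --path-cache options of west update from the 0.11 series; the helper name and the cache directory are hypothetical, and the cache directory has to be populated beforehand as the west documentation describes.

```python
import shlex


def west_update_ci(narrow=True, name_cache=None, path_cache=None):
    """Build a `west update` argv with the optimization flags enabled."""
    argv = ["west", "update"]
    if narrow:
        # Fetch only the revision the manifest asks for, not every ref.
        argv.append("--narrow")
    if name_cache:
        # Directory holding one pre-cloned repository per project name.
        argv += ["--name-cache", str(name_cache)]
    if path_cache:
        # Directory mirroring the workspace layout of pre-cloned projects.
        argv += ["--path-cache", str(path_cache)]
    return argv


# "/var/cache/west" is a made-up location for this example.
print(shlex.join(west_update_ci(name_cache="/var/cache/west")))
```

Whether --narrow alone is enough or a cache is needed is something to measure on your own CI, as the comment above suggests.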
I think, given @marc-hb's experience report in SoF, we can agree that the west update default behavior should not be --depth 1, as proposed by this issue. Do you agree, @nandojve? If so, can you please close it and let me know how the new west options work for you? If it's still not enough, we can do more benchmarking and figure out the right thing to do. Thanks!
Thank you, folks, for giving me a pretty good perspective on the effort dedicated to improving this topic. I'm convinced that you've been looking at all the possibilities and that this request is well addressed.
Something very, very simple that has been mentioned in the references but not here yet is to replace west init -m URL with git clone --depth 5 URL && west init -l ... in automation.
Such a script change takes practically no time to implement, however it makes a performance difference only if your manifest repo has a large git history (which is the case of zephyr.git).
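A sketch of that two-step replacement, written as a Python helper for an automation script. The comment above elides the argument to west init -l; this example assumes it is the directory of the freshly cloned manifest repository, which is what west init -l expects. The function name and the destination directory are hypothetical.

```python
import shlex


def shallow_init_commands(url, manifest_dir, depth=5):
    """Return the two commands that stand in for `west init -m URL`.

    1. Shallow-clone the manifest repository into `manifest_dir`.
    2. Tell west to use that existing local clone as the manifest repo.
    """
    return [
        ["git", "clone", "--depth", str(depth), url, manifest_dir],
        ["west", "init", "-l", manifest_dir],
    ]


# "URL" is the placeholder from the comment; "zephyr" is an assumed directory name.
for cmd in shallow_init_commands("URL", "zephyr"):
    print(shlex.join(cmd))
```

Running the commands in order with subprocess.run(cmd, check=True) stops at the first failure, matching the && in the original one-liner.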
@marc-hb Would it be possible to give some examples, or to elaborate on how you optimized build scripts for CI using west update, west build, etc.? How would you set up caches, etc.?
I don't have any single, authoritative recipe; I'm not an expert and things have been evolving. Today I would use --filter (zephyrproject-rtos/west#638) instead of --depth, and I would also take a look at zephyrproject-rtos/west#625.
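To show the difference at the plain git level, here is a hedged Python sketch that builds either a shallow clone (--depth) or a partial clone (--filter) command. The comment above does not say which filter spec to use; blob:none is an assumption made for this example, and the helper name is hypothetical. How the option is passed through west itself is covered in the linked issue and is not shown here.

```python
import shlex


def clone_argv(url, dest, depth=None, filter_spec=None):
    """Build a `git clone` argv with an optional depth or partial-clone filter."""
    argv = ["git", "clone"]
    if depth is not None:
        # Shallow clone: history is truncated after `depth` commits.
        argv += ["--depth", str(depth)]
    if filter_spec is not None:
        # Partial clone: full commit history, filtered objects fetched on demand.
        argv.append(f"--filter={filter_spec}")
    argv += [url, dest]
    return argv


url = "https://github.com/zephyrproject-rtos/zephyr"
print(shlex.join(clone_argv(url, "zephyr", depth=1)))
print(shlex.join(clone_argv(url, "zephyr", filter_spec="blob:none")))
```

Unlike --depth, a filtered clone keeps the commit graph intact, so history-based commands keep working and git downloads missing objects only when they are needed.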
Is your enhancement proposal related to a problem? Please describe.
I can imagine that the majority of developers using Zephyr don't require a full module clone. Avoiding it can save tons of space and speed up some CI/CD systems. For these cases, a fetch that uses --depth 1 is enough.
For instance, the new tensorflow module fetch:
git clone --depth 1 on the valid hash is more than enough for any user to use and develop around that module. A new command parameter could be added to allow experts to have full tree access.
Describe the solution you'd like
west update (the default behaviour) should fetch the pointed-to hash with the --depth 1 parameter.