nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.96k stars 4.02k forks source link

(Enhance) dependency update process #27442

Open MichaIng opened 3 years ago

MichaIng commented 3 years ago

Hey guys, not sure if a meta discussion is right here, but I couldn't find a better matching sub forum either. Sorry for the long text, I hope to understand things a little better and probably I'm then able to effectively help bots finishing their job during calm night hours or so 😄.

To fix CI/CD tests for a PR of mine, I reviewed them a bit, compared results with other PRs to verify whether a failure is caused by my changes or not, and indeed some checks were solved by a rebase onto current master.

I then noticed that a lot of dependabot PRs hang as well on such "formal" check failures, while the bot reports 100% compatibility, nextcloud-bot requested a merge (what is actually the trigger for that?) and approved it together with github-actions bot, so that it could have been merged automatically without human interaction, if the CI/CD tests were gone through. I played a bit with the bots PRs and indeed in some cases a simply rebase request solved it and hence triggered an automated merge, so far so good.

Now there are some other possible issues which may delay or break those automated dependency bumps, also for security-related ones, being open for months. Aside of the fact that the related security patch is then missing, it also accumulated a long list of open PRs, making it difficult to keep human-made PRs in view, which are moved back page by page.

One commonly failing check is the one for JavaScript build changes. From what I understand, this is as the repository does not only contain source code, but compiled JavaScript as well. So when a Node module is bumped, which has an effect on any of these scripts, a re-build within the test leads to changed files and then we get something like this:

Run bash -c "[[ ! \"`git status --porcelain `\" ]] || ( echo 'Uncommited changes in webpack build' && git status && exit 1 )"
Uncommited changes in webpack build
HEAD detached at pull/26730/merge
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   apps/files/js/dist/personal-settings.js
    modified:   apps/files/js/dist/personal-settings.js.map
    modified:   apps/user_status/js/user-status-menu.js
    modified:   apps/user_status/js/user-status-menu.js.map
    modified:   apps/weather_status/js/weather-status.js
    modified:   apps/weather_status/js/weather-status.js.map
    modified:   apps/workflowengine/js/workflowengine.js
    modified:   apps/workflowengine/js/workflowengine.js.map
    modified:   core/js/dist/login.js
    modified:   core/js/dist/login.js.map
    modified:   core/js/dist/main.js
    modified:   core/js/dist/main.js.map
    modified:   core/js/dist/unified-search.js
    modified:   core/js/dist/unified-search.js.map

To solve this, as far as I understand, /compile amend / can be done to trigger a rebuild and amend by npmbuildbot-nextcloud. The issue then is that dependabot denies to further handle the PR, as it has been modified by someone/something else. So a manual merge is required then.

~And then there is dependabot-preview, which is still opening PRs (at least it did 1.5 days ago), but is now not able to further deal with them anymore, so all it's PRs need to be in case rebased or merged manually. Open PRs by it block dependabot (non-preview) from opening own ones, as the branch it would create does already exist. As the PR is only handled by the bot (version) that opened it, I guess dependabot-preview should be disabled, and all it's PRs either closed (to have them re-opened by dependabot) or manually reviewed and merged. This is where most security dependency bumps are hold a long time, like #26730.~ EDIT: Disabled in the meantime.

Then another minor issue with the drone Selenium acceptance tests:

Starting ChromeDriver 90.0.4430.24 (4c6d850f087da467d926e8eddb76550aed655991-refs/branch-heads/4430@{#429}) on port 21084
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1623151991.858][SEVERE]: bind() failed: Cannot assign requested address (99)

All logs are spammed with these errors, while it seems to not affect the test results. I just recognised as I had a look into the long taking ones, and thought that this might be the reason. EDIT: In the meantime I think I understand the tests better and Selenium is just running as a provider for the actual tests and prints the above warning on every actual test scenario, right?

And generally: While the GitHub tests are for free (?), the drone has some limit, doesn't it? I mean not only for concurrent tests but also overall tests in a certain time range, so that commits/pushes shouldn't be spammed (anyway)? I see the drone sometimes Waiting for status to be reported, but sometimes it starts regardless 🤔.

szaimen commented 3 years ago

cc @nextcloud/server for some more feedback/discussion on this

MichaIng commented 3 years ago

Some things have been sorted in the meantime, either bot-wise or because I have a better understanding how things work. It's now basically two things that may be great to automate dependency bumps as much as possible:

skjnldsv commented 3 years ago

Hi @MichaIng ! Thanks for diving into this and taking the time to expand your thoughts 🙂 There is ongoing internal discussions about removing those shipped bundles. There are two reasoning behind this:

  1. The repo is working as-is just by cloning (easy for newcomers)
  2. The release script does not have a building step. It would need to be added and can have various potential issues (different configs, one app failing, etc etc)

We will discuss this more in-dept in the upcoming weeks :wink:

MichaIng commented 3 years ago

The repo is working as-is just by cloning (easy for newcomers)

Indeed, I do also benefit from this, when there is no need to install a development stack for testing a branch 🙂. Another argument is that many CI/CD tests need the building step as well when the bundles are gone. Most drone tests currently do the clone only, so a drone run would be significantly heavier, I guess. On the other hand, without bundles, no additional compile bot call is needed and hence 1-2 drone calls less on every PR opened/rebased where re-compiling is need.

In this case it makes sense to wait for the outcome of your internal discussion before putting any efforts into the bots, which (the efforts, not the bots) may become obsolete.

MichaIng commented 3 years ago

Currently, the compile actions take > 10 minutes. I think this would lengthen the drone checks by far too much to be applicable. If there was a single clone + compile step per drone run only, then probably, but currently this would mean ~70 times 10 minutes additional processing time, assuming that the drone takes similarly long then the GitHub action runners.

skjnldsv commented 3 years ago

We also need to improve the compile build. Stop having multiple webpack configs, but one big config file at the root, so we can also split vendors and merge dependencies. That will also reduce the size and build time

max-nextcloud commented 2 years ago

Some ideas about how to make live easier with compiled js are also discussed here: https://github.com/nextcloud/.github/issues/28

skjnldsv commented 2 years ago

Stop having multiple webpack configs, but one big config file at the root, so we can also split vendors and merge dependencies. That will also reduce the size and build time

This is done btw, building server is muuuch faster!

MichaIng commented 2 years ago

Indeed. The only left thing I have in mind, is when dependabot PRs/commits would trigger the compile command automatically, so that there is a chance to have it merge automatically even when the dependency chance causes actual JavaScript changes, given no CI errors, of course. But it implies some issues:

max-nextcloud commented 2 years ago

Would it be feasible to use app build artifacts as the source for the server build? We do have some automation in place for releasing apps but we don't use it for apps shipped with the server. Instead of git cloning shipped apps - could they be retrieved from a package and then unzipped?

That way compiling the assets could still be the responsibility of the app itself - but it would not have to keep them in the git repo at all times. In addition further steps such as removing unnecessary files or signing could be part of the packaging of the app before it's included in the server.