python / buildmaster-config

Configuration for buildbot.python.org
https://buildbot.python.org/all/#/

Issues after upgrading to buildbot 4.0 #498

Open vstinner opened 2 months ago

vstinner commented 2 months ago

Hi,

While trying to update requirements to fix a security issue, I upgraded buildbot from 3.x to 4.0: https://github.com/python/buildmaster-config/pull/497

The ./venv/bin/buildbot upgrade-master /data/buildbot/master command was run by Salt and the buildbot server is running fine.

... but, the "Release status" page is gone! I think that we have to follow https://docs.buildbot.net/4.0.0/manual/upgrading/4.0-upgrade.html#custom-plugins guide to upgrade our plugins.

vstinner commented 2 months ago

I tried to downgrade to buildbot 3.x. The database "migration" went badly :-(

Warning: Stopping this process might cause data loss
Got fatal Exception on DB
Traceback (most recent call last):
  File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/db/pool.py", line 235, in __thd
    log.err(e, 'Got fatal Exception on DB')
--- <exception caught here> ---
  File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/db/pool.py", line 208, in __thd
    rv = callable(arg, *args, **kwargs)
  File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/db/model.py", line 1136, in thd
    context.run_migrations()
  File "/srv/buildbot/venv/lib/python3.9/site-packages/alembic/runtime/migration.py", line 615, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/db/model.py", line 1122, in upgrade
    return alembic_scripts._upgrade_revs(current_script_rev_head, rev)
  File "/srv/buildbot/venv/lib/python3.9/site-packages/alembic/script/base.py", line 455, in _upgrade_revs
    return [
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/srv/buildbot/venv/lib/python3.9/site-packages/alembic/script/base.py", line 283, in _catch_revision_errors
    raise util.CommandError(resolution) from re
alembic.util.exc.CommandError: Can't locate revision identified by '066'
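
The error above suggests that the database already records schema revision '066', which the migration scripts shipped with the older package do not contain. A minimal sketch of checking the recorded revision before attempting a downgrade, assuming Alembic's default `alembic_version` table; the in-memory SQLite database and the `known_revisions` set are hypothetical stand-ins, not the real master database or Buildbot's actual revision list:

```python
import sqlite3


def recorded_revision(conn):
    """Return the schema revision Alembic recorded in the database, or None."""
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    return row[0] if row else None


# Stand-in database for illustration only (assumption: default Alembic table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('066')")

# Hypothetical revisions known to the installed (older) migration scripts.
known_revisions = {"064", "065"}

rev = recorded_revision(conn)
can_migrate = rev in known_revisions
print(f"recorded revision: {rev}, known to installed scripts: {can_migrate}")
```

If the recorded revision is not known to the installed scripts, Alembic has no path to or from it, which matches the `Can't locate revision` failure.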

vstinner commented 2 months ago

I replaced the buildbot-wsgi-dashboards dependency with buildbot-react-wsgi-dashboards: it was worse, buildbot failed to even start:

    (...)
    File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/www/service.py", line 225, in reconfigServiceWithBuildbotConfig
      self.setupSite(new_config)
    File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/www/service.py", line 325, in setupSite
      self.configPlugins(root, new_config)
    File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/www/service.py", line 302, in configPlugins
      raise RuntimeError(f"could not find plugin {key}; is it installed?")
  builtins.RuntimeError: could not find plugin wsgi_dashboards; is it installed?

vstinner commented 2 months ago

Currently, the dependencies are:

buildbot@buildbot:/srv/buildbot$ venv/bin/python -m pip list|grep buildbot
buildbot                 4.0.0
buildbot-console-view    4.0.0
buildbot-grid-view       4.0.0
buildbot-waterfall-view  4.0.0
buildbot-worker          4.0.0
buildbot-wsgi-dashboards 4.0.0
buildbot-www             4.0.0

I also asked for help on the #buildbot IRC channel.

vstinner commented 2 months ago

@pablogsal @ambv @encukou: Do you have any idea how to debug such a buildbot issue? Sorry for breaking the buildbot "Release Status" page by mistake :-(

vstinner commented 2 months ago

I reported the issue to the buildbot bug tracker: https://github.com/buildbot/buildbot/issues/7775

vstinner commented 2 months ago

The "Log in with GitHub" URL is broken: /all is missing at the start of the path.

vstinner commented 2 months ago

The https://buildbot.python.org/all/#/builders page loads https://buildbot.python.org/api/v2/builders, which fails with an HTTP 404 error, whereas it should load https://buildbot.python.org/all/api/v2/builders (all/ is missing in the API URL).
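
One common way a path prefix such as all/ gets lost is resolving a root-relative path against the base URL. The sketch below only illustrates that mechanism with Python's standard `urljoin`; it is not a diagnosis of what the Buildbot frontend actually does, and `api_url` is a hypothetical helper:

```python
from urllib.parse import urljoin

API_PATH = "api/v2/builders"


def api_url(base_url, path=API_PATH):
    """Resolve a relative API path against a base URL, keeping the base's prefix."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, path)


# Relative path: the /all/ prefix of the base URL is preserved.
good = api_url("https://buildbot.python.org/all/")

# Root-relative path (leading slash): the prefix is dropped.
bad = urljoin("https://buildbot.python.org/all/", "/" + API_PATH)

print(good)
print(bad)
```

The two results differ only in the all/ segment, which is the difference between the working URL and the one returning 404.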

vstinner commented 2 months ago

cc @zware

zware commented 2 months ago

The https://buildbot.python.org/all/#/builders page loads https://buildbot.python.org/api/v2/builders, which fails with an HTTP 404 error, whereas it should load https://buildbot.python.org/all/api/v2/builders (all/ is missing in the API URL).

I got this partially working with a hack to the Nginx config; I'll try to get it fully working and submit a PR to the Salt config.

vstinner commented 2 months ago

Maybe we should simply drop all/ from the URL and use a "regular" https://buildbot.python.org/ home URL?

vstinner commented 2 months ago

wsgi_dashboards in https://buildbot.python.org/all/?#/about says:

[{"name":"release_status","caption":"Release Status","app":"'flask.app.Flask' not yet IConfigured","order":2,"icon":"rocket"}]

Oh, there is a Flask error.

vstinner commented 2 months ago

The https://buildbot.python.org/all/#/builders page loads https://buildbot.python.org/api/v2/builders, which fails with an HTTP 404 error, whereas it should load https://buildbot.python.org/all/api/v2/builders (all/ is missing in the API URL).

I reported a separate issue: https://github.com/buildbot/buildbot/issues/7776

zware commented 2 months ago

With python/psf-salt#372 merged and deployed, the web UI is at least mostly back (without the /all prefix). Release status still needs a fix, but that's beyond me for right now.

vstinner commented 2 months ago

It seems like buildbot 4.0 is mostly functional. The Release Status page is available, but it's not fully rendered.

"Releasable" versions only contain the following HTML in their "panel-body":

<div style="text-align: center;">
    <i class="fa fa-check" style="font-size:48px;color:green;"></i>
</div>

"Not Releaseable" versions only contain the following HTML in their "panel-body":

<div> <buildsummary buildid="1539736" condensed="1"> </buildsummary></div>

AngularJS is supposed to handle <buildsummary />, but I didn't understand if buildbot 4.0 still uses AngularJS or switched to React.

encukou commented 2 months ago

I'm back from EuroPython now.

For me it's not only the custom page that's misbehaving:

All show 504 (Gateway Timeout) HTTP responses to various API requests. I guess the pages would load if the server were less busy.


According to the 4.0 upgrade instructions, all plugins do need to be rewritten to use React. (They recommend doing it before the big update, but I guess that's not an option any more.)

Does anyone here already know React, or should I learn it?

pablogsal commented 2 months ago

All the pages you mention are supposed to be in the included buildbot package, so there is nothing we need to rewrite on our side. Can you confirm this with the buildbot team perhaps?

vstinner commented 2 months ago

Be careful of old URLs including /all/.

encukou commented 2 months ago

Right, the pages I mentioned are part of Buildbot; I opened https://github.com/buildbot/buildbot/issues/7820.

We do need a React rewrite for the release dashboard, which is a custom plugin. According to dashboard docs, there still is an AngularJS application, but that's probably stale docs.

encukou commented 2 months ago

I also get 404 on JS and CSS files: https://github.com/buildbot/buildbot/issues/7439

vstinner commented 1 month ago

I created https://github.com/buildbot/buildbot/issues/7826 "buildbot 4.0: Builders page doesn't limit to 25 builders per page".

vstinner commented 1 month ago

In Builds/Builders, cells in the "Builds" column are stuck at "Loading..."

The problem may be related to https://github.com/buildbot/buildbot/issues/7826 : lack of pagination. Loading all data for 100 or 200 builders takes too long, and a timeout somewhere kills the request, so the UI is stuck at "Loading...".

If you limit the Builders view by clicking on a tag, the Builds column can be properly loaded.
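
For reference, a minimal sketch of what paginated requests could look like on the client side. It assumes the `limit` and `offset` query parameters of Buildbot's REST API as I understand them from its documentation; `page_url` is a hypothetical helper and the page size of 25 is taken from the issue title above:

```python
from urllib.parse import urlencode


def page_url(base_url, endpoint, page, per_page=25):
    """Build the URL for one page of a collection (assumed limit/offset parameters)."""
    query = {"limit": per_page, "offset": page * per_page}
    return f"{base_url.rstrip('/')}/api/v2/{endpoint}?{urlencode(query)}"


first = page_url("https://buildbot.python.org", "builders", page=0)
third = page_url("https://buildbot.python.org", "builders", page=2)

print(first)
print(third)
```

Fetching 25 builders at a time keeps each request small, instead of loading data for every builder in a single response.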

vstinner commented 1 month ago

On Firefox, the Waterfall View is truncated, the view seems to be limited to like 100 pixels!?

[Screenshot 2024-07-17 at 23-11-05: Buildbot]

Same on Chromium.

Something limits the height to ... 5 pixels!?

[Screenshot from 2024-07-17 23-17-05]

vstinner commented 1 month ago

I changed the Release Status page to an ugly but working page: https://github.com/python/buildmaster-config/commit/e5d6b0672b9c189a64c2c09eb91d77e20c4c7cdd

encukou commented 1 month ago

Thank you!

encukou commented 1 month ago

I found another issue: https://github.com/python/buildmaster-config/issues/509

pablogsal commented 2 weeks ago

I think it's time to roll back the upgrade: the buildbot page is certainly less usable, the release dashboard is broken, and everything feels either slower or slightly broken.

edelsohn commented 2 weeks ago

Each release of Buildbot has been more bloated and slower. That apparently is progress. It's more features! It's not going to get better. Buildbot developers behave as if they are funded by CPU and memory manufacturers to encourage system upgrades.

pablogsal commented 2 weeks ago

@encukou could you please revert https://github.com/python/buildmaster-config/commit/e5d6b0672b9c189a64c2c09eb91d77e20c4c7cdd and the buildbot 4.0 upgrade PRs? If you don't have free cycles I can give it a go if you prefer.

pablogsal commented 2 weeks ago

Also, I think we should bring this up with the buildbot project. We are probably among their heavier users and I am not sure if they are informed of all the problems we are having

zware commented 2 weeks ago

I think it's time to roll back the upgrade: the buildbot page is certainly less usable, the release dashboard is broken, and everything feels either slower or slightly broken.

What's broken other than our custom release dashboard? I'm not seeing the same "everything is slower"; on the contrary, everything actually seems rather snappier since a selective pruning of the database.

We did rush into 4.0, but I'm not convinced that a downgrade is worth the hassle.

vstinner commented 2 weeks ago

@encukou could you please revert https://github.com/python/buildmaster-config/commit/e5d6b0672b9c189a64c2c09eb91d77e20c4c7cdd and the buildbot 4.0 upgrade PRs? If you don't have free cycles I can give it a go if you prefer.

After I upgraded buildbot to 4.0, I tried to downgrade it again. Problem: the database migration back to the 3.x schema failed badly with an internal error.

When I upgraded buildbot to 4.0, the database schema was automatically updated as part of our Salt script. I didn't notice that the upgrade was from buildbot 3.x to 4.0.

pablogsal commented 2 weeks ago

What's broken other than our custom release dashboard? I'm not seeing the same "everything is slower"; on the contrary, everything actually seems rather snappier since a selective pruning of the database.

For me, loading any page requires more than half a minute, including the builder list, the workers and the waterfall view. Especially slow is the page listing the builds for a given builder.

Additionally, the web API takes much longer to respond.

The only view that remains unaffected is the single build view where the logs can be inspected.

vstinner commented 2 weeks ago

For me loading any page requires more than half a minute, including the builder list

That's the bug https://github.com/buildbot/buildbot/issues/7826 : if you filter by "3.x" tag, it's quite fast to load the page.

encukou commented 2 weeks ago

@encukou could you please revert https://github.com/python/buildmaster-config/commit/e5d6b0672b9c189a64c2c09eb91d77e20c4c7cdd and the buildbot 4.0 upgrade PRs?

I can certainly try, if that's the consensus, but I don't think it's a good idea, now that the upgrade is done. While we didn't follow the 3→4 upgrade guide, a 4→3 downgrade doesn't even have a guide AFAIK.

I don't have measurements, but Buildbot felt this slow even before the update.

vstinner commented 2 weeks ago

Especially slow is the page listing the builds for a given builder.

It seems like listing builds of a builder is also affected by https://github.com/buildbot/buildbot/issues/7826 : there is no pagination, it just loads ALL builds. That's where @zware's work to remove old builds would be interesting.

pablogsal commented 2 weeks ago

I don't have measurements, but Buildbot felt this slow even before the update.

I didn't gather times from before the update but I can tell you that at least on my side this is a noticeable regression after the upgrade.