rjuju opened this issue 3 months ago
Unfortunately, I'm not able to reproduce the problem.

What I did:

```
podman-compose -f compose/powa_remote_mode.yml up
```

(not detached), hit ctrl-c, then ran the same command again. Everything displays as expected.
How did you get the "tpc" and "obvious" databases? I don't have them when using the powa_remote_mode.yml compose file. I thought that they were only available with the demo workload image.
I'm not sure what the ctrl-c exactly does. Can you try

```
podman-compose -f compose/powa_remote_mode.yml up -d
# wait until it's up and has taken a couple of snapshots
podman-compose -f compose/powa_remote_mode.yml down -t0
podman-compose -f compose/powa_remote_mode.yml up -d
```

which is what I usually use, if that matters.
> How did you get the "tpc" and "obvious" databases? I don't have them when using the powa_remote_mode.yml compose file. I thought that they were only available with the demo workload image.
I'm just using a modified compose file. It's simply powa_remote_mode.yml to which I added the 3 containers used in the dev demo compose file. Apart from the extra containers there are no modifications.
Could you share the modified compose file, just in case it helps with reproducing?
```diff
$ diff ../powa_demo.yml compose/powa_remote_mode.yml
70,112d69
<
<   pgbench-std-primary:
<     image: powateam/powa-pgbench
<     container_name: powa-dev-pgbench-std-primary
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-primary'
<       PGUSER: 'postgres'
<       PGPORT: 5433
<       BENCH_SCALE_FACTOR: 10
<       BENCH_TIME: 60
<       BENCH_FLAG: '-c1 -j1 -n -R 10'
<     depends_on:
<       remote-primary:
<         condition: service_healthy
<
<   pgbench-std-standby:
<     image: powateam/powa-pgbench
<     container_name: powa-dev-pgbench-std-standby
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-standby'
<       PGUSER: 'postgres'
<       PGPORT: 5434
<       BENCH_SKIP_INIT: 'true'
<       BENCH_SCALE_FACTOR: 10
<       BENCH_TIME: 120
<       BENCH_FLAG: '-c2 -j2 -S -n -R 10'
<     depends_on:
<       remote-standby:
<         condition: service_healthy
<
<   pgdemoworload-std-primary:
<     image: powateam/powa-demoworkload
<     container_name: powa-dev-demoworkload-std-primary
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-primary'
<       PGUSER: 'postgres'
<       PGPORT: 5433
<     depends_on:
<       remote-primary:
<         condition: service_healthy
```
OK, I could reproduce without needing the extra containers. I have an idea (possibly wrong) about what's happening.

In my case, with the default compose file, the collector for the primary server is stopped after the first `compose up` command (starting from a clean podman environment).

At this point, the overview page for the primary server (http://localhost:8888/server/1/overview/) doesn't show anything (empty components) and the grid for the databases is kind of broken: there are more column groups than columns.
In the web console, the JSON response for this page looks like:
```json
{
  "server": "1",
  "title": "Details for all databases",
  "metrics": [
    "by_database.calls",
    "by_database.runtime",
    "by_database.avg_runtime",
    "by_database.shared_blks_read",
    "by_database.shared_blks_hit",
    "by_database.shared_blks_dirtied",
    "by_database.shared_blks_written",
    "by_database.temp_blks_read",
    "by_database.temp_blks_written",
    "by_database.io_time"
  ],
  "columns": [
    {
      "name": "datname",
      "label": "Database",
      "url_attr": "url"
    }
  ],
  "toprow": [
    {},
    {},
    {
      "name": "Execution",
      "colspan": 3
    },
    {
      "name": "Blocks",
      "colspan": 4
    },
    {
      "name": "Temp blocks",
      "colspan": 2
    },
    {
      "name": "I/O"
    },
    {
      "name": "WAL",
      "colspan": 3
    },
    {
      "name": "JIT",
      "colspan": 2
    }
  ],
  "type": "grid"
}
```
The problem seems to be that the `toprow` column groups don't match the columns (metrics): we don't take the removal of metrics into account when we compute the `toprow`.
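You can see it in the JSON above: the WAL and JIT groups are emitted with their full colspans even though none of their metrics appear in the `metrics` list. To illustrate the fix (a minimal sketch with invented names, not the actual powa-web code), the `toprow` should be derived from whatever metrics survived the filtering, so the colspans always add up to the real column count:

```python
# Sketch only: group names and metric lists are illustrative.
GROUPS = [
    ("Execution", ["calls", "runtime", "avg_runtime"]),
    ("Blocks", ["shared_blks_read", "shared_blks_hit",
                "shared_blks_dirtied", "shared_blks_written"]),
    ("Temp blocks", ["temp_blks_read", "temp_blks_written"]),
    ("I/O", ["io_time"]),
    ("WAL", ["wal_records", "wal_fpi", "wal_bytes"]),
    ("JIT", ["jit_functions", "jit_generation_time"]),
]

def build_toprow(kept_metrics):
    """Build the toprow header cells from the metrics that were kept,
    dropping groups that lost all their metrics and shrinking colspans."""
    toprow = [{}]  # leading empty cell above the "Database" column
    for name, metrics in GROUPS:
        kept = [m for m in metrics if m in kept_metrics]
        if not kept:
            # e.g. WAL/JIT filtered out because the pgss version is unknown
            continue
        cell = {"name": name}
        if len(kept) > 1:
            cell["colspan"] = len(kept)
        toprow.append(cell)
    return toprow

# With only the Execution metrics kept:
# build_toprow({"calls", "runtime", "avg_runtime"})
# -> [{}, {'name': 'Execution', 'colspan': 3}]
```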
On the UI side, there might be a problem too. If the collector is manually reloaded (using the `Actions` menu in the top bar), the grid layout stays broken until the user reloads the whole page (F5 or the reload button in the browser) or navigates between pages (for example, going back to the list of servers and then choosing the primary server again).
Can you confirm that refreshing the page or navigating in PoWA fixes the glitch?
For the record, when the compose file is first brought up, before the collector is reloaded, no version is available for the extensions even though they are marked as available, installed and sampled. The versions are shown right after the collector is reloaded.
In `database.py` we have code that distinguishes the different versions of pgss for the `toprow`. In `server.py`, we don't.
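Roughly, the gating looks like this (a simplified sketch with illustrative names, not the actual code): WAL counters only exist from pg_stat_statements 1.8 (PostgreSQL 13) and JIT counters from 1.10 (PostgreSQL 15), so those metric groups must go away when the known version is older, or unknown.

```python
# Hedged sketch (illustrative names): gate metric groups on the known
# pg_stat_statements version, as database.py does and server.py didn't.
def pgss_metrics(pgss_version):
    metrics = ["calls", "runtime", "avg_runtime",
               "shared_blks_read", "shared_blks_hit",
               "shared_blks_dirtied", "shared_blks_written",
               "temp_blks_read", "temp_blks_written", "io_time"]
    if pgss_version is not None and pgss_version >= (1, 8):
        # WAL counters appeared in pg_stat_statements 1.8
        metrics += ["wal_records", "wal_fpi", "wal_bytes"]
    if pgss_version is not None and pgss_version >= (1, 10):
        # JIT counters appeared in pg_stat_statements 1.10
        metrics += ["jit_functions", "jit_generation_time"]
    return metrics
```

With the version still unknown (the situation before the collector reload above), everything version-gated gets dropped, which combined with the hard-coded `toprow` would produce exactly the mismatch shown in the JSON.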
> Can you confirm that refreshing the page or navigating in PoWA fixes the glitch?
It doesn't. What works is indeed to manually reload the collector and then refresh the page.
I guess that when podman fetches the powa-web image, it does so before starting the collector, which means that by the time the collector stops, the remote servers are already up and running. I'm also unsure why the `depends_on` conditions are not applied.

Anyway, good catch! I thought that everything should have been working since the snapshots are happening. I guess that there is also a bug in the collector: I could teach it to update the versions at the first successful snapshot, or something like that, on top of fixing server.py.
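Something along these lines on the collector side (purely hypothetical names, just to sketch the idea, not the actual powa-collector code):

```python
# Hypothetical sketch: record the extension versions at the first
# successful snapshot instead of only at collector startup.
from dataclasses import dataclass, field

@dataclass
class RemoteServer:
    srvid: int
    extension_versions: dict = field(default_factory=dict)
    versions_recorded: bool = False

def fetch_extension_versions(conn):
    # Versions of the extensions installed on the remote server.
    cur = conn.cursor()
    cur.execute("SELECT extname, extversion FROM pg_extension")
    return dict(cur.fetchall())

def after_successful_snapshot(conn, server):
    if not server.versions_recorded:
        # The versions were never recorded, e.g. because the remote
        # server wasn't reachable when the collector started.
        server.extension_versions = fetch_extension_versions(conn)
        server.versions_recorded = True
```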
I fixed the toprow headers in https://github.com/powa-team/powa-web/commit/c8d480cdedf851e1e8eba72d8d41de27b019ee1c. The code for that in database.py was also broken since the planning time metric was added, but it went unnoticed as it didn't mess up the grid too much.
Is there anything left to be done?
I think it's all good now. I will double check just in case. Thanks for the timely investigation!
@pgiraud I don't know if you can reproduce this problem.
On my side, trying the powa_remote_mode.yml compose file (I just pushed the missing POWA_REMOTE_PORT to this one and another compose file, plus the switch to the v3 format), I have a normal-looking instance page.
But if I shut down the compose file and restart it, for some reason the db detail top row gets messed up and the last few columns are pushed out, as if there were a wrong colspan.
Just restarting the powa-git-web container is not enough; I need a full "podman-compose down && podman-compose up" for that. And once done, it stays broken no matter what I do; the only way to fix it is to remove the image and fetch it again.
Can you reproduce the problem? If yes, can you fix it?