powa-team / powa-podman

Podman images for the PoWA project

Glitch in database detail grid with the powa-web-git image #14

Open rjuju opened 3 months ago

rjuju commented 3 months ago

@pgiraud I don't know if you can reproduce this problem.

On my side, trying the powa_remote_mode.yml compose file (I just pushed the missing POWA_REMOTE_PORT to this one and another compose file, plus the use of the v3 format), I have a normal-looking instance page, e.g.:

db_details_ok

But if I shut down the compose file and restart it, for some reason the db detail top row gets messed up and the last few columns are pushed out, as if there was a wrong colspan:

db_details_ko

Just restarting the powa-git-web container is not enough; I need a full "podman-compose down && podman-compose up" for that. And once done, it stays broken no matter what I do; the only way to fix it is to remove the image and fetch it again.

Can you reproduce the problem? If yes, can you fix it?

pgiraud commented 3 months ago

Unfortunately, I'm not able to reproduce the problem.

What I did:

How did you get the "tpc" and "obvious" databases? I don't have them when using the powa_remote_mode.yml compose file. I thought that they were only available with the demo workload image.

rjuju commented 3 months ago

I'm not sure what exactly the ctrl-c does. Can you try:

podman-compose -f compose/powa_remote_mode.yml up -d
# wait until it's up and has taken a couple of snapshots
podman-compose -f compose/powa_remote_mode.yml down -t0

podman-compose -f compose/powa_remote_mode.yml up -d

which is what I'm usually using, if that matters.

How did you get the "tpc" and "obvious" databases? I don't have them when using the powa_remote_mode.yml compose file. I thought that they were only available with the demo workload image.

I'm just using a modified compose file. It's simply powa_remote_mode.yml, to which I added the 3 containers used for the dev demo compose file. Apart from the extra containers there are no modifications.

pgiraud commented 3 months ago

Could you share the modified compose file, just in case it helps to reproduce the problem?

rjuju commented 3 months ago

$ diff ../powa_demo.yml compose/powa_remote_mode.yml
70,112d69
< 
<   pgbench-std-primary:
<     image: powateam/powa-pgbench
<     container_name: powa-dev-pgbench-std-primary
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-primary'
<       PGUSER: 'postgres'
<       PGPORT: 5433
<       BENCH_SCALE_FACTOR: 10
<       BENCH_TIME: 60
<       BENCH_FLAG: '-c1 -j1 -n -R 10'
<     depends_on:
<       remote-primary:
<         condition: service_healthy
< 
<   pgbench-std-standby:
<     image: powateam/powa-pgbench
<     container_name: powa-dev-pgbench-std-standby
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-standby'
<       PGUSER: 'postgres'
<       PGPORT: 5434
<       BENCH_SKIP_INIT: 'true'
<       BENCH_SCALE_FACTOR: 10
<       BENCH_TIME: 120
<       BENCH_FLAG: '-c2 -j2 -S -n -R 10'
<     depends_on:
<       remote-standby:
<         condition: service_healthy
< 
<   pgdemoworload-std-primary:
<     image: powateam/powa-demoworkload
<     container_name: powa-dev-demoworkload-std-primary
<     restart: on-failure
<     environment:
<       PGHOST: 'remote-primary'
<       PGUSER: 'postgres'
<       PGPORT: 5433
<     depends_on:
<       remote-primary:
<         condition: service_healthy

pgiraud commented 3 months ago

OK, I could reproduce it without the extra containers. I have an idea (possibly wrong) about what's happening.

pgiraud commented 3 months ago

In my case, with the default compose file, the collector for the primary server is stopped after the first compose up command (starting from a clean podman environment).

At this point, the overview page for the primary server (http://localhost:8888/server/1/overview/) doesn't show anything (empty components) and the grid for the databases is broken: there are more column groups than columns.

Screenshot from 2024-07-27 18-25-35

In the web console, the json response for this page looks like:

                {
                    "server": "1",
                    "title": "Details for all databases",
                    "metrics": [
                        "by_database.calls",
                        "by_database.runtime",
                        "by_database.avg_runtime",
                        "by_database.shared_blks_read",
                        "by_database.shared_blks_hit",
                        "by_database.shared_blks_dirtied",
                        "by_database.shared_blks_written",
                        "by_database.temp_blks_read",
                        "by_database.temp_blks_written",
                        "by_database.io_time"
                    ],
                    "columns": [
                        {
                            "name": "datname",
                            "label": "Database",
                            "url_attr": "url"
                        }
                    ],
                    "toprow": [
                        {},
                        {},
                        {
                            "name": "Execution",
                            "colspan": 3
                        },
                        {
                            "name": "Blocks",
                            "colspan": 4
                        },
                        {
                            "name": "Temp blocks",
                            "colspan": 2
                        },
                        {
                            "name": "I/O"
                        },
                        {
                            "name": "WAL",
                            "colspan": 3
                        },
                        {
                            "name": "JIT",
                            "colspan": 2
                        }
                    ],
                    "type": "grid"
                }

The problem seems to be that the toprow column groups don't match the columns (metrics). We actually don't take the metrics removal into account when we compute the toprow.
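
The mismatch can be checked mechanically. The following is a minimal sketch, not powa-web code: `toprow_width` and `grid_width` are hypothetical helpers, and it assumes that each entry in `metrics` is rendered as one grid column after the entries in `columns`.

```python
def toprow_width(toprow):
    """Number of grid columns the header groups claim to span."""
    # A cell without "colspan" (including an empty {}) spans one column.
    return sum(cell.get("colspan", 1) for cell in toprow)


def grid_width(config):
    """Number of data columns, assuming one column per metric (assumption)."""
    return len(config["columns"]) + len(config["metrics"])


# Trimmed copy of the JSON response shown above.
config = {
    "metrics": [
        "by_database.calls",
        "by_database.runtime",
        "by_database.avg_runtime",
        "by_database.shared_blks_read",
        "by_database.shared_blks_hit",
        "by_database.shared_blks_dirtied",
        "by_database.shared_blks_written",
        "by_database.temp_blks_read",
        "by_database.temp_blks_written",
        "by_database.io_time",
    ],
    "columns": [{"name": "datname", "label": "Database", "url_attr": "url"}],
    "toprow": [
        {},
        {},
        {"name": "Execution", "colspan": 3},
        {"name": "Blocks", "colspan": 4},
        {"name": "Temp blocks", "colspan": 2},
        {"name": "I/O"},
        {"name": "WAL", "colspan": 3},
        {"name": "JIT", "colspan": 2},
    ],
}

header = toprow_width(config["toprow"])
data = grid_width(config)
print(f"header spans {header} columns, grid has {data} data columns")
```

Under that assumption the header spans 17 columns while only 11 data columns remain, so the last groups have nothing beneath them.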

On the UI side, there might be a problem too. If the collector is manually reloaded (using the Actions menu on the top bar), the grid layout stays broken until the user reloads the whole page (F5 or the reload button in the browser) or navigates between pages (for example, goes back to the list of servers and then chooses the primary server again).

Can you confirm that refreshing the page or navigating in PoWA fixes the glitch?

For the record, right after the compose file is started, before the collector is reloaded, no version is available for the extensions even though they are marked as available, installed and sampled.

Screenshot from 2024-07-27 18-55-04

The versions are shown right after the collector reload.

Screenshot from 2024-07-27 18-57-56

pgiraud commented 3 months ago

In database.py we have code that distinguishes different versions of pgss for the toprow. In server.py, we don't.
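
One possible shape for a version-aware toprow is sketched below. This is an illustration only, not the code from database.py or from any actual fix: `build_toprow` and its `pgss_version` argument are hypothetical, the group names and colspans are copied from the JSON response above, and the thresholds rely on pg_stat_statements having added WAL counters in 1.8 and JIT counters in 1.10.

```python
def build_toprow(pgss_version):
    """Return header groups, adding WAL and JIT only when supported.

    pgss_version is a (major, minor) tuple, or None when the extension
    version is not known yet (hypothetical convention for this sketch).
    """
    toprow = [
        {},
        {},
        {"name": "Execution", "colspan": 3},
        {"name": "Blocks", "colspan": 4},
        {"name": "Temp blocks", "colspan": 2},
        {"name": "I/O"},
    ]
    if pgss_version is None:
        # Unknown version: the optional metrics are dropped, so their
        # header groups must be dropped as well.
        return toprow
    if pgss_version >= (1, 8):
        toprow.append({"name": "WAL", "colspan": 3})
    if pgss_version >= (1, 10):
        toprow.append({"name": "JIT", "colspan": 2})
    return toprow


for version in (None, (1, 8), (1, 10)):
    names = [cell.get("name") for cell in build_toprow(version)]
    print(version, names)
```

The point of the sketch is that the header groups and the metric list are driven by the same version check, so they cannot drift apart when the version is missing.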

rjuju commented 3 months ago

Can you confirm that refreshing the page or navigating in PoWA fixes the glitch?

It doesn't. What does work is to manually reload the collector and then refresh the page.

I guess that when podman fetches the powa-web image, it's done before starting the collector which means that by the time the collector stops the remote servers are already up and running. I'm also unsure why the depends_on conditions are not applied.

Anyway, good catch! I thought that everything should have been working since the snapshots are happening. I guess that there is also a bug in the collector. I could teach the collector to update the version at the first successful snapshot or something like that, on top of fixing server.py.

rjuju commented 3 months ago

I fixed the toprow headers in https://github.com/powa-team/powa-web/commit/c8d480cdedf851e1e8eba72d8d41de27b019ee1c. The code for that in database.py had also been broken since the planning time metric was added, but it went unnoticed as it didn't mess up the grid too much.

pgiraud commented 3 months ago

Is there anything left to be done?

rjuju commented 3 months ago

I think it's all good now. I will double check just in case. Thanks for the timely investigation!