woodpecker-ci / woodpecker

Woodpecker is a simple, yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0
4.2k stars 364 forks source link

Problems with DB migration `convert-to-new-pipeline-error-format` #2737

Closed xoxys closed 11 months ago

xoxys commented 11 months ago

Component

server

Describe the bug

The server startup stuck on convert-to-new-pipeline-error-format migration. Even after 120m the migration has not finished.

{"level":"info","time":"2023-11-03T12:05:16Z","caller":"/build/cmd/common/logger.go:100","message":"LogLevel = debug"}
{"level":"info","time":"2023-11-03T12:05:36Z","caller":"/build/server/store/datastore/migration/migration.go:245","message":"Migration 'convert-to-new-pipeline-error-format' is still running, please be patient"}
{"level":"info","time":"2023-11-03T12:05:56Z","caller":"/build/server/store/datastore/migration/migration.go:245","message":"Migration 'convert-to-new-pipeline-error-format' is still running, please be patient"}

After enabling the XORM log in woodpecker the log shows the executed SQL commands as well:

Server Logs ```json { "level": "info", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/cmd/common/logger.go:100", "message": "LogLevel = debug" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT tablename FROM pg_tables WHERE schemaname = $1 [public] - 30.308275ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT column_name, column_default, is_nullable, data_type, character_maximum_length, description,\n CASE WHEN p.contype = 'p' THEN true ELSE false END AS primarykey,\n CASE WHEN p.contype = 'u' THEN true ELSE false END AS uniquekey\nFROM pg_attribute f\n JOIN pg_class c ON c.oid = f.attrelid JOIN pg_type t ON t.oid = f.atttypid\n LEFT JOIN pg_attrdef d ON d.adrelid = c.oid AND d.adnum = f.attnum\n LEFT JOIN pg_description de ON f.attrelid=de.objoid AND f.attnum=de.objsubid\n LEFT JOIN pg_namespace n ON n.oid = c.relnamespace\n LEFT JOIN pg_constraint p ON p.conrelid = c.oid AND f.attnum = ANY (p.conkey)\n LEFT JOIN pg_class AS g ON p.confrelid = g.oid\n LEFT JOIN INFORMATION_SCHEMA.COLUMNS s ON s.column_name=f.attname AND c.relname=s.table_name\nWHERE n.nspname= s.table_schema AND c.relkind = 'r' AND c.relname = $1 AND s.table_schema = $2 AND f.attnum > 0 ORDER BY f.attnum; [migrations public] - 28.791397ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT indexname, indexdef FROM pg_indexes WHERE tablename=$1 AND schemaname=$2 [migrations public] - 11.22641ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] BEGIN TRANSACTION [] - 603.478µs" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT count(*) FROM \"migrations\" [] - 1.181009ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] COMMIT [] - 1.832027ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT \"name\" FROM \"migrations\" [] - 1.039455ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] BEGIN TRANSACTION [] - 928.707µs" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT tablename FROM pg_tables WHERE schemaname = $1 [public] - 2.278934ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT column_name, column_default, is_nullable, data_type, character_maximum_length, description,\n CASE WHEN p.contype = 'p' THEN true ELSE false END AS primarykey,\n CASE WHEN p.contype = 'u' THEN true ELSE false END AS uniquekey\nFROM pg_attribute f\n JOIN pg_class c ON c.oid = f.attrelid JOIN pg_type t ON t.oid = f.atttypid\n LEFT JOIN pg_attrdef d ON d.adrelid = c.oid AND d.adnum = f.attnum\n LEFT JOIN pg_description de ON f.attrelid=de.objoid AND f.attnum=de.objsubid\n LEFT JOIN pg_namespace n ON n.oid = c.relnamespace\n LEFT JOIN pg_constraint p ON p.conrelid = c.oid AND f.attnum = ANY (p.conkey)\n LEFT JOIN pg_class AS g ON p.confrelid = g.oid\n LEFT JOIN INFORMATION_SCHEMA.COLUMNS s ON s.column_name=f.attname AND c.relname=s.table_name\nWHERE n.nspname= s.table_schema AND c.relkind = 'r' AND c.relname = $1 AND s.table_schema = $2 AND f.attnum > 0 ORDER BY f.attnum; [pipelines public] - 74.527942ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT indexname, indexdef FROM pg_indexes WHERE tablename=$1 AND schemaname=$2 [pipelines public] - 5.04695ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] DROP INDEX \"public\".\"IDX_pipelines_pipeline_author\" [] - 6.259148ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] DROP INDEX \"public\".\"IDX_pipelines_pipeline_repo_id\" [] - 1.170448ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] DROP INDEX \"public\".\"IDX_pipelines_pipeline_status\" [] - 1.039996ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] DROP INDEX \"public\".\"UQE_pipelines_s\" [] - 1.358872ms" } { "level": "info", "component": "xorm", "time": "2023-11-03T22:40:20Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/xorm.go:77", "message": "[SQL] SELECT tablename FROM pg_tables WHERE schemaname = $1 [public] - 1.928387ms" } { "level": "info", "time": "2023-11-03T22:40:40Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/migration/migration.go:245", "message": "Migration 'convert-to-new-pipeline-error-format' is still running, please be patient" } { "level": "info", "time": "2023-11-03T22:41:00Z", "caller": "/woodpecker/src/github.com/woodpecker-ci/woodpecker/server/store/datastore/migration/migration.go:245", "message": "Migration 'convert-to-new-pipeline-error-format' is still running, please be patient" } ```

I was also looking into pg_stat_activity to find long-running queries:

------------+------------------------------------------------------------------------------------------------------------------------------------
       3914 | SELECT column_name, column_default, is_nullable, data_type, character_maximum_length, description,                                +
            |     CASE WHEN p.contype = 'p' THEN true ELSE false END AS primarykey,                                                             +
            |     CASE WHEN p.contype = 'u' THEN true ELSE false END AS uniquekey                                                               +
            | FROM pg_attribute f                                                                                                               +
            |     JOIN pg_class c ON c.oid = f.attrelid JOIN pg_type t ON t.oid = f.atttypid                                                    +
            |     LEFT JOIN pg_attrdef d ON d.adrelid = c.oid AND d.adnum = f.attnum                                                            +
            |     LEFT JOIN pg_description de ON f.attrelid=de.objoid AND f.attnum=de.objsubid                                                  +
            |     LEFT JOIN pg_namespace n ON n.oid = c.relnamespace                                                                            +
            |     LEFT JOIN pg_constraint p ON p.conrelid = c.oid AND f.attnum = ANY (p.conkey)                                                 +
            |     LEFT JOIN pg_class AS g ON p.confrelid = g.oid                                                                                +
            |     LEFT JOIN INFORMATION_SCHEMA.COLUMNS s ON s.column_name=f.attname AND c.relname=s.table_name                                  +
            | WHERE n.nspname= s.table_schema AND c.relkind = 'r' AND c.relname = $1 AND s.table_schema = $2 AND f.attnum > 0 ORDER BY f.attnum;
       age       | state  | process_id |                                                            active_query                                                            
-----------------+--------+------------+------------------------------------------------------------------------------------------------------------------------------------
 00:17:46.638525 | active |       3914 | SELECT column_name, column_default, is_nullable, data_type, character_maximum_length, description,                                +
                 |        |            |     CASE WHEN p.contype = 'p' THEN true ELSE false END AS primarykey,                                                             +
                 |        |            |     CASE WHEN p.contype = 'u' THEN true ELSE false END AS uniquekey                                                               +
                 |        |            | FROM pg_attribute f                                                                                                               +
                 |        |            |     JOIN pg_class c ON c.oid = f.attrelid JOIN pg_type t ON t.oid = f.atttypid                                                    +

I've not found any errors in the general log of the postgres server.

System Info

`next-dffdcdaac0`

Additional context

No response

Validations

anbraten commented 11 months ago

Could you check the amount of pipeline you have in the db?

xoxys commented 11 months ago

Sure. IMO not that much and this amount should not cause any issue.

woodpecker_server=# select count(*) from pipelines;
 count 
-------
  1888
(1 row)
anbraten commented 11 months ago

Okay, that doesn't sound to crazy TBH. Anyways just had the "genius" idea to directly filter for pipelines which actually had an error. 🙈

zc-devs commented 11 months ago

Could you check the amount of pipeline you have in the db?

I have 49.

Also there is no load in that namespace during migration:


> kubectl top pods -n scm-cicd
NAME                                CPU(cores)   MEMORY(bytes)
gitea-0                             2m           216Mi
postgres-0                          2m           60Mi
woodpecker-server-7f5c58496-4f52l   1m           20Mi
anbraten commented 11 months ago

I have 49.

Cute

zc-devs commented 11 months ago

During migration table is locked (can't do select from it):

# select relation::regclass, * from pg_locks where not granted;
 relation  | locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |
  pid  |      mode       | granted | fastpath |           waitstart
-----------+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+
-------+-----------------+---------+----------+-------------------------------
 pipelines | relation |    34572 |    34587 |      |       |            |               |         |       |          | 6/33               |
 67537 | AccessShareLock | f       | f        | 2023-11-04 13:22:07.358858+00
anbraten commented 11 months ago

Could you test using this server image woodpeckerci/woodpecker-server:pull_2735 if the adjusted migration works for you?

zc-devs commented 11 months ago

migration works for you?

Not for me :( All is the same. woodpecker-server.log

pat-s commented 11 months ago

Also facing the same issues that @zc-devs reported using pull_2735. NB: I have 1 pipeline in this instance.

anbraten commented 11 months ago

Found the problematic line. Its somehow related to .Sync and postgres.

pat-s commented 11 months ago

Can confirm it works now with https://github.com/woodpecker-ci/woodpecker/pull/2735/commits/7c2f4c45d9f0a24596caa47810f77dad64198460 🎉