Open gy2411 opened 3 weeks ago
Moreover, the errors of parsing protobufs happened in requests to ydb:
disk_manager_stdout_cx60ds6q:2024-11-01T19:21:25.304Z INFO storage/storage_ydb_impl.go:1528 finished updating task with id ab93fb55-4ac7-4bef-bdd1-f96179cd074c {"SYSLO
G_IDENTIFIER": "disk-manager", "COMPONENT": "TASK_RUNNER", "REQUEST_ID": "TestImageServiceCreateImageFromImage_5", "SYSLOG_IDENTIFIER": "disk-manager"}
disk_manager_stdout_cx60ds6q:2024-11-01T19:21:25.311Z WARN persistence/ydb_logger.go:74 failed {"SYSLOG_IDENTIFIER": "disk-manager", "COMPONENT": "TASK_RUNNER", "REQUEST_ID":
"TestImageServiceCreateImageFromImage_5", "SYSLOG_IDENTIFIER": "disk-manager", "COMPONENT": "YDB", "YDB_LOG_NAME": "ydb.table.do", "latency": "5.862879049s", "idempotent": true, "lab
el": "", "error": "Retriable error, IgnoreRetryLimit=false: failed: query \n\t\t--!syntax_v1\n\t\tpragma TablePathPrefix = \"/Root/disk_manager/recipe/tasks\";\n\t\tdeclare $values as
List<Struct<\n\t\tid: Utf8,\n\t\tidempotency_key: Utf8,\n\t\taccount_id: Utf8,\n\t\ttask_type: Utf8,\n\t\tregular: Bool,\n\t\tdescription: Utf8,\n\t\tcreated_at: Timestamp,\n\t\tcrea
ted_by: Utf8,\n\t\tmodified_at: Timestamp,\n\t\tgeneration_id: Uint64,\n\t\tstatus: Int64,\n\t\terror_code: Int64,\n\t\terror_message: Utf8,\n\t\terror_silent: Bool,\n\t\terror_detail
s: String,\n\t\tretriable_error_count: Uint64,\n\t\trequest: String,\n\t\tstate: String,\n\t\tmetadata: String,\n\t\tdependencies: String,\n\t\tchanged_state_at: Timestamp,\n\t\tended
_at: Timestamp,\n\t\tlast_host: Utf8,\n\t\tlast_runner: Utf8,\n\t\tzone_id: Utf8,\n\t\testimated_time: Timestamp,\n\t\tdependants: String,\n\t\tpanic_count: Uint64>>;\n\n\t\tupsert in
to tasks\n\t\tselect *\n\t\tfrom AS_TABLE($values)\n\t: connError{node_id:1,address:'dp774es3d0144t8q93pk.auto.internal:9458'}: transport/Internal (code = 13, source error = \"rpc err
or: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8\", address: \"dp774es3d0144t8q93pk.auto.internal:9458\", traceID: \"5430dda3-d03e-4a54-941
6-5645a835ff46\") at `github.com/ydb-platform/ydb-go-sdk/v3/internal/conn.(*conn).Invoke(conn.go:347)` at `github.com/ydb-platform/ydb-go-sdk/v3/internal/balancer.(*Balancer).wrapCall
(balancer.go:351)` at `github.com/ydb-platform/ydb-go-sdk/v3/internal/table.(*session).executeDataQuery(session.go:795)` at `github.com/ydb-platform/ydb-go-sdk/v3/internal/table.(*ses
sion).Execute(session.go:748)` at `github.com/ydb-platform/ydb-go-sdk/v3/internal/table.(*transaction).Execute(transaction.go:80)` at `github.com/ydb-platform/ydb-go-sdk/v3/internal/t
able.do.func1(retry.go:53)`", "retryable": true, "code": 13, "deleteSession": true, "version": "3.54.3"}
The reason of this is error_message field of task state. Here we write request in the message, but request in not a valid utf-8 string.
Solution proposal:
1) DeleteImage task should finish succesfully immediately if it can not find the image in the database here.
2) Write request as base64 in error messages. This will also help to print request in logs in the format that is convenient for parsing.
The fix was merged https://github.com/ydb-platform/nbs/pull/2497
Also need to add test that reproduces the bug more regularly. Work in progress
The TestImageServiceCreateImageFromImage nemesis test failed in the pr https://github.com/ydb-platform/nbs/pull/2294
The root cause of the failure are conflicst of requests of tasks with the same id:
The scenario was the following: