ydb-platform / ydb-go-sdk

Pure Go native and database/sql driver for YDB
https://ydb.tech
Apache License 2.0
141 stars 76 forks source link

bug: Freeze after context cancellation in ydb-go-sdk. #1025

Closed jkuradobery closed 9 months ago

jkuradobery commented 9 months ago

Bug Report

YDB GO SDK version:

3.54.3

Environment Linux, Ubuntu 22.04. Linux ubuntu 5.15.0-91-generic #101-Ubuntu SMP.

Current behavior:

After the context cancellation on the ydb request execution, the next requests are being timed out. There are timeout message on the ydb server side.

2024-01-31T10:08:13.614Z  DEBUG  persistence/ydb.go:469  session execution failed: query 
    --!syntax_v1
    pragma TablePathPrefix = "/Root/disk_manager/storage_ydb_test/TestShallowCopySnapshotWithRandomFailure/store_chunks_in_s3";
    declare $shard_id as Uint64;
    declare $chunk_id as Utf8;
    declare $referer as Utf8;

    $existing = (
      select chunk_id
      from chunk_blobs
      where shard_id = $shard_id and
        chunk_id = $chunk_id and
        referer = $referer
    );

    $to_update = (
      select
        shard_id,
        chunk_id,
        referer,
        refcnt + cast(1 as Uint32) as refcnt,
      from chunk_blobs
      where shard_id = $shard_id and
        chunk_id = $chunk_id and
        referer = "" and
        $chunk_id not in $existing
    );

    update chunk_blobs
    on select * from $to_update;

    upsert into chunk_blobs (shard_id, chunk_id, referer)
    values ($shard_id, $chunk_id, $referer);
  : connError{node_id:1,address:'bh2jdvmrf76t33p8hd9p.auto.internal:29301'}: operation/TIMEOUT (code = 400090, address = bh2jdvmrf76t33p8hd9p.auto.internal:29301, issues = [{'Query did not complete within specified timeout, session id ydb://session/3?

Expected behavior:

YDB server continues to work properly and not freeze requests.

Steps to reproduce:

https://github.com/ydb-platform/nbs/blob/main/cloud/disk_manager/internal/pkg/dataplane/snapshot/storage/storage_ydb_test.go TestShallowCopySnapshotWithRandomFailure, run with ya.make Logs and test profile here: server.log profile007.pdf client.log

asmyasnikov commented 9 months ago

@jkuradobery Can you show a stacktrace of deadlocked process?

asmyasnikov commented 9 months ago

Check please context which you put to db.Table().Do(ctx)

jkuradobery commented 9 months ago

Check please context which you put to db.Table().Do(ctx) I have added logging of the context to the test. Here are unique contexts logged during the test.


WARN    persistence/ydb.go:564  Context before ydb do context.Background.WithValue(type logging.loggerKey, val <not Stringer>).WithValue(type ctxlog.ctxKey, val <not Stringer>).WithCancel     {"SYSLOG_IDENTIFIER": "disk-manager"}
WARN    persistence/ydb.go:564  Context before ydb do context.Background.WithValue(type logging.loggerKey, val <not Stringer>).WithValue(type ctxlog.ctxKey, val <not Stringer>).WithCancel.WithCancel  {"SYSLOG_IDENTIFIER": "disk-manager"}
WARN    persistence/ydb.go:564  Context before ydb do context.Background.WithValue(type logging.loggerKey, val <not Stringer>).WithValue(type ctxlog.ctxKey, val <not Stringer>).WithCancel.WithCancel.WithValue(type xcontext.ctxIdempotentKey, val <not Stringer>).WithValue(type xcontext.markRetryCallKey, val <not Stringer>).WithCancel {"SYSLOG_IDENTIFIER": "disk-manager"}
WARN    persistence/ydb.go:564  Context before ydb do context.Background.WithValue(type logging.loggerKey, val <not Stringer>).WithValue(type ctxlog.ctxKey, val <not Stringer>).WithCancel.WithValue(type xcontext.ctxIdempotentKey, val <not Stringer>).WithValue(type xcontext.markRetryCallKey, val <not Stringer>).WithCancel    {"SYSLOG_IDENTIFIER": "disk-manager"}
jkuradobery commented 9 months ago

server.log Uploading more verbose log

jkuradobery commented 9 months ago

ydb version 2.7.0

jkuradobery commented 9 months ago

The problem seems to be solved in the recent ydbd versions.