ydb-platform / nbs

Network Block & File Store
Apache License 2.0

[Filestore] Request cancellation upon filestore-vhost shutdown should not increment Errors/Fatal counter #995

Closed: qkrorlqr closed this issue 3 months ago

qkrorlqr commented 6 months ago

Upon SIGTERM, control flow gets here: https://github.com/ydb-platform/nbs/blob/82b6f77dde0aa32af82b1cc077806ec10dc5d3c5/cloud/filestore/libs/vfs_fuse/loop.cpp#L538, then here: https://github.com/ydb-platform/nbs/blob/82b6f77dde0aa32af82b1cc077806ec10dc5d3c5/cloud/filestore/libs/vfs_fuse/loop.cpp#L58, and finally here: https://github.com/ydb-platform/nbs/blob/82b6f77dde0aa32af82b1cc077806ec10dc5d3c5/cloud/filestore/libs/vfs_fuse/vhost/fuse_virtio.c#L349

EINTR is not fatal: the code running inside the guest should retry the request. However, the Errors/Fatal counter is still incremented, which triggers alerts. We should not increment this counter upon request cancellation.

Apr 17 16:52:45 hostname NFS_VHOST[327528]: 2024-04-17T16:52:45.694916Z :NFS_FUSE ERROR: cloud/filestore/libs/diagnostics/request_stats.cpp:125: DescribeData #23115694 [f:dp71plqmi27tsmk48o1d][c:dp7ae7vcu9ae46ptqcne] RESPONSE request failed(total_time: 123.436ms, execution_time: 123.436ms, predicted_postponed_time: 0, postponed_time: 0, backoff_time: 0, size: 128.00 KiB, error: E_CANCELLED Driver is stopping)
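A minimal sketch of the accounting proposed above, assuming a simplified error classification. The names `EErrorKind`, `TErrorCounters`, and `AccountResponse` are hypothetical and are not taken from the nbs code base; `ErrorsFatal` only stands in for the Errors/Fatal sensor, and the separate `Cancelled` counter is an assumption about how cancellations could still be tracked.

```cpp
#include <cstdint>

// Hypothetical error classification; the real code has its own error codes
// (for example E_CANCELLED in the log line above).
enum class EErrorKind {
    Success,
    Retriable,
    Cancelled,  // request aborted because the driver is stopping
    Fatal,
};

// Hypothetical counter set. ErrorsFatal stands in for the Errors/Fatal sensor.
struct TErrorCounters {
    std::uint64_t ErrorsRetriable = 0;
    std::uint64_t ErrorsFatal = 0;
    std::uint64_t Cancelled = 0;
};

// Accounts a finished request. Cancellation gets its own counter and never
// touches ErrorsFatal, so a shutdown does not trigger fatal-error alerts.
inline void AccountResponse(TErrorCounters& counters, EErrorKind kind)
{
    switch (kind) {
        case EErrorKind::Success:
            break;
        case EErrorKind::Retriable:
            ++counters.ErrorsRetriable;
            break;
        case EErrorKind::Cancelled:
            ++counters.Cancelled;
            break;
        case EErrorKind::Fatal:
            ++counters.ErrorsFatal;
            break;
    }
}
```

With this split, alerts keyed on the fatal counter stay quiet during shutdown, while the number of cancelled requests remains observable.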

qkrorlqr commented 6 months ago

Replacing the request type during request execution is also a problem. Because of it, ReadData errors are missing from some other sensors, since DescribeData doesn't have the full set of sensors.
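One possible way to avoid losing ReadData errors is to keep the type the guest originally issued next to the type currently being executed, and to account errors under the original one. The sketch below only illustrates that idea; `ERequestType`, `TRequestContext`, and `TStatsRegistry` are hypothetical names and do not describe how the issue was actually fixed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical request types, reduced to the two mentioned in this issue.
enum class ERequestType : std::size_t {
    ReadData,
    DescribeData,
    Count,
};

// Hypothetical per-request context.
struct TRequestContext {
    ERequestType OriginalType;  // type issued by the guest; never changes
    ERequestType CurrentType;   // may be replaced during execution
};

// Hypothetical registry with one error counter per request type.
class TStatsRegistry {
public:
    // Errors are attributed to the original type, not the replaced one.
    void RequestFailed(const TRequestContext& ctx)
    {
        ++Errors_[static_cast<std::size_t>(ctx.OriginalType)];
    }

    std::uint64_t Errors(ERequestType type) const
    {
        return Errors_[static_cast<std::size_t>(type)];
    }

private:
    std::array<std::uint64_t, static_cast<std::size_t>(ERequestType::Count)>
        Errors_{};
};
```

In this sketch a ReadData request that is executed as DescribeData and then fails is still counted as a ReadData error.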

qkrorlqr commented 3 months ago

This issue was resolved during the work on some other issues.