longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[DOC] Increase NVMe timeout values to improve the stability of v2 volumes #9370

Open derekbit opened 2 weeks ago

derekbit commented 2 weeks ago

What's the document you plan to update? Why? Please describe

During IO stress testing on v2 volumes, NVMe disks may hit IO timeout errors. By default, the NVMe IO timeout and admin timeout are 30 and 60 seconds, respectively. Both can be adjusted via /sys/module/nvme_core/parameters/io_timeout and /sys/module/nvme_core/parameters/admin_timeout. The defaults may be too short under heavy IO load, and some vendors recommend increasing them to improve stability and avoid these errors.
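For illustration, here is a minimal Python sketch of how the two parameters could be inspected and changed at runtime. The sysfs directory is the one named above. The helper names `read_timeouts` and `write_timeouts` are hypothetical, and the sketch assumes the values are plain integers in seconds and that the files are writable by root on the node.

```python
from __future__ import annotations

from pathlib import Path

# Directory named in this issue; values are assumed to be integers in seconds.
PARAMS_DIR = Path("/sys/module/nvme_core/parameters")
PARAM_NAMES = ("io_timeout", "admin_timeout")


def read_timeouts(params_dir: Path = PARAMS_DIR) -> dict[str, int]:
    """Return the current io_timeout and admin_timeout values in seconds."""
    return {
        name: int((params_dir / name).read_text().strip())
        for name in PARAM_NAMES
    }


def write_timeouts(io_timeout: int, admin_timeout: int,
                   params_dir: Path = PARAMS_DIR) -> None:
    """Write new timeout values in seconds (hypothetical helper).

    On a real node this needs root, and the change does not survive a reboot.
    """
    if io_timeout <= 0 or admin_timeout <= 0:
        raise ValueError("timeouts must be positive integers (seconds)")
    (params_dir / "io_timeout").write_text(f"{io_timeout}\n")
    (params_dir / "admin_timeout").write_text(f"{admin_timeout}\n")


if __name__ == "__main__":
    # Read-only by default so the script is safe to run as-is.
    print(read_timeouts())
```

Calling `write_timeouts` with larger values would raise both timeouts until the next reboot. This issue does not state which values to use, so any number passed in is a placeholder to be replaced with the vendor's recommendation. Devices that are already initialized may keep their previous timeout. A persistent setting would more likely go through module options or kernel boot parameters such as `nvme_core.io_timeout`.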

We can add a note in the official doc about increasing the timeout values.

Ref:

Additional context

derekbit commented 2 weeks ago

cc @shuo-wu @PhanLe1010 @DamiaSan @c3y1huang @innobead