linuxone-community-cloud / technical-resources

Repository for technical resources

Disk IO stalling #22

Closed nickva closed 1 year ago

nickva commented 1 year ago

We just added a LinuxONE instance to Apache CouchDB's automatic CI worker set, and it started running the full CI suite on the main branch alongside x86_64, arm64, and ppc64le. The s390x worker seems to experience random disk IO stalls, which cause DB tests to fail.

We created an issue tracking flaky tests on s390x: https://github.com/apache/couchdb/issues/4521
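
One way to check whether stalls like these show up outside the test suite is to time individual write+fsync cycles and flag the slow ones. The following is only a minimal sketch, not something taken from the CouchDB CI setup: the function name `probe_fsync_latency`, the 1 MiB block size, and the 1 second stall threshold are all illustrative assumptions.

```python
import os
import tempfile
import time


def probe_fsync_latency(directory, iterations=50, block_size=1024 * 1024,
                        stall_threshold_s=1.0):
    """Time repeated write+fsync cycles in `directory`.

    Hypothetical helper: the name, defaults, and threshold are assumptions
    chosen for illustration. Returns (latencies, stalls), where stalls is a
    list of (iteration_index, seconds) pairs at or above the threshold.
    """
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(iterations):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to disk so the device is timed
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    stalls = [(i, t) for i, t in enumerate(latencies) if t >= stall_threshold_s]
    return latencies, stalls


if __name__ == "__main__":
    lat, stalls = probe_fsync_latency(tempfile.gettempdir(), iterations=20)
    print(f"max latency: {max(lat):.3f}s, stalls: {len(stalls)}")
```

Pointing `directory` at the filesystem the CI jobs actually write to matters here, since a temp directory may sit on a different device.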

Would it be possible to increase the disk IO throughput or swap in a faster disk?

A quick fio benchmark showed that IO throughput on the s390x instance is the lowest of all the CI worker architectures we have:

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
s390x:
  Run status group 0 (all jobs):
  WRITE: bw=18.6MiB/s (19.5MB/s), 18.6MiB/s-18.6MiB/s (19.5MB/s-19.5MB/s), io=3495MiB (3665MB), run=188338-188338msec

PowerPC:
  Run status group 0 (all jobs):
  WRITE: bw=1465MiB/s (1536MB/s), 1465MiB/s-1465MiB/s (1536MB/s-1536MB/s), io=87.7GiB (94.2GB), run=61325-61325msec

x86_64:
  Run status group 0 (all jobs):
  WRITE: bw=46.9MiB/s (49.2MB/s), 46.9MiB/s-46.9MiB/s (49.2MB/s-49.2MB/s), io=3968MiB (4161MB), run=84646-84646msec
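
To put the three results side by side, the summary lines above can be parsed and compared. This is a rough sketch: the helper names (`parse_summary`, `slowdown_vs`) and the regular expression are assumptions for illustration and only handle the `WRITE:` summary format shown here. It also checks that the reported bandwidth matches `io / run`. Note that every run lasted longer than the requested 60 seconds, most likely because `--end_fsync=1` adds the final flush to the run time, so the numbers reflect sustained throughput to disk rather than page-cache speed.

```python
import re

# Summary lines copied verbatim from the fio runs above.
FIO_SUMMARIES = {
    "s390x": "WRITE: bw=18.6MiB/s (19.5MB/s), 18.6MiB/s-18.6MiB/s "
             "(19.5MB/s-19.5MB/s), io=3495MiB (3665MB), run=188338-188338msec",
    "PowerPC": "WRITE: bw=1465MiB/s (1536MB/s), 1465MiB/s-1465MiB/s "
               "(1536MB/s-1536MB/s), io=87.7GiB (94.2GB), run=61325-61325msec",
    "x86_64": "WRITE: bw=46.9MiB/s (49.2MB/s), 46.9MiB/s-46.9MiB/s "
              "(49.2MB/s-49.2MB/s), io=3968MiB (4161MB), run=84646-84646msec",
}

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Illustrative pattern; it only covers the fields used below.
SUMMARY_RE = re.compile(
    r"bw=(?P<bw>[\d.]+)(?P<bw_unit>[KMG]iB)/s.*?"
    r"io=(?P<io>[\d.]+)(?P<io_unit>[KMG]iB).*?"
    r"run=(?P<run_ms>\d+)-\d+msec"
)


def parse_summary(line):
    """Extract bandwidth (MiB/s), total IO (MiB), and run time (s)."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized fio summary line: {line!r}")
    return {
        "bw_mib_s": float(m["bw"]) * UNIT_TO_MIB[m["bw_unit"]],
        "io_mib": float(m["io"]) * UNIT_TO_MIB[m["io_unit"]],
        "run_s": int(m["run_ms"]) / 1000.0,
    }


def slowdown_vs(baseline, results):
    """Return each host's bandwidth as a multiple of the baseline host's."""
    base = results[baseline]["bw_mib_s"]
    return {name: r["bw_mib_s"] / base for name, r in results.items()}


if __name__ == "__main__":
    results = {name: parse_summary(line) for name, line in FIO_SUMMARIES.items()}
    for name, r in results.items():
        effective = r["io_mib"] / r["run_s"]
        print(f"{name}: reported {r['bw_mib_s']:.1f} MiB/s, "
              f"io/run {effective:.1f} MiB/s over {r['run_s']:.0f}s")
    for name, factor in slowdown_vs("s390x", results).items():
        print(f"{name}: {factor:.1f}x the s390x bandwidth")
```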

nickva commented 1 year ago

Closing this issue as a duplicate (opened in the wrong repo).

The real issue seems to be https://github.com/linuxone-community-cloud/tickets/issues/52