Question on tracking hxestorage progress

dougmill-ibm commented 7 years ago

I am trying to diagnose a problem that occurs at a specific point in HTX mdt.io. It always appears at the same point relative to starting HTX mdt.io, approximately 11 hours into the test. It seems to be about when "cycle count" goes from "0" to "1". The problem persists for about 2 hours, then subsides until the start of the next cycle (cycle count goes to "2" - approximately 9 hours after going to "1"). This pattern then repeats indefinitely throughout the testing (i.e. a couple hours of problems every ~9 hours).
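To bound the timestamps worth inspecting, the pattern described above can be turned into expected problem windows. This is only a sketch of the arithmetic: the 11-hour first onset, 9-hour period, and 2-hour duration are the approximate figures from this report, and the function name `problem_windows` is hypothetical.

```python
def problem_windows(first_start_h=11.0, period_h=9.0, duration_h=2.0, count=3):
    """Return (start, end) offsets in hours since HTX start for each expected window.

    The defaults are the approximate values observed in this report, not
    anything HTX guarantees.
    """
    windows = []
    for i in range(count):
        start = first_start_h + i * period_h
        windows.append((start, start + duration_h))
    return windows


if __name__ == "__main__":
    for start, end in problem_windows():
        print(f"expect problems from ~{start:.0f}h to ~{end:.0f}h after start")
```

Adding these offsets to the wall-clock time at which mdt.io was started gives the intervals to compare against system logs.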

What I am trying to determine is just what sorts of I/O are being performed during that time. How do I get HTX to tell me what tests are being run, or how do I get it to log more information on what tests are being run?

preeti-dhir commented 7 years ago

The hxestorage exerciser dumps information about the last approximately 1500 I/Os that happened on any disk device to the following location on the test system: /tmp/htx/hxestorage//IO_details_dump.log. Each entry looks like this:

thread_id: rule_5_1 - time: 0x3c63254e211c7, oper: wrc, cur_oper: read, blkno: 0x1afbddb8, num_blks: 227, rbuf addr: 0x110382df0

The fields are:

time: when the I/O started.
oper: the operation to be performed, here WRC (i.e. write/read/compare).
cur_oper: the operation currently in progress.
blkno: the block number where the I/O started.
num_blks: the transfer size, in blocks.
rbuf addr: the address of the buffer the read is done into.
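For post-processing the dump, a minimal parsing sketch follows. The regular expression is inferred only from the single sample entry quoted above, so the real file may contain variations it does not handle; the names `ENTRY_RE` and `parse_io_entry` are hypothetical and not part of HTX.

```python
import re

# Pattern inferred from the one sample entry in this thread (an assumption).
# It accepts both "rbuf addr" and "rbuf_addr" since both spellings appear.
ENTRY_RE = re.compile(
    r"thread_id:\s*(?P<thread_id>\S+)\s*-\s*"
    r"time:\s*(?P<time>0x[0-9a-fA-F]+),\s*"
    r"oper:\s*(?P<oper>\w+),\s*"
    r"cur_oper:\s*(?P<cur_oper>\w+),\s*"
    r"blkno:\s*(?P<blkno>0x[0-9a-fA-F]+),\s*"
    r"num_blks:\s*(?P<num_blks>\d+),\s*"
    r"rbuf[ _]addr:\s*(?P<rbuf_addr>0x[0-9a-fA-F]+)"
)


def parse_io_entry(line):
    """Parse one dump entry into a dict, or return None if it does not match."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "thread_id": fields["thread_id"],
        "time": int(fields["time"], 16),        # raw counter value, not wall-clock
        "oper": fields["oper"],
        "cur_oper": fields["cur_oper"],
        "blkno": int(fields["blkno"], 16),
        "num_blks": int(fields["num_blks"]),    # decimal in the sample entry
        "rbuf_addr": int(fields["rbuf_addr"], 16),
    }


if __name__ == "__main__":
    sample = ("thread_id: rule_5_1 - time: 0x3c63254e211c7, oper: wrc, "
              "cur_oper: read, blkno: 0x1afbddb8, num_blks: 227, "
              "rbuf addr: 0x110382df0")
    print(parse_io_entry(sample))
```

Returning None for non-matching lines lets a caller skip headers or partially written entries without aborting.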

Hope this will be helpful to you.

To explain a bit about the exerciser: hxestorage uses the default.hdd or default.ssd rulefile in mdt.io. These rulefiles contain multiple stanzas, which act as input to the exerciser. Each stanza defines one testcase, with its own inputs to the exerciser. During each stanza run, multiple threads do I/O simultaneously on the disk. In one cycle, the hxestorage exerciser runs all of these stanzas, i.e. completes one pass of the rulefile, then increments the cycle count and starts again from the beginning. Since the problem is seen every time around when the cycle count changes, it will most likely be in stanza 3 or 4. You can check "Curr Stanza" on the screen around the time you see the problem.

If you can let me know what kind of issue you are seeing and provide me with system login details, I may be able to provide more details related to it.

dougmill-ibm commented 7 years ago

What I need is to be able to relate behavior of the system, bounded by some timestamp values, to what HTX was doing during that time. I guess I could dig through the rules and get an idea of what is going on, but I do not have any log that shows what rules were being run at a given time. Is there any sort of file that contains this information, for example a timestamp (human readable) and indication of the beginning/ending of a step?

Am I correct in assuming that these rule files will help explain what each step does?

Also, the "time:" field in the IO_details_dump.log file seems to be the value of the PPC timebase register, and is difficult (impossible) to relate back to wall-clock time. Also, since that file only shows the last 1500 I/Os it doesn't help for a long running test. I typically see around 1 million I/Os per hour, per disk.
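If the "time:" field really is a timebase value, it could in principle be mapped to wall-clock time given two things that are not in the dump: the timebase frequency and one reference pair of (timebase value, wall-clock time) sampled together on the same boot. The sketch below assumes both. The 512 MHz default is an assumption to be checked on the actual system (on Linux on POWER, /proc/cpuinfo typically reports a "timebase" line), and `timebase_to_wallclock` is a hypothetical helper, not an HTX function.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timebase ticks per second. Verify on the system under test.
ASSUMED_TIMEBASE_HZ = 512_000_000


def timebase_to_wallclock(tb, ref_tb, ref_time, hz=ASSUMED_TIMEBASE_HZ):
    """Convert a timebase value to wall-clock time.

    ref_tb and ref_time must be a timebase reading and the wall-clock time
    captured at the same moment; without such a pair the conversion is not
    possible.
    """
    elapsed_seconds = (tb - ref_tb) / hz
    return ref_time + timedelta(seconds=elapsed_seconds)


if __name__ == "__main__":
    # Hypothetical reference pair, for illustration only.
    ref_tb = 0x3c63254e211c7
    ref_time = datetime(2018, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    one_hour_later = ref_tb + 3600 * ASSUMED_TIMEBASE_HZ
    print(timebase_to_wallclock(one_hour_later, ref_tb, ref_time))
```

This does not address the second limitation: with only the last 1500 I/Os retained, the dump would still need to be captured repeatedly during a long run.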

open-power / HTX

Question on tracking hxestorage progress #114