optimize preconditioning

FlorianHeigl commented 4 years ago

Hi,

I have been using tkperf a bit over the last few days. (I started with fio_visualizer, fio_generateplots and fio2gnuplot, like you apparently did, and got pretty annoyed with all the little rendering issues etc. I can only admire that you were able to build something that simply writes out useful data with good quality.)

With large SSDs the preconditioning is quite a time sink. I had a few small issues where the tests aborted later on, and all the waiting made that worse, because every new run had to go through the whole preconditioning period again. Hacking it out of the code would have been an option, but then of course I'd also have ended up with a set of overly optimistic results.

Watching this a few times, I saw how the SSD started out at ~2200 MB/s write and then, after 30 or more minutes, dropped to a stable value in the 1600-1700 MB/s range. With that I came to the following idea:

Instead of running one full-size fio task, maybe it would be an idea to have logic that

splits the SSD size into 10 or 100 slices
does a quick fio run that will easily be handled by the SSD, and checks the write_bw of that one

start iterating
   run fio tasks that cover one slice 
   check the performance
   if it's lower than the initial quick shot
     break

That way we could still be sure the SSD has been kept sufficiently busy, but wouldn't need to wait for ages.
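The loop above could be sketched roughly as follows. This is only an illustration of the idea, not TKperf code: `slice_ranges`, `precondition_until_drop`, the `write_slice` callback and the `drop_ratio` threshold are all hypothetical names. The pseudocode breaks as soon as the bandwidth is lower than the quick shot; the `drop_ratio` margin is an added assumption so that small measurement noise does not end the loop early.

```python
from typing import Callable, Iterator, Tuple


def slice_ranges(device_bytes: int, n_slices: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) pairs that cover the device in n_slices pieces."""
    base = device_bytes // n_slices
    for i in range(n_slices):
        offset = i * base
        # The last slice absorbs whatever remainder the integer division left.
        size = device_bytes - offset if i == n_slices - 1 else base
        yield offset, size


def precondition_until_drop(
    device_bytes: int,
    n_slices: int,
    write_slice: Callable[[int, int], float],
    baseline_bw: float,
    drop_ratio: float = 0.9,   # assumption: tolerate 10% noise before stopping
    max_passes: int = 2,       # assumption: upper bound on full-device passes
) -> Tuple[int, float]:
    """Write slice by slice until bandwidth falls below drop_ratio * baseline_bw.

    write_slice(offset, size) is expected to write that range and return the
    measured write bandwidth in the same unit as baseline_bw.
    Returns (number of slices written, last measured bandwidth).
    """
    written = 0
    last_bw = baseline_bw
    for _ in range(max_passes):
        for offset, size in slice_ranges(device_bytes, n_slices):
            last_bw = write_slice(offset, size)
            written += 1
            if last_bw < drop_ratio * baseline_bw:
                return written, last_bw
    # The bandwidth never dropped; the caller has to decide what that means.
    return written, last_bw
```

In a real setup `write_slice` would presumably wrap a fio run restricted to the slice with fio's `--offset` and `--size` options and read the write bandwidth from its JSON output. A single slow slice may not prove steady state, so requiring several consecutive readings below the threshold would be a reasonable refinement.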

Edit: I started wondering whether the fio jobs command-line flag (number of jobs) is the problem. It seems that with 8 fio jobs we run 8 preconditioning rounds wrapped into one. Is that possible? For reference, a test run on a 3.2 TB SSD turned into roughly 75 TB of disk writes. I think that's a bit too wasteful.
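To put those numbers in proportion, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above; the 75 TB covers the whole test run rather than preconditioning alone, and the even split across 8 jobs is the hypothesis from this comment, not something confirmed from the code.

```python
capacity_tb = 3.2        # SSD capacity from the comment above
total_written_tb = 75.0  # total writes observed for the whole test run
numjobs = 8              # number of fio jobs used in that run

# How many times the full device capacity was written in total.
drive_writes = total_written_tb / capacity_tb

# Hypothetical share per job, if the writes were spread evenly over the jobs.
per_job = drive_writes / numjobs

print(f"{drive_writes:.1f} full-drive writes, {per_job:.1f} per job")
```

So the run wrote the device a bit more than 23 times over, which would be close to 3 full-drive writes per job if the jobs really each cover the whole device.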

gschoenberger commented 4 years ago

How about a quick workaround that at least wouldn't make things worse? Currently we have the "--runtime" option to limit the runtime of the tests carried out. Should we add a "--precondruntime" option to limit the runtime of a preconditioning round? Then one can determine the optimum preconditioning runtime for a given device and apply that limit on subsequent runs.
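A minimal sketch of how such an option could be wired through to fio, assuming the preconditioning is a plain sequential write job. The helper names `parse_args` and `build_precond_cmd` and the concrete job parameters (block size, queue depth) are made up for illustration and are not taken from the TKperf code; only the fio options themselves are real.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a device path plus the proposed --precondruntime option."""
    parser = argparse.ArgumentParser(description="preconditioning sketch")
    parser.add_argument("device", help="block device to precondition")
    parser.add_argument(
        "--precondruntime",
        type=int,
        default=None,
        metavar="SECONDS",
        help="upper limit for one preconditioning round (default: no limit)",
    )
    return parser.parse_args(argv)


def build_precond_cmd(
    device: str, numjobs: int = 1, precond_runtime: Optional[int] = None
) -> List[str]:
    """Build the fio command line for one sequential-write preconditioning round."""
    cmd = [
        "fio",
        "--name=precondition",
        f"--filename={device}",
        "--rw=write",          # sequential write over the device
        "--bs=128k",           # assumption: block size chosen for illustration
        "--direct=1",
        "--ioengine=libaio",
        "--iodepth=16",        # assumption: queue depth chosen for illustration
        f"--numjobs={numjobs}",
        "--output-format=json",
    ]
    if precond_runtime is not None:
        # Without --time_based, fio stops at this limit or when the device
        # has been written once, whichever comes first.
        cmd.append(f"--runtime={precond_runtime}")
    return cmd


if __name__ == "__main__":
    args = parse_args()
    print(" ".join(build_precond_cmd(args.device, precond_runtime=args.precondruntime)))
```

Saved under a name such as `precond_sketch.py` (also hypothetical), running it with `/dev/nvme0n1 --precondruntime 3600` would print a command line ending in `--runtime=3600`; leaving the option out keeps the round unlimited.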

Georg

FlorianHeigl commented 4 years ago

Hi Georg,

Yes, that is probably good enough to start with.

Pro

Simple: I, for example, would just cap preconditioning at 3 hours in all tests.

Con

When I do that, it's quite unscientific ;-) As controllers advance, there might be many odd behaviours that go unseen if the time is set too short.
It will get in the way in hands-off scenarios (e.g. someone who wants to run this on various cloud environments).

In summary, I think for now this is the best possible solution. Definitely works for me.

thomas-krenn / TKperf

optimize preconditioning #35
