microsoft / StorScore

A test framework to evaluate SSDs and HDDs
http://aka.ms/storscore
MIT License

Log errors from bg_exec #22

Open lauracaulfield opened 8 years ago

lauracaulfield commented 8 years ago

Currently, there's no record of errors that occur in the process executed in the background (through "bg_exec" in the recipes). This can be a problem if you run one of the targeted tests designed to check that the performance of a workload is unchanged by background activity (like write_impact or smart_check). If the bg process silently errors out, the fg process runs without any background interference, its performance stays the same, and the drive could incorrectly pass the test.

One solution is to capture stderr from the bg process (see line 830 of Util.pm). A better option is to fail the whole recipe if the bg process errors.
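The first option, capturing the bg process's stderr, could look roughly like the portable Perl sketch below. It assumes a POSIX fork/exec model rather than the Win32::Process calls Util.pm actually uses, and `bg_exec_logged` is a hypothetical helper, not part of StorScore. The idea is simply that each background command writes its stderr to a file (for example, one in the results directory) so an error leaves a record.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (NOT StorScore's bg_exec): start @$cmd in the
# background with its stderr redirected to $err_file, and return the pid.
sub bg_exec_logged {
    my ($cmd, $err_file) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: send stderr to the log file (exit 126 if we can't).
        open(STDERR, '>', $err_file) or POSIX::_exit(126);
        # Replace the child with the command; on failure, record why
        # in the log and exit 127 without running parent cleanup code.
        exec { $cmd->[0] } @$cmd
            or do { print STDERR "exec of $cmd->[0] failed: $!\n"; POSIX::_exit(127) };
    }
    return $pid;    # Parent: keep the pid so the process can be checked later.
}
```

Once the process is reaped, a nonzero exit code or a non-empty log file is the signal to report the error (or fail the recipe) instead of silently continuing.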

One scenario where I've seen the bg process error is running write_impact in demo mode on a slow thumb drive on a Windows 10 client. Something about this combination causes the bg process to target a drive letter that no longer exists.

lauracaulfield commented 8 years ago

It's possible the absence of the "background-..." file in the results directory means the background process ended in error. This matches the behavior I'm observing. But that behavior doesn't match the code on line 830 of Util.pm...

marksantaniello commented 8 years ago

Another approach to consider: you could try to improve bg_killall so that it checks to see if any of the bg_exec'd processes have already died.

You might need to make execute_task return the whole $proc object instead of $proc->GetProcessID() (in the case where $background == true). The $bg_processes variable in Recipe.pm could hold those proc objects instead of pids. Then you can loop over them and try to grab their exit codes, much like we do in the non-background case (but with a timeout of 0 instead of INFINITE, maybe): https://github.com/Microsoft/StorScore/blob/master/lib/Util.pm#L881-L882

If none of them have already died, then you can kill them just as we do today.
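A rough sketch of that bg_killall idea, again in portable Perl: here `waitpid` with `WNOHANG` plays roughly the role that the Win32::Process `Wait(0)` plus `GetExitCode` pair would play in Util.pm. The name `bg_killall_checked` and passing plain pids are assumptions for illustration, not StorScore's actual interface. Any process that has already exited is treated as a failure, since a bg workload is expected to keep running until it is killed.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical bg_killall variant: before killing the background
# processes, check without blocking whether any of them already exited.
sub bg_killall_checked {
    my (@pids) = @_;
    my (@died, @alive);
    for my $pid (@pids) {
        # WNOHANG returns immediately: $pid if the child has exited
        # (wait status in $?), 0 if it is still running.
        if (waitpid($pid, POSIX::WNOHANG()) == $pid) {
            push @died, sprintf('pid %d exited early (exit code %d, signal %d)',
                                $pid, $? >> 8, $? & 127);
        } else {
            push @alive, $pid;
        }
    }
    # Kill and reap whatever is still running, as bg_killall does today.
    for my $pid (@alive) {
        kill 'TERM', $pid;
        waitpid($pid, 0);
    }
    # Fail the recipe if any background process ended on its own.
    die 'background process failure: ' . join('; ', @died) . "\n" if @died;
    return scalar @alive;    # number of processes killed normally
}
```

The survivors are killed even when a failure is found, so a bad run doesn't leave stray background load behind before the recipe aborts.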

Before you do this, you'll want to audit all the uses of bg_exec( background => 1 ) to ensure that returning the $proc object in lieu of the $pid is OK.