microsoft / StorScore

A test framework to evaluate SSDs and HDDs
http://aka.ms/storscore
MIT License
79 stars 40 forks

DiskSpd returned non-zero errorlevelOut of memory! #47

Open bdenison99 opened 7 years ago

bdenison99 commented 7 years ago

I was trying to run the "recipes\SIT\SIT_SSD_Sanity.rcp" recipe on a 480GB SSD and got the error message in the title when it got to 4K seq 100% read QD 128.

Event Viewer has the following:
Faulting application name: DiskSpd.exe, version: 0.0.0.0, time stamp: 0x57449527
Faulting module name: ucrtbase.DLL, version: 10.0.10586.9, time stamp: 0x5642c48d
Exception code: 0xc0000409
Fault offset: 0x00000000000698fe
Faulting process id: 0xe30
Faulting application start time: 0x01d2bc18470f6f2b
Faulting application path: C:\storscore\bin\DiskSpd.exe
Faulting module path: C:\Windows\SYSTEM32\ucrtbase.DLL
Report Id: ed649a57-280c-11e7-80d1-e41d2dece230
Faulting package full name:
Faulting package-relative application ID:

dl2n commented 7 years ago

That’s STATUS_STACK_BUFFER_OVERRUN (c0000409).

If this can be reduced to a reproducible crash with a simple cmdline invocation of DISKSPD, that would be great.

lauracaulfield commented 7 years ago

Dan -- the command line to DiskSpd probably looks something like this:

DiskSpd.exe -si -b4K -t4 -o32 -L -h -d3600 -Z20M,"C:\storscore\entropy\0_pct_comp.bin" E:\testfile.dat

Bart -- can you confirm by looking at the first line of the "test...txt" file in the results directory that corresponds to the test that failed?
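
For reference, the `-t4 -o32` pair in a command line like the one above matches the "QD 128" step from the original report: DiskSpd's `-t` is threads per target and `-o` is outstanding I/Os per thread, so the total is their product. A minimal sketch of that arithmetic follows; the helper name `total_queue_depth` is hypothetical and not part of StorScore or DiskSpd.

```python
import re

def total_queue_depth(cmdline):
    """Return threads * outstanding I/Os per thread for a DiskSpd command line.

    Hypothetical helper for illustration; it only looks at the -t and -o flags.
    """
    # (?<!\S) makes sure the flag starts a new argument rather than matching mid-path.
    threads = int(re.search(r"(?<!\S)-t(\d+)", cmdline).group(1))
    outstanding = int(re.search(r"(?<!\S)-o(\d+)", cmdline).group(1))
    return threads * outstanding

# Command line quoted from this thread.
cmd = r'DiskSpd.exe -si -b4K -t4 -o32 -L -h -d3600 -Z20M,"C:\storscore\entropy\0_pct_comp.bin" E:\testfile.dat'
print(total_queue_depth(cmd))
```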

bdenison99 commented 7 years ago

We got the command line, but couldn't reproduce the failure. I restarted at the same iteration that was printed just before the crash and it's back running. It was odd because 3 systems all hit the issue at close to the same time.

lauracaulfield commented 7 years ago

Weird. So, you can't repro it with either StorScore or DiskSpd (directly)? I wonder if your lab got zapped by a cosmic ray exactly then and the memory freaked out. Is there anything you can see in the system logs about memory errors, or anything suspicious in the StorScore telemetry?

bdenison99 commented 7 years ago

DiskSpd.exe -si -b4K -t4 -o32 -a0,2,4,6,8,10,12,14,16,18,20,22,1,3,5,7,9,11,13,15,17,19,21,23 -L -h -d600 -Z20M,"C:\storscore\entropy\0_pct_comp.bin" d:\testfile.dat

Assertion failed: vPerfDone[p].Reserved2 >= vPerfInit[p].Reserved2, file ....\IORequestGenerator\IORequestGenerator.cpp, line 2399

bdenison99 commented 7 years ago

An alpha particle strike would be believable for one system - 3 that are not in line with each other would be asking a bit much.

dl2n commented 7 years ago

Is that the DISKSPD output from the step which corresponded to that Status = c0000409 event thrown into the system event log?

That assert is somewhat more explicable. As you might guess, DISKSPD doesn't have any business looking at that reserved field: it has meaning, but for reasons lost to history it was decided not to expose it. It's a field which could very feasibly wrap on a system that's been sufficiently busy, which is more or less what that assert caught. It doesn't matter how long the run was: it's an absolute counter. You just have to get unlucky enough to have straddled the wrap with the test.

I wonder if the assert failing surfaces with this as the exception code for some odd reason.
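
A minimal sketch of the wrap described above, not the DiskSpd code: it assumes the counter is a free-running 32-bit value (the width is an assumption for illustration) and shows why a plain `done >= init` check fails when a test straddles the wrap, while a modular subtraction still gives the right delta. The function and variable names are hypothetical.

```python
COUNTER_BITS = 32  # assumption: width of the absolute counter

def counter_delta(init, done, bits=COUNTER_BITS):
    """Count elapsed between two samples of a counter that wraps modulo 2**bits."""
    return (done - init) & ((1 << bits) - 1)

# Sample taken shortly before the wrap, and another shortly after it.
init_sample = 0xFFFFFFF0
done_sample = 0x00000010

# The naive ordering check (the shape of the failed assertion) is False here...
print(done_sample >= init_sample)
# ...but the modular delta is still well defined.
print(counter_delta(init_sample, done_sample))
```

This only holds if at most one wrap occurs between the two samples; more than one wrap cannot be detected from the two values alone.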

bdenison99 commented 7 years ago

Yup - that was all that was in the latest log file in the results folder.

dl2n commented 7 years ago

OK, well, we should fix it.

Let me test my comfort level on pushing changes down from GitHub. We have a couple code velocity challenges at the moment, and a fairly big payload looming that needs to go the other way.

lauracaulfield commented 7 years ago

I'll keep an eye on the DiskSpd GitHub page for pulling the change into StorScore.

bdenison99 commented 7 years ago

We've run into a few more instances of this. Thankfully it would appear that we can delete the logs from the run that failed, start over with the --start_on_step=xxx parameter and pick right up where we left off.