Closed gurugio closed 6 years ago
@gurugio Might be linked to your message:
modprobe: FATAL: Module scsi_debug is in use.
So maybe you made stacked device or a filesystem atop the device that scsi_debug made thus locking it open...
I checked a reference counter of the scsi_debug module. The reference counter was 0 just after it was loaded. So I think there is nothing else to use the scsi_debug module. The reference counter was 1 when modprobe tried to unload the scsi_debug module. And it became 0 at the end of the test.
After two xfs_io executed, the reference counter became 2. Is there a possibility that xfs_io was not finished?
gohkim@ws00837:~/work/tools/blktests$ sudo modprobe -r scsi_debug
gohkim@ws00837:~/work/tools/blktests$ git diff tests/block/009
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..9b64500 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -38,19 +38,27 @@ test() {
return 1
fi
+ lsmod | grep scsi
+
local dev="/dev/${SCSI_DEBUG_DEVICES[0]}"
# Initialize data with known pattern
xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -w" -d "$dev" >>"$FULL"
+ lsmod | grep scsi
# Fill page cache with dirty data
xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
+ lsmod | grep scsi
blkdiscard "$dev"
+
# After device was fully discarded, read sould return all zeroes
dd if="$dev" bs=4k 2>>"$FULL" | hexdump
dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
+ lsmod | grep scsi
_exit_scsi_debug
+ lsmod | grep scsi
echo "Test complete"
+ lsmod | grep scsi
}
gohkim@ws00837:~/work/tools/blktests$ cat results/nodev/block/009.out.bad
Running block/009
scsi_debug 77824 0
scsi_debug 77824 1
scsi_debug 77824 2
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
scsi_debug 77824 1
modprobe: FATAL: Module scsi_debug is in use.
scsi_debug 77824 1
Test complete
scsi_debug 77824 0
I added a "-d" option to pwrite for direct IO and had no data conflict.
gohkim@ws00837:~/work/tools/blktests$ git diff
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..c72627f 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -40,12 +40,13 @@ test() {
local dev="/dev/${SCSI_DEBUG_DEVICES[0]}"
# Initialize data with known pattern
- xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -w" -d "$dev" >>"$FULL"
+ xfs_io -c "pwrite -S 0xaa -b 2M 0 32M -d -w" -d "$dev" >>"$FULL"
# Fill page cache with dirty data
- xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
+ xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M -d" -f "$dev" >>"$FULL"
blkdiscard "$dev"
+
# After device was fully discarded, read sould return all zeroes
dd if="$dev" bs=4k 2>>"$FULL" | hexdump
dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
gohkim@ws00837:~/work/tools/blktests$ sudo ./check block/009
block/009 (check page-cache coherency after BLKDISCARD) [failed]
runtime 1,080s ... 1,074s
--- tests/block/009.out 2018-04-23 17:52:00.222822301 +0200
+++ results/nodev/block/009.out.bad 2018-04-24 10:56:02.220782318 +0200
@@ -5,4 +5,5 @@
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
+modprobe: FATAL: Module scsi_debug is in use.
Test complete
I checked /sys/block/sdc/stat and found that data was written to the disk after blkdiscard.
gohkim@ws00837:~/work/tools/blktests$ git diff
diff --git a/tests/block/009 b/tests/block/009
index 69d0be7..fbf91f4 100755
--- a/tests/block/009
+++ b/tests/block/009
@@ -44,13 +44,17 @@ test() {
# Fill page cache with dirty data
xfs_io -c "pwrite -S 0xbb -b 2M 2M 10M" -f "$dev" >>"$FULL"
-
+ cat /sys/block/sdc/stat
blkdiscard "$dev"
+ cat /sys/block/sdc/stat
# After device was fully discarded, read sould return all zeroes
dd if="$dev" bs=4k 2>>"$FULL" | hexdump
dd if="$dev" bs=4M iflag=direct 2>>"$FULL" | hexdump
+ cat /sys/block/sdc/stat
_exit_scsi_debug
echo "Test complete"
+ cat /sys/block/sdc/stat
+
}
gohkim@ws00837:~/work/tools/blktests$ cat results/nodev/block/009.out.bad
Running block/009
33 0 704 132 65 0 65536 264 0 188 392
34 0 712 136 67 0 131072 280 1 196 408 ---> after blkdiscard
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
*
0c00000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
308 0 111296 1244 87 2540 151552 360 0 640 1596 ---> after dd
modprobe: FATAL: Module scsi_debug is in use.
Test complete
308 0 111296 1244 87 2540 151552 360 0 640 1596
@gurugio what kernel version is this on? It looks like the fix that this test is testing went in for 4.15.
I've seen the modprobe issue but haven't figured out who is still holding the device open. But let's focus on the test failure first.
I am using Ubuntu and the standard kernel as following.
gohkim@ws00837:~/work$ uname -a
Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
gohkim@ws00837:~/work$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"
My colleague found a patch related to discard block device in kernel log. Should I apply that patch?
351499a1 block: Invalidate cache on discard v2
Yes, that is the patch I'm talking about. All of the regression tests in blktests mention what patch they depend on, see https://github.com/osandov/blktests/blob/master/tests/block/009.
You will also need 0bd1ed4860d0 ("block: pass inclusive 'lend' parameter to truncate_inode_pages_range").
OK, thank you. Then could you inform me what to do for "modprobe: FATAL: Module scsi_debug is in use." error?
@gurugio, I just pushed a commit which should fix this.
Thanks!!
Hi,
The block/009 test case failed as below. I tested it on two machines and both resulted the same failure.
If I ran blkdiscard after the test, the sdg device was cleared.
I hope you could guess the reason. Please inform me if you need more information.