kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

scrub does not end #692

Open · espux opened this issue 8 months ago

espux commented 8 months ago

When scrubbing a single drive in a multi-drive filesystem, the scrub operation appears to never stop. The expected behavior is that it stops as soon as all data and metadata on the drive have been scrubbed, but instead it just keeps going. I only let it run for about an hour past the expected end time, so I do not know whether it would stop eventually. Scrub status reported the following when I canceled it:

Status:           running
Duration:         18:10:46
Time left:        21209098:30:09
ETA:              Mon Apr 20 01:47:23 4443
Total to scrub:   13.59TiB
Bytes scrubbed:   14.38TiB  (105.81%)
Rate:             230.41MiB/s
Error summary:    csum=57669
  Corrected:      57669
  Uncorrectable:  0
  Unverified:     0
adam900710 commented 8 months ago

This seems to be caused by the same problem from https://github.com/kdave/btrfs-progs/issues/690.

It's the reporting part not doing the correct thing (we're using the same values from "bytes to scrub" to report "total to scrub"); the ratio above 100% is caused by the timing difference.

There is a fix submitted; mind giving it a try? https://github.com/adam900710/btrfs-progs/commit/be6f1470add5193fcb435e7bbba875dc60a2a2ba
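
For reference, the over-100% figure in the status output above is just the straight quotient of the two reported values (a quick arithmetic check, not part of the fix):

14.38 TiB scrubbed / 13.59 TiB to scrub ≈ 1.0581, i.e. the reported 105.81%

so whichever of the two counters is off, the percentage is simply derived from those two numbers.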

espux commented 8 months ago

This is the same scrub operation, but it is not the same problem. I used v6.2.2, and the status looks OK, as one can see from this output taken just before it reaches the expected end:

Status:           running
Duration:         16:45:06
Time left:        0:15:29
ETA:              Wed Oct 11 14:07:05 2023
Total to scrub:   13.59TiB
Bytes scrubbed:   13.38TiB  (98.48%)
Rate:             232.71MiB/s
Error summary:    csum=57467
  Corrected:      57467
  Uncorrectable:  0
  Unverified:     0

As one can see, it reports the correct "Total to scrub", but it does not stop when it reaches this value. Here are the numbers from "device usage":

   Device size:            18.19TiB
   Device slack:              0.00B
   Data,RAID1:             13.57TiB
   Metadata,RAID1:         26.00GiB
   Unallocated:             4.60TiB
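
As a rough cross-check (assuming "Total to scrub" tracks the chunks allocated on this device): 13.57 TiB Data,RAID1 + 26.00 GiB Metadata,RAID1 ≈ 13.57 + 0.03 TiB ≈ 13.6 TiB, which matches the reported total of 13.59 TiB, so the scrub should indeed stop around that point.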

I also compiled your fix and resumed the scrub, and I get this output:

Scrub resumed:    Thu Oct 12 08:47:01 2023
Status:           running
Duration:         18:11:29
Time left:        21210896:15:58
ETA:              Sat Jul  4 17:03:34 4443
Total to scrub:   13.61TiB
Bytes scrubbed:   14.39TiB  (105.75%)
Rate:             230.39MiB/s
Error summary:    csum=57669
  Corrected:      57669
  Uncorrectable:  0
  Unverified:     0
adam900710 commented 8 months ago

OK, this is weird now.

Have you tried waiting longer to see whether it really finishes or loops forever?

Anyway, I'll need to add extra debug output to know what's really going on.

Meanwhile, mind providing the kernel version for the bug?

espux commented 8 months ago

Kernel 6.4.12-1

I tried a different drive earlier and it continued past the device size.

adam900710 commented 8 months ago

I'll prepare a debug patch soon, as long as you can recompile the kernel (or at least the btrfs module for the same kernel).

espux commented 8 months ago

I can't locate a source package for that kernel version (openSUSE). The latest package I see available is 6.5.6-1.1; I can download and build that.

adam900710 commented 8 months ago

Sorry for the late reply; here is the debug diff: diff.txt

The dmesg output can be pretty large, at least thousands of lines in your case, so please capture all of dmesg if possible (dmesg | tee saved_dmesg would be sufficient) until you hit the beyond-100% output. (Please start a new scrub instead of resuming the old one, just in case.)

The output would look like this:

[  490.843198] scrub_stripe: finished ret=0 bg=10460594176 len=12582912 flags=data index=0 scrubbed data=12558336 meta=0 device used=10736369664
[  490.848670] scrub_stripe: finished ret=0 bg=13631488 len=8388608 flags=data index=0 scrubbed data=20901888 meta=0 device used=10736369664
[  490.850136] scrub_stripe: finished ret=0 bg=22020096 len=8388608 flags=system|dup index=0 scrubbed data=20901888 meta=16384 device used=10736369664
[  490.851829] scrub_stripe: finished ret=0 bg=22020096 len=8388608 flags=system|dup index=1 scrubbed data=20901888 meta=32768 device used=10736369664
[  490.857770] scrub_stripe: finished ret=0 bg=30408704 len=268435456 flags=metadata|dup index=0 scrubbed data=20901888 meta=10551296 device used=10736369664
[  490.863482] scrub_stripe: finished ret=0 bg=30408704 len=268435456 flags=metadata|dup index=1 scrubbed data=20901888 meta=21069824 device used=10736369664
[  491.523792] scrub_stripe: finished ret=0 bg=298844160 len=1073741824 flags=data index=0 scrubbed data=1094643712 meta=21069824 device used=10736369664

Each line shows the block group being scrubbed and the accounting used for the btrfs scrub status output.

My current guess is that there is some accounting error doing double accounting, but I'm not sure in which part. The debug code should definitely help pin down the cause.
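
To make the suspicion concrete, here is a minimal self-contained sketch (hypothetical names, not btrfs internals) of how double accounting pushes the scrubbed counter past the total without any real error on disk:

#include <stdint.h>
#include <stdio.h>

/* Illustrative progress bookkeeping, not the kernel's actual structures. */
struct progress {
	uint64_t bytes_scrubbed;
	uint64_t total_to_scrub;
};

/* Each finished block group adds its used bytes to the scrubbed counter. */
static void account_block_group(struct progress *p, uint64_t used)
{
	p->bytes_scrubbed += used;
}

int main(void)
{
	struct progress p = { .bytes_scrubbed = 0, .total_to_scrub = 4096 };

	/* First run: scrubs three of four 1 KiB block groups, then is canceled. */
	for (int i = 0; i < 3; i++)
		account_block_group(&p, 1024);

	/* If a later run re-scrubs block groups that were already counted
	 * (instead of continuing from where the first run stopped), the same
	 * bytes are added a second time. */
	for (int i = 0; i < 4; i++)
		account_block_group(&p, 1024);

	/* Prints 175.00%: well past 100% even though nothing is corrupt. */
	printf("%.2f%%\n",
	       100.0 * (double)p.bytes_scrubbed / (double)p.total_to_scrub);
	return 0;
}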

espux commented 8 months ago

I compiled a 6.5.6-1.1 kernel with the debug diff and ran another scrub. This time it completed as expected. The differences between previous runs are:

I will test further to see if pausing and resuming the scrub can be the source of the problem.

espux commented 8 months ago

A new test might have revealed the problem. When a scrub is canceled and later resumed, it appears to start again from the beginning of the device.

[159912.943518] BTRFS info (device dm-26): scrub: started on devid 3
[159918.711355] scrub_stripe: finished ret=0 bg=679490361098240 len=1073741824 flags=data|raid1 index=1 scrubbed data=1073676288 meta=0 device used=5993626861568
[159923.668076] scrub_stripe: finished ret=0 bg=679492508581888 len=1073741824 flags=data|raid1 index=1 scrubbed data=2147418112 meta=0 device used=5993626861568
[159928.199280] scrub_stripe: finished ret=0 bg=679493582323712 len=1073741824 flags=data|raid1 index=1 scrubbed data=3221159936 meta=0 device used=5993626861568
[159932.753294] scrub_stripe: finished ret=0 bg=679494656065536 len=1073741824 flags=data|raid1 index=1 scrubbed data=4294901760 meta=0 device used=5993626861568
..
..
..
[161320.348004] scrub_stripe: finished ret=0 bg=679885498089472 len=1073741824 flags=data|raid1 index=1 scrubbed data=331784798208 meta=0 device used=5993626861568
[161324.726283] scrub_stripe: finished ret=0 bg=679886571831296 len=1073741824 flags=data|raid1 index=1 scrubbed data=332858540032 meta=0 device used=5993626861568
[161329.128224] scrub_stripe: finished ret=0 bg=679887645573120 len=1073741824 flags=data|raid1 index=1 scrubbed data=333932281856 meta=0 device used=5993626861568
[161333.583808] scrub_stripe: finished ret=0 bg=679888719314944 len=1073741824 flags=data|raid1 index=1 scrubbed data=335006023680 meta=0 device used=5993626861568
[161334.623526] scrub_stripe: finished ret=-125 bg=679889793056768 len=1073741824 flags=data|raid1 index=1 scrubbed data=335266070528 meta=0 device used=5993626861568
[161334.623552] BTRFS info (device dm-26): scrub: not finished on devid 3 with status: -125
[161364.677526] BTRFS info (device dm-26): scrub: started on devid 3
[161369.141541] scrub_stripe: finished ret=0 bg=679490361098240 len=1073741824 flags=data|raid1 index=1 scrubbed data=1073676288 meta=0 device used=5993626861568
[161373.564861] scrub_stripe: finished ret=0 bg=679492508581888 len=1073741824 flags=data|raid1 index=1 scrubbed data=2147418112 meta=0 device used=5993626861568
[161377.996226] scrub_stripe: finished ret=0 bg=679493582323712 len=1073741824 flags=data|raid1 index=1 scrubbed data=3221159936 meta=0 device used=5993626861568
[161382.433760] scrub_stripe: finished ret=0 bg=679494656065536 len=1073741824 flags=data|raid1 index=1 scrubbed data=4294901760 meta=0 device used=5993626861568
[161386.857965] scrub_stripe: finished ret=0 bg=679496803549184 len=1073741824 flags=data|raid1 index=1 scrubbed data=5368643584 meta=0 device used=5993626861568
[161391.286557] scrub_stripe: finished ret=0 bg=679497877291008 len=1073741824 flags=data|raid1 index=1 scrubbed data=6442385408 meta=0 device used=5993626861568
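
For context on why restarting at the first block group matters: a resume is expected to continue from the last_physical offset recorded by the previous (canceled) run, passed back in as the start of the scrub range. Below is a minimal sketch of that expectation using the BTRFS_IOC_SCRUB ioctl and struct btrfs_ioctl_scrub_args from <linux/btrfs.h>; the surrounding logic is illustrative only, not the actual btrfs-progs code path.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Illustrative only: resume a scrub of one device from a previously saved
 * physical offset.  If the resumed run instead starts at the first block
 * group again (as the dmesg above shows), either this start offset is not
 * being passed in or it is not being honored. */
static int resume_scrub(int fs_fd, __u64 devid, __u64 saved_last_physical)
{
	struct btrfs_ioctl_scrub_args args;

	memset(&args, 0, sizeof(args));
	args.devid = devid;
	args.start = saved_last_physical;   /* continue, do not restart at 0 */
	args.end = (__u64)-1;               /* scrub to the end of the device */

	if (ioctl(fs_fd, BTRFS_IOC_SCRUB, &args) < 0) {
		perror("BTRFS_IOC_SCRUB");
		return -1;
	}

	/* The kernel reports how far it got, to be saved for the next resume. */
	printf("scrubbed up to physical offset %llu\n",
	       (unsigned long long)args.progress.last_physical);
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <mountpoint> <devid> <resume offset>\n",
			argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);   /* filesystem root, needs root privileges */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	return resume_scrub(fd, strtoull(argv[2], NULL, 0),
			    strtoull(argv[3], NULL, 0)) ? 1 : 0;
}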