Crashed a very short time after starting the copy operation:
21: kd> k
# Child-SP RetAddr Call Site
00 fffff686`320ebee8 fffff802`26117f82 nt!DbgBreakPointWithStatus
01 fffff686`320ebef0 fffff802`261179f2 nt!KiBugCheckDebugBreak+0x12
02 fffff686`320ebf50 fffff802`25ffd747 nt!KeBugCheck2+0xdd2
03 fffff686`320ec660 fffff802`2601ad7d nt!KeBugCheckEx+0x107
04 fffff686`320ec6a0 fffff802`25fd01c2 nt!PspSystemThreadStartup$filt$0+0x44
05 fffff686`320ec6e0 fffff802`26007d32 nt!_C_specific_handler+0xa2
06 fffff686`320ec750 fffff802`25eca3c7 nt!RtlpExecuteHandlerForException+0x12
07 fffff686`320ec780 fffff802`25ec94e6 nt!RtlDispatchException+0x297
08 fffff686`320ecea0 fffff802`2601186c nt!KiDispatchException+0x186
09 fffff686`320ed560 fffff802`2600d2bd nt!KiExceptionDispatch+0x12c
0a fffff686`320ed740 fffff802`2b972ada nt!KiPageFault+0x43d
0b fffff686`320ed8d0 fffff802`2b96b8b4 OpenZFS!kmem_findslab+0x5a [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 840]
0c fffff686`320ed920 fffff802`2b96a3f9 OpenZFS!kmem_error+0x84 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 871]
0d fffff686`320ed9e0 fffff802`2b974be7 OpenZFS!kmem_slab_free+0x1e9 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 1472]
0e fffff686`320eda60 fffff802`2b96ef76 OpenZFS!kmem_magazine_destroy+0x1c7 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 1738]
0f fffff686`320edac0 fffff802`2b9782d4 OpenZFS!kmem_cache_magazine_purge+0x1a6 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 2990]
10 fffff686`320edb20 fffff802`2b965bba OpenZFS!kmem_cache_magazine_resize+0x84 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 3095]
11 fffff686`320edb60 fffff802`25f0e6f5 OpenZFS!taskq_thread+0x51a [H:\dev\openzfs\module\os\windows\spl\spl-taskq.c @ 2083]
12 fffff686`320edc10 fffff802`26006278 nt!PspSystemThreadStartup+0x55
13 fffff686`320edc60 00000000`00000000 nt!KiStartSystemThread+0x28
sp seems to be zeroed:
21: kd> dt sp
Local var @ 0xfffff686320ed8f8 Type kmem_slab*
0xffffbe8d`71bbffb8
+0x000 slab_cache : (null)
+0x008 slab_base : (null)
+0x010 slab_link : avl_node
+0x028 slab_head : (null)
+0x030 slab_refcnt : 0n0
+0x034 slab_chunks : 0n0
+0x038 slab_stuck_offset : 0
+0x03c slab_later_count : 0
+0x03e slab_flags : 0
+0x040 slab_create_time : 0n0
The check that trips is sp->slab_cache != cp in kmem_slab_free.
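For context, roughly the kind of free-path check that appears to be firing here - the names follow the stack frames above, but this is not the exact OpenZFS source:
/*
 * Rough illustration (not the actual OpenZFS code): kmem_slab_free() looks
 * up the slab a buffer maps to and verifies the slab still points back at
 * the cache it is being freed to.  With *sp zeroed, slab_cache is NULL, the
 * check fails, and the error path (kmem_error -> kmem_findslab) walks the
 * cache's slab lists to identify the bad address, which is where it faults.
 */
typedef struct kmem_cache kmem_cache_t;   /* opaque for this sketch */

typedef struct kmem_slab {
    kmem_cache_t *slab_cache;             /* cache this slab belongs to */
    void         *slab_base;              /* base address of the slab */
    /* ... refcount, chunk count, AVL linkage, ... */
} kmem_slab_t;

static int
slab_owner_is(const kmem_slab_t *sp, const kmem_cache_t *cp)
{
    /* the real code reports a bad-address error via kmem_error() when this fails */
    return (sp->slab_cache == cp);
}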
Retrying produced another crash very fast:
11: kd> k
# Child-SP RetAddr Call Site
00 fffff488`330ebb58 fffff800`6db17f82 nt!DbgBreakPointWithStatus
01 fffff488`330ebb60 fffff800`6db179f2 nt!KiBugCheckDebugBreak+0x12
02 fffff488`330ebbc0 fffff800`6d9fd747 nt!KeBugCheck2+0xdd2
03 fffff488`330ec2d0 fffff800`6da1ad7d nt!KeBugCheckEx+0x107
04 fffff488`330ec310 fffff800`6d9d01c2 nt!PspSystemThreadStartup$filt$0+0x44
05 fffff488`330ec350 fffff800`6da07d32 nt!_C_specific_handler+0xa2
06 fffff488`330ec3c0 fffff800`6d8ca3c7 nt!RtlpExecuteHandlerForException+0x12
07 fffff488`330ec3f0 fffff800`6d8c94e6 nt!RtlDispatchException+0x297
08 fffff488`330ecb10 fffff800`6da1186c nt!KiDispatchException+0x186
09 fffff488`330ed1d0 fffff800`6da0ce5a nt!KiExceptionDispatch+0x12c
0a fffff488`330ed3b0 fffff800`75bfb876 nt!KiGeneralProtectionFault+0x31a
0b fffff488`330ed540 fffff800`75bb9f95 OpenZFS!avl_first+0x46 [H:\dev\openzfs\module\avl\avl.c @ 182]
0c fffff488`330ed570 fffff800`75bb95f7 OpenZFS!kmem_slab_alloc+0x45 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 1398]
0d fffff488`330ed5e0 fffff800`75ef5dbf OpenZFS!kmem_cache_alloc+0x437 [H:\dev\openzfs\module\os\windows\spl\spl-kmem.c @ 2276]
0e fffff488`330ed660 fffff800`75e0ccb7 OpenZFS!abd_alloc_chunks+0x34f [H:\dev\openzfs\module\os\windows\zfs\abd_os.c @ 260]
0f fffff488`330ed730 fffff800`75da1447 OpenZFS!abd_alloc+0xe7 [H:\dev\openzfs\module\zfs\abd.c @ 195]
10 fffff488`330ed7a0 fffff800`75d91463 OpenZFS!arc_get_data_abd+0x87 [H:\dev\openzfs\module\zfs\arc.c @ 4968]
11 fffff488`330ed800 fffff800`75d99b1c OpenZFS!arc_hdr_alloc_abd+0x373 [H:\dev\openzfs\module\zfs\arc.c @ 3224]
12 fffff488`330ed8b0 fffff800`75cf61ea OpenZFS!arc_write_ready+0x16bc [H:\dev\openzfs\module\zfs\arc.c @ 6649]
13 fffff488`330eda10 fffff800`75ce9330 OpenZFS!zio_ready+0x22a [H:\dev\openzfs\module\zfs\zio.c @ 4484]
14 (Inline Function) --------`-------- OpenZFS!__zio_execute+0x2f5 [H:\dev\openzfs\module\zfs\zio.c @ 2298]
15 fffff488`330edae0 fffff800`75bb5bba OpenZFS!zio_execute+0x310 [H:\dev\openzfs\module\zfs\zio.c @ 2209]
16 fffff488`330edb60 fffff800`6d90e6f5 OpenZFS!taskq_thread+0x51a [H:\dev\openzfs\module\os\windows\spl\spl-taskq.c @ 2083]
17 fffff488`330edc10 fffff800`6da06278 nt!PspSystemThreadStartup+0x55
18 fffff488`330edc60 00000000`00000000 nt!KiStartSystemThread+0x28
node seems to be bad.
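For context, a simplified sketch of what avl_first() does (not the verbatim avl.c code): it walks the leftmost child pointers until it reaches NULL, so a corrupted child pointer gets dereferenced right in the loop, which matches the general protection fault above.
typedef struct avl_node {
    struct avl_node *avl_child[2];   /* left (0) and right (1) children */
    /* ... parent pointer and balance bookkeeping ... */
} avl_node_t;

/* Return the leftmost (smallest) node, or NULL for an empty tree. */
static avl_node_t *
leftmost_node(avl_node_t *root)
{
    avl_node_t *node = root;
    avl_node_t *prev = NULL;

    while (node != NULL) {           /* faults here if node is garbage */
        prev = node;
        node = node->avl_child[0];
    }
    return (prev);
}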
Crashes on the second iteration.
When I reduce the machine's memory to 4GB it keeps running, but quite slowly.
It seems to run fast for a while (until it needs to reduce memory usage?), then stalls while memory usage slowly decreases, and then it starts copying again.
Edit: The slowness seems to be because I tested with just one CPU core in this run. With 4GB and 12 cores it actually seems to run well.
With 8GB it still runs fine, but increasing the memory to 12GB makes the machine crash almost immediately after starting a copy.
My copy started out fast, with relatively stable memory usage. Now that it has reached the point where it would previously crash, it has slowed down to ~25% of the earlier speed and the memory usage shows a bit of a sawtooth pattern:
It has changed back to stable allocation, but the speed is still quite slow. I think it has copied more now than it managed before, though.
Something just happened at ~2.2 TB copied, same as before: memory usage jumped to 100%, with 4.6MB available. I started logging kstat output just before this.
I'll let it run for a bit to see what happens. I've attached the cbuf from right before this happened; let's see if it crashes.
Yea, the system is not really responsive anymore. I'm dumping stacks now; after that I'll post the kstat logs.
These are the kstat logs. I started logging them very shortly before this; the memory usage was still showing the sawtooth shape but then rose abruptly: kstat_logs.zip
The logs cover less than a minute. The system went unresponsive very shortly after I started logging, but I don't think the logging itself caused it; I started logging because, based on previous testing, I expected this to happen.
@lundman if you want me to try anything, I can trivially reproduce the first two crashes when running with 12GB RAM (I haven't checked the exact limit). As for the last crash, it seems to take a fair amount of copying with rclone (~2.2TB from my dataset in this case) to hit it.
Does anyone have an idea why a breakpoint like bp /w "segkmem_total_mem_allocated > 13000000000" "spl-seg_kmem.c:141" doesn't work for me? I'd expect it to be hit when memory usage goes over 13GB in this example.
Edit: I know symbols have to be loaded before setting the breakpoint.
OK thanks for taking a look. Busy morning here.
The sawtooth pattern is very encouraging; it means we detect the pressure and pull back from it. A little too aggressively, mind you, so some tweaking is needed there.
I believe Windows has a tendency to send more than one pressure event. Currently, we increase the amount to free, spl_vm_pages_wanted, each time, and bump the level, spl_vm_pressure_level, from 0, 1, 2 up to 3 - 0 being normal and 3 critical. Possibly spl_vm_pressure_level should just go to 1. Or perhaps we should set spl_vm_pages_wanted once, instead of adding to it.
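A rough sketch of the behaviour described above (the handler name and the per-event page count are hypothetical; this is not the actual SPL code):
static volatile long spl_vm_pages_wanted;    /* pages we still want to free */
static volatile int  spl_vm_pressure_level;  /* 0 = normal ... 3 = critical */

static void
on_low_memory_notification(long pages_requested)
{
    /* Each notification adds to the target instead of replacing it... */
    spl_vm_pages_wanted += pages_requested;

    /* ...and escalates the level, so a burst of notifications quickly
     * drives us from 0 through 1 and 2 up to 3 (critical). */
    if (spl_vm_pressure_level < 3)
        spl_vm_pressure_level++;
}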
The 12G crash is most peculiar, but there are some "small machine" vs "large machine" checks; I will go over them again.
As for the breakpoint, when I try to set a conditional breakpoint in VS, it just tells me it isn't supported for kernel debugging.
Conditional breakpoints definitely work for kernel debugging, maybe just not every condition? I have used them multiple times before, but IIRC only for checking locals; I never tried a global. Maybe it was comparing the address of the symbol? I also haven't tried setting them in VS, only in WinDbg.
You can also execute commands when a breakpoint is hit, for example dumping cbuf.
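For example, something like this (untested, using the documented bp command-string syntax; osif_malloc is just an illustrative target) would dump the current stack and resume each time it is hit:
bp OpenZFS!osif_malloc "k; g"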
Admittedly, when I tried it was VS2017 or something; I never tried again. I do have issues with global vars - I can't see their values by hovering - which is a real pain.
So I just assumed it can't do it, due to that message.
Yea, it seems to just be the VS extension; it does work when set in the console.
I just encountered what seems to be a deadlock in zfs_AcquireForLazyWrite: #313
I just noticed that if I resume my copy after running out of memory at ~2.2TB, I can reproduce the issue much faster. It just happened after only 99GB.
I bisected this now and the first bad commit is 1bc536da04c47cce6bd5bed0cbd53366ca1043eb
I'll still do some more thorough testing to ensure this is the right commit, but so far it looks like it.
Edit: My git bisect log:
git bisect start
# status: waiting for both good and bad commits
# bad: [538d466910f55badbad24c6ce0c7189133eb7695] Update kmem to latest macOS
git bisect bad 538d466910f55badbad24c6ce0c7189133eb7695
# status: waiting for good commit(s), bad commit known
# good: [51ee5642ed49a69a6bbd434a30c85d66391c568b] META: zfswin-2.2.0rc5
git bisect good 51ee5642ed49a69a6bbd434a30c85d66391c568b
# bad: [b508b291a4d18afa4a8bada199e98d561fb8906c] Unmounting snapshots need to open correct zfsvfs
git bisect bad b508b291a4d18afa4a8bada199e98d561fb8906c
# bad: [30f59ca2f5e81a93c09a7f0ecbc9b0b60fab0630] Acquire unmount rwlock in callbacks
git bisect bad 30f59ca2f5e81a93c09a7f0ecbc9b0b60fab0630
# good: [21fff86ab62897995a33c20ba0a3f06726ebe71f] DeleteOnClose should ignore errors
git bisect good 21fff86ab62897995a33c20ba0a3f06726ebe71f
# good: [bf5e2c58d2b1ec3fdf4c6a37fc9dc50f8d2003c9] Detect NULL zp sooner in lazy and fastio
git bisect good bf5e2c58d2b1ec3fdf4c6a37fc9dc50f8d2003c9
# bad: [1bc536da04c47cce6bd5bed0cbd53366ca1043eb] Handle SetFileAllocation more correctly
git bisect bad 1bc536da04c47cce6bd5bed0cbd53366ca1043eb
# good: [fd8bf0d2b92d18b818505413bb7dd8e75fc8decd] Correct CcFileSizes and cache use
git bisect good fd8bf0d2b92d18b818505413bb7dd8e75fc8decd
# first bad commit: [1bc536da04c47cce6bd5bed0cbd53366ca1043eb] Handle SetFileAllocation more correctly
I'm still checking fd8bf0d2b92d18b818505413bb7dd8e75fc8decd more, but it has been running fine for quite a while now, whereas with 1bc536da04c47cce6bd5bed0cbd53366ca1043eb it crashed within minutes.
I'm also checking new-kmem with 1bc536da04c47cce6bd5bed0cbd53366ca1043eb reverted. We might want to do that as a temporary fix until we figure out what is actually happening.
Of course, reverting 1bc536da04c47cce6bd5bed0cbd53366ca1043eb makes rclone complain again. I'll try with --local-no-preallocate --local-no-sparse, both with and without the revert.
Looks like with --local-no-preallocate --local-no-sparse I can get past the ~2.2TB mark without reverting 1bc536da04c47cce6bd5bed0cbd53366ca1043eb. I think I also have an idea why this starts happening at ~2.2TB: that is where a clean copy of this dataset starts copying many small files. I'll see tomorrow if I can reproduce this more easily.
So the issue seems to be caused by copying many small files with preallocation. Reverting 1bc536da04c47cce6bd5bed0cbd53366ca1043eb or adding --local-no-preallocate --local-no-sparse to rclone is a workaround.
I can probably reproduce this quickly by starting copies from a certain set of files.
Edit: Note that this time my reproducer is in a completely new and separate VM that runs on QEMU with a 144TB zpool consisting of passed-through hard drives. It still behaves exactly the same as my previous VMware VM and also consistently crashes at ~2.2TB of data copied.
Both running the new-kmem branch, one with --local-no-preallocate --local-no-sparse, the other without:
Most peculiar; it's the least exciting commit ever - it doesn't do much but set values for Windows. I suppose the Windows caches will pre-inflate to that size, but presumably will be released by us at some point. If we weren't releasing them, the problem should still happen - although I suppose it would take much longer to show, and the memory would sit on the Windows side rather than ballooning in kstat.
I agree, it doesn't really make sense to me yet. I'd like to give it more testing to really be sure it is not fd8bf0d2b92d18b818505413bb7dd8e75fc8decd.
What is curious is that it happens from one moment to the next, which is why I wanted that conditional breakpoint to work. Memory usage seems very stable for hours, but then over the course of a minute it suddenly rises steeply and crashes.
Also, the copy with preallocation disabled is still going, now at 3.5TB. I haven't had a successful copy of my dataset since opening this issue. I have some ideas on how to reproduce this and will try to write a test for it soon.
Looks like we call CcUninitializeCacheMap() every time a file is closed, so that should be OK.
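For reference, roughly what such a call looks like on the close path (the helper is hypothetical; only CcUninitializeCacheMap itself is the documented Cache Manager API):
#include <ntifs.h>

/* Hypothetical close-path helper: tell the Cache Manager to tear down the
 * cache map for this file object so the pages it holds can be released.
 * NULL TruncateSize means the file is not being truncated; NULL
 * UninitializeEvent means we don't wait for the teardown to complete. */
static void
release_file_cache(PFILE_OBJECT FileObject)
{
    (void)CcUninitializeCacheMap(FileObject, NULL, NULL);
}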
Also, rclone pre-allocates at most 4095 bytes more than the actual file size; it basically rounds the size up to the next allocation unit (roughly as sketched below). Even if this happened for every file in the dataset, the overhead would still only be a couple of megabytes.
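A small illustration of that rounding (the 4096-byte allocation unit is an assumption based on the "at most 4095 bytes extra" figure; this is not rclone's code):
#include <stdint.h>

/* Round a file size up to the next multiple of the allocation unit. */
static uint64_t
round_up_to_alloc_unit(uint64_t file_size)
{
    const uint64_t unit = 4096;

    return ((file_size + unit - 1) / unit * unit);
}

/* e.g. a 10,000-byte file is preallocated to 12,288 bytes - 2,288 bytes of
 * overhead, always less than 4,096. */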
It might be that my test did not fail prior to 1bc536da04c47cce6bd5bed0cbd53366ca1043eb because rclone would delete the copies with incorrect size.
FYI I tried bac2eda43e322e8081770ddb95464c65606c3969 and it still crashes in the same way when I give the VM 12GB of memory.
OK thanks. I will see if 12G triggers it here and whether I can spot why.
OK, 12G does crash - if I replace abd_os.c with the old file, it runs. I am unsure what the actual issue is so far.
OK moved all work back into release-2.2.0 - so we can focus on this issue :)
What I'm wondering here is how segkmem_total_mem_allocated can go up to 7815495680 (7.27 GiB) at the point of failure when total_memory is 4294414336 (3.99 GiB) - almost double. Is that expected behaviour?
I also figured out why my conditional breakpoint was not triggering in osif_malloc: I used the wrong kind of quotes 🤦🏻
bp /w "segkmem_total_mem_allocated > total_memory" `spl-seg_kmem.c:141`
bp /w "segkmem_total_mem_allocated > 0x3c700000" `spl-seg_kmem.c:141`
These actually work.
I'll restart my tests with the latest changes.
quotes? ugh :)
hmm yeah, maybe something is using real_total_memory when it should be total_memory.
Yea, real_total_memory is also something I considered.
I want to check some values when it goes over the limit. I figured out that I can access kstat values from the debugger and also conditionally break on them or on access.
Setting the conditional breakpoint in osif_malloc causes things to slow down too much, so I just added the condition in the code and call DbgBreakPoint instead now.
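Roughly like this (a sketch; the counters are the globals discussed above, the helper and its placement inside osif_malloc are illustrative):
#include <ntddk.h>    /* DbgBreakPoint */
#include <stdint.h>

extern volatile uint64_t segkmem_total_mem_allocated;  /* SPL allocation counter */
extern uint64_t total_memory;                          /* usable physical memory */

/* Break into the kernel debugger once total allocations exceed the
 * physical memory size - cheap enough to leave in a debug build. */
static void
break_if_over_limit(void)
{
    if (segkmem_total_mem_allocated > total_memory)
        DbgBreakPoint();
}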
Something I noticed in spl_free_thread: when segkmem_total_mem_allocated > total_memory, new_spl_free goes negative, and then the later if (new_spl_free > total_memory) new_spl_free = total_memory; check triggers, because the negative signed integer compares as larger (it is implicitly converted to a uint64_t for the comparison). That sets new_spl_free to total_memory even though we are actually already over the limit. I added a new_spl_free > 0 condition there (see the sketch at the end of this comment) and am testing right now whether it makes a difference.
new_spl_free also gets overwritten later by the spl_enforce_memory_caps / spl_dynamic_memory_cap check (spl-kmem.c:4814), but I'm not sure that always happens.
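A minimal sketch of the comparison problem and the added guard (types assumed: new_spl_free is a signed 64-bit value, total_memory unsigned; this is not the actual spl-kmem.c code):
#include <stdint.h>

/* new_spl_free is signed so it can legitimately go negative when we are
 * over-allocated; total_memory is unsigned. */
static int64_t
clamp_spl_free(int64_t new_spl_free, uint64_t total_memory)
{
    /*
     * Original form: if (new_spl_free > total_memory) ... - the signed value
     * is implicitly converted to uint64_t, so a negative new_spl_free
     * compares as huge, the clamp fires, and the "already over the limit"
     * information is lost.
     *
     * Guarded form: only clamp genuinely positive values.
     */
    if (new_spl_free > 0 && (uint64_t)new_spl_free > total_memory)
        new_spl_free = total_memory;

    return (new_spl_free);
}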
No, didn't help :\
When testing #281 I noticed that copying a 5TB dataset using rclone always ends in an allocation failure:
ExAllocatePoolWithTag failed
memory.txt
This seems like a new issue because I was still able to copy the full dataset not that long ago.
I'll try to get some kstat information. Is it possible to get kstat info from the debugger after the failure has already happened? I could also try logging it periodically to a file.