microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

Mounting CIFS share causes kernel panic #8848

Closed IanSudbery closed 7 months ago

IanSudbery commented 1 year ago

Version

Microsoft Windows [Version 10.0.19042.2006]

WSL Version

Kernel Version

5.10.102.1

Distro Version

Ubuntu 20.04

Other Software

cifs-utils 2:6.9-1ubuntu0.2

Repro Steps

Mount a cifs share:

sudo mount -t cifs //fstore.XXXX.XXXX.XXX.uk/shared/ /mnt/test -o username=mb1ims,rw,file_mode=0700,dir_mode=0700,uid=XXXX

List a directory on the share:

ls /mnt/test/sudlab1

Expected Behavior

A list of the files in the directory

Actual Behavior

All terminals connected to WSL instantly die. WSL needs to be restarted. Sometimes wsl --shutdown is needed before the restart.

Diagnostic Logs

The first time I did this (with kernel 5.10.16.3-microsoft-standard-WSL2), there was an error reported in the Windows Event Viewer:

'Virtual Machine' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 3AD2A95F-FEAD-4037-90FF-DA741FB9124E)

Guest message:
[ 49.165408] hv_balloon: Max. dynamic memory size: 26134 MB
[ 61.534251] WSL2: Performing memory compaction.
[ 723.926325] CIFS: Attempting to mount //fstore.XXXX.XXXX.XX.XX/shared
[ 723.926346] CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
[ 734.767383] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 734.767390] #PF: supervisor instruction fetch in kernel mode
[ 734.767392] #PF: error_code(0x0010) - not-present page
[ 734.767393] PGD 0 P4D 0
[ 734.767396] Oops: 0010 [#1] SMP PTI
[ 734.767399] CPU: 2 PID: 323 Comm: ls Not tainted 5.10.16.3-microsoft-standard-WSL2 #1
[ 734.767402] RIP: 0010:0x0
[ 734.767404] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 734.767405] RSP: 0018:ffffc9000311fbd8 EFLAGS: 00010293
[ 734.767407] RAX: 0000000000000000 RBX: ffffc9000311fc48 RCX: 0000000000000001
[ 734.767408] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffffc9000311fc48
[ 734.767409] RBP: ffffc9000311fd70 R08: 0000000000000002 R09: 0000000000000064
[ 734.767411] R10: ffff888101514f00 R11: 65726168535f424d R12: 0000000000000002
[ 734.767412] R13: 0000000000000000 R14: 00000000002a0044 R15: 0000000000000000
[ 734.767414] FS: 00007f085e772400(0000) GS:ffff88864f880000(0000) knlGS:0000000000000000
[ 734.767417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 734.767418] CR2: ffffffffffffffd6 CR3: 000000018e1f0001 CR4: 00000000001706a0
[ 734.767419] Call Trace:
[ 734.767425] __traverse_mounts+0x8f/0x220
[ 734.767429] step_into+0x430/0x6c0
[ 734.767433] ? cifs_d_revalidate+0x49/0xd0
[ 734.767435] walk_component+0x72/0x1b0
[ 734.767437] path_lookupat.isra.0+0x6d/0x150
[ 734.767440] filename_lookup+0xae/0x140
[ 734.767443] ? __check_object_size+0x136/0x150
[ 734.767447] ? strncpy_from_user+0x4e/0x140
[ 734.767450] vfs_statx+0x72/0x110
[ 734.767452] __do_sys_newstat+0x39/0x70
[ 734.767455] ? do_user_addr_fault+0x1c5/0x3e0
[ 734.767459] do_syscall_64+0x33/0x80
[ 734.767463] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 734.767465] RIP: 0033:0x7f085e93c62a
[ 734.767467] Code: 00 00 75 05 48 83 c4 18 c3 e8 f2 24 02 00 66 90 f3 0f 1e fa 41 89 f8 48 89 f7 48 89 d6 41 83 f8 01 77 2d b8 04 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 31 a8 0d 00 f7
[ 734.767470] RSP: 002b:00007fff725a6b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
[ 734.767472] RAX: ffffffffffffffda RBX: 000056427eaab0b0 RCX: 00007f085e93c62a
[ 734.767473] RDX: 000056427eaab0c8 RSI: 000056427eaab0c8 RDI: 00007fff725a8660
[ 734.767474] RBP: 00007fff725a6ec0 R08: 0000000000000001 R09: 00000000725a8600
[ 734.767476] R10: 0000000000000002 R11: 0000000000000246 R12: 00007fff725a8660
[ 734.767477] R13: 0000000000000000 R14: 00007fff725a8660 R15: 000056427eaab0c8
[ 734.767479] Modules linked in:
[ 734.767482] CR2: 0000000000000000
[ 734.767484] ---[ end trace ccadc0971026d88e ]---
[ 734.767493] RIP: 0010:0x0
[ 734.767495] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 734.767497] RSP: 0018:ffffc9000311fbd8 EFLAGS: 00010293
[ 734.767498] RAX: 0000000000000000 RBX: ffffc9000311fc48 RCX: 0000000000000001
[ 734.767499] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffffc9000311fc48
[ 734.767501] RBP: ffffc9000311fd70 R08: 0000000000000002 R09: 0000000000000064
[ 734.767502] R10: ffff888101514f00 R11: 65726168535f424d R12: 0000000000000002
[ 734.767503] R13: 0000000000000000 R14: 00000000002a0044 R15: 0000000000000000
[ 734.767505] FS: 00007f085e772400(0000) GS:ffff88864f880000(0000) knlGS:0000000000000000
[ 734.767507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 734.767508] CR2: ffffffffffffffd6 CR3: 000000018e1f0001 CR4: 00000000001706a0
[ 734.767509] Kernel panic - not syncing: Fatal exception
[ 734.776513] Kernel Offset: disabled

However, I then updated everything (Windows, the WSL kernel, and the cifs-utils version). Now I get nothing in the Windows Event Viewer.

I attach log files. WslLogs-2022-09-22_12-21-36.zip

benhillis commented 1 year ago

You're running a pretty old version of the kernel (5.10.16), could you try upgrading and seeing if the issue goes away?

IanSudbery commented 1 year ago

I was running 5.10.16 when I got the pasted error message above, but I updated to 5.10.102.1. The symptoms are the same (WSL still closes), but now there is no event in the event viewer. The WslLogs file attached above was with 5.10.102.1. Sorry if that wasn't clear.

lm1baker commented 1 year ago

I have also been encountering this problem for over a year. I can mount a CIFS share, but if I navigate to any DFS directory, WSL2 immediately dies. Are your subdirectories also using DFS?

Approximate fstab (some detail removed):

//[$company.domain]/public /mnt/public cifs, nofail,user,vers=3.02,credentials=/home/[$user]/.smbcredentials,iocharset=utf8,uid=[$UID],file_mode=0700,dir_mode=0700, 0 0

I am using the latest kernel; uname -r reports 5.10.102.1-microsoft-standard-WSL2.

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          5/12/2022 3:32:48 PM
Event ID:      18590
Task Category: None
Level:         Critical
Description:
'Virtual Machine' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 6DFA6611-E2C7-444A-9416-83C5BD3E9C3B)

Guest message:
[ 1029.756380] CR2: ffffffffffffffd6 CR3: 0000000108a64003 CR4: 00000000003706a0
[ 1029.756382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1029.756384] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1029.756386] Call Trace:
[ 1029.756393]  __traverse_mounts+0x8f/0x220
[ 1029.756399]  step_into+0x430/0x6c0
[ 1029.756404]  ? cifs_d_revalidate+0x49/0xd0
[ 1029.756407]  walk_component+0x72/0x1b0
[ 1029.756411]  path_lookupat.isra.0+0x6e/0x150
[ 1029.756414]  ? cifs_revalidate_dentry_attr+0x3f/0x230
[ 1029.756416]  filename_lookup+0xae/0x140
[ 1029.756421]  ? __check_object_size+0x136/0x150
[ 1029.756425]  ? strncpy_from_user+0x4e/0x140
[ 1029.756428]  __x64_sys_chdir+0x3e/0xe0
[ 1029.756433]  do_syscall_64+0x33/0x80
[ 1029.756437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1029.756440] RIP: 0033:0x7fe383889a1b
[ 1029.756443] Code: c3 48 8b 15 77 d4 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb c6 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 d4 0d 00 f7 d8 64 89 01 48
[ 1029.756446] RSP: 002b:00007fff8bf1ca68 EFLAGS: 00000246 ORIG_RAX: 0000000000000050
[ 1029.756449] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe383889a1b
[ 1029.756451] RDX: 000055811fcc5c60 RSI: 000055811fdc1f40 RDI: 000055811fdb7570
[ 1029.756453] RBP: 000055811fdb7570 R08: 0000000000000003 R09: 0000000000000001
[ 1029.756455] R10: 0000000000000000 R11: 0000000000000246 R12: 000055811fd9da90
[ 1029.756457] R13: 0000000000000000 R14: 0000000000000009 R15: 0000000000000000
[ 1029.756460] Modules linked in:
[ 1029.756464] CR2: 0000000000000000
[ 1029.756466] ---[ end trace f30ba024c1d320d0 ]---
[ 1029.756478] RIP: 0010:0x0
[ 1029.756481] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 1029.756483] RSP: 0018:ffffc90000313ca0 EFLAGS: 00010293
[ 1029.756485] RAX: 0000000000000000 RBX: ffffc90000313d10 RCX: 0000000000000001
[ 1029.756487] RDX: 0000000000000000 RSI: 0000000000020000 RDI: ffffc90000313d10
[ 1029.756489] RBP: ffffc90000313e40 R08: 0000000000000002 R09: 0000000000000064
[ 1029.756491] R10: ffff8881297dd300 R11: 6572617774666f53 R12: 0000000000000002
[ 1029.756493] R13: 0000000000000000 R14: 00000000002a0044 R15: 0000000000000000
[ 1029.756495] FS:  00007fe383778740(0000) GS:ffff8883f7c80000(0000) knlGS:0000000000000000
[ 1029.756498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1029.756499] CR2: ffffffffffffffd6 CR3: 0000000108a64003 CR4: 00000000003706a0
[ 1029.756501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1029.756503] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1029.756505] Kernel panic - not syncing: Fatal exception
[ 1034.820749] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1039.881679] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1044.912137] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1049.988807] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1055.134689] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1060.272218] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1065.460395] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1070.573162] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1075.611532] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1080.653832] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1085.687072] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1090.761713] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1095.891513] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1101.681516] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1107.380327] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1112.891350] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1118.662310] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1123.776967] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1129.492335] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1134.839836] hv_vmbus: Waiting for VMBus UNLOAD to complete
[ 1134.849843] hv_vmbus: Continuing even though VMBus UNLOAD did not complete
[ 1134.849847] Kernel Offset: disabled

bscan commented 1 year ago

This happened to me where a specific subfolder on my CIFS share was a DFS share. Attempting to access the DFS share caused a kernel panic. Seems like WSL does not currently have DFS support enabled in the kernel.
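One way to test that guess is to look for the CIFS DFS option in the running kernel's build configuration. The following is a minimal sketch, not something anyone in this thread ran. It assumes the kernel exposes its configuration at /proc/config.gz and that CONFIG_CIFS_DFS_UPCALL is the relevant option; the helper name cifs_dfs_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel config text on stdin and reports
# how CONFIG_CIFS_DFS_UPCALL is set (enabled, disabled, or unknown).
cifs_dfs_status() {
    # Match either "CONFIG_CIFS_DFS_UPCALL=..." or the
    # "# CONFIG_CIFS_DFS_UPCALL is not set" form used for disabled options.
    line=$(grep -E '^(# )?CONFIG_CIFS_DFS_UPCALL[= ]' | head -n 1)
    case "$line" in
        CONFIG_CIFS_DFS_UPCALL=y) echo "enabled" ;;
        "# CONFIG_CIFS_DFS_UPCALL is not set") echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage inside WSL (assumes /proc/config.gz exists on this kernel):
#   zcat /proc/config.gz | cifs_dfs_status
```

A result of "disabled" or "unknown" would be consistent with the guess above, but it would not by itself prove that the missing option causes the panic.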

Workaround: I was able to mount it using mount -t drvfs instead, and then access the DFS share.
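For anyone wanting to try the same workaround, here is a rough sketch of what the drvfs mount looks like. The server name, share name, and mount point are placeholders, not values from this thread, and the helper drvfs_mount_cmd is hypothetical: it only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a drvfs mount command
# for a Windows UNC path. All names below are placeholders.
drvfs_mount_cmd() {
    unc_path=$1      # UNC path as Windows sees it, e.g. '\\fileserver\shared'
    mount_point=$2   # existing directory inside WSL, e.g. /mnt/shared
    printf "sudo mount -t drvfs '%s' %s\n" "$unc_path" "$mount_point"
}

# Print the command for a placeholder share.
drvfs_mount_cmd '\\fileserver\shared' /mnt/shared
```

The UNC path is single-quoted so the shell does not interpret the backslashes. The mount point has to exist first (for example, created with sudo mkdir -p).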

WSL version: 1.0.3.0
Kernel version: 5.15.79.1

IanSudbery commented 1 year ago

Could be... I don't know the technical implementation details of the storage server, but it is the sort of thing that might use DFS.

I can mount via drvfs, although only if I already have the share mounted in Windows. Even then, access, particularly stat, is so slow as to make it virtually unusable: I have software that executes many stat calls against the share, and it completely grinds to a halt. It's actually faster with sshfs.
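To put a number on the stat slowdown, something like the following could be run against the same directory mounted via drvfs and via sshfs. It is only a sketch: the helper name time_stat_calls is made up, it relies on GNU date for nanosecond timestamps, and it measures one stat per directory entry rather than the access pattern of any real workload.

```shell
#!/bin/sh
# Hypothetical helper: runs stat on every entry of a directory and
# prints "<entries stat'ed> <elapsed milliseconds>".
# Requires GNU date (for the %N nanosecond field).
time_stat_calls() {
    dir=$1
    count=0
    start=$(date +%s%N)
    for entry in "$dir"/*; do
        # Skip anything that cannot be stat'ed (including the unexpanded
        # glob when the directory is empty); count only successful calls.
        stat "$entry" > /dev/null 2>&1 || continue
        count=$((count + 1))
    done
    end=$(date +%s%N)
    echo "$count $(( (end - start) / 1000000 ))"
}

# Usage (mount points are placeholders):
#   time_stat_calls /mnt/share_drvfs
#   time_stat_calls /mnt/share_sshfs
```

Dividing the elapsed time by the entry count gives a rough per-call cost for each mount type.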

microsoft-github-policy-service[bot] commented 7 months ago

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!