Open jtabox opened 1 year ago
@jtabox - is it possible your distro vhd is full?
/logs
> @jtabox - is it possible your distro vhd is full?
No, I don't think so. As I said, this happened with fresh installs of Ubuntu 22.04 directly from the Store, so I hadn't installed many things. Maybe a miniconda installation at most; the vhdx files were never above 2 GB.
In regard to logs, I have actually not had a similar issue ever since I deactivated sparseVhd in the configuration file. I have installed a lot of things by now, the image is at 30 GB at the moment and it seems to be working fine. So, I don't know how much use any logs would have now.
I'm not sure if my corruption issues were directly related to the sparseVhd option, or indirectly, in some obscure way. But the fact is I haven't had any corruption in the last day or so, while previously, with sparseVhd enabled, the image would become corrupted within an hour or two.
I'll try to make a backup of my working installation and re-enable sparseVhd and see if the issues come back.
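For anyone who wants to double-check whether the option is actually on before testing, here is a minimal Python sketch that reads the text of a .wslconfig file and reports whether sparseVhd is enabled. It assumes the setting lives under the [experimental] section (as the WSL docs describe it) and that Python's configparser reads the file the same way WSL does, which is not guaranteed. The function name sparse_vhd_enabled and the sample contents are made up for illustration.

```python
import configparser


def sparse_vhd_enabled(wslconfig_text: str) -> bool:
    """Return True if sparseVhd is switched on under [experimental].

    Assumption: the file is plain INI and every key sits under a section header.
    """
    # interpolation=None so a '%' in a Windows path is not treated specially
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(wslconfig_text)
    # Option names are matched case-insensitively by configparser;
    # fallback=False covers a missing section or a missing key.
    return parser.getboolean("experimental", "sparseVhd", fallback=False)


# Hypothetical file contents, not taken from anyone's machine
sample = """
[wsl2]
memory=8GB

[experimental]
autoMemoryReclaim=gradual
sparseVhd=true
"""

print(sparse_vhd_enabled(sample))
```

Note this only looks at the global setting; whether an existing image was switched with --set-sparse is a separate, per-distro state that this sketch does not inspect.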
Thank you @jtabox. Can you try to reproduce the issue under log collection? We'd need to see logs covering the period when the disk goes from normal to corrupt in order to root-cause the issue.
/logs
I've also had this issue when enabling sparseVhd and --set-sparse on Ubuntu. It's also affecting files outside the VM for me:
- C:/ drive were getting corrupted (random crashes, softlocks). Reinstalling them fixed the issues
- D:/ drive
These issues started from the very first 2.0.0 pre-release version.
Edit: My drive got corrupted again, although this time I had disabled sparseVhd on my distro. I guess my C:/ drive corruption issues are unrelated?
This is a clean Win11 install from today, fully updated and running 2.0.4 with sparseVhd in my .wslconfig and --set-sparse against Ubuntu.
I run touch test successfully, activate a conda env and try running PyCharm from Windows (remote connection to WSL2). I then try touch test2 and get a read-only error. All of this happens within a few minutes.
Here are my logs for the above.
WslLogs-2023-10-11_22-06-43.zip
This is chkdsk straight after:
```
The type of the file system is NTFS.
WARNING! /F parameter not specified.
Running CHKDSK in read-only mode.
Stage 1: Examining basic file system structure ...
Attribute list for file 6196 is corrupt.
Attribute list for file 6197 is corrupt.
270592 file records processed.
File verification completed.
Phase duration (File record verification): 5.54 seconds.
File record segment 1783E is an orphan.
File record segment 1783F is an orphan.
File record segment 2FBD0 is an orphan.
File record segment 2FBD1 is an orphan.
8188 large file records processed.
Phase duration (Orphan file record recovery): 10.84 milliseconds.
Errors found. CHKDSK cannot continue in read-only mode.
```
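If others want to compare their chkdsk results, a small Python sketch like the one below could pull out the relevant lines. It only recognises the message formats that appear in the output above; the function name summarize_chkdsk and the dictionary keys are made up for illustration, and other chkdsk messages are ignored.

```python
import re

# Patterns copied from the message wording in the chkdsk output above
CORRUPT_RE = re.compile(r"Attribute list for file ([0-9A-Fa-f]+) is corrupt\.")
ORPHAN_RE = re.compile(r"File record segment ([0-9A-Fa-f]+) is an orphan\.")


def summarize_chkdsk(output: str) -> dict:
    """Collect corrupt attribute lists and orphaned segments from chkdsk text."""
    return {
        # IDs are kept as strings exactly as chkdsk printed them
        "corrupt_attribute_lists": CORRUPT_RE.findall(output),
        "orphan_segments": ORPHAN_RE.findall(output),
        "errors_found": "Errors found" in output,
    }


# A shortened excerpt of the output posted above
sample = """\
Attribute list for file 6196 is corrupt.
Attribute list for file 6197 is corrupt.
File record segment 1783E is an orphan.
File record segment 2FBD0 is an orphan.
Errors found. CHKDSK cannot continue in read-only mode.
"""

print(summarize_chkdsk(sample))
```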
@NGRhodes So you're getting similar errors I assume? With sparseVhd active? At least I'm not the only one, I haven't seen any similar feedback so I was worried it's something specific to my PC. Do you use antivirus software? Besides Microsoft's Defender. I've installed Avast Free Antivirus recently, and it's been a bit too eager to block and meddle in stuff in general, so I was wondering if it's related in some way.
@ASleepyCat Luckily I haven't had any issues with anything else outside the vhdx file getting corrupted. Gotta admit it sounds a bit far-fetched, I'd assume sparseVhd only affects the vhdx files, though I might be totally wrong here.
I ran wsl --manage Ubuntu --set-sparse false and I can still reproduce the error.
@jtabox - I have tried with Kaspersky and Windows Defender and the drive goes readonly in both cases.
I had a similar issue.
Context:
- sparseVhd=true in .wslconfig
- --manage --set-sparse true on existing image
Then I started to get random problems and did a few shutdowns/restarts of WSL, until I figured out that the file system was locking into read-only after less than a minute of use after each "reboot" of the WSL distro.
Attempts to fix:
- Removed sparseVhd from .wslconfig, ran --manage --set-sparse false. No luck, same problems.
- e2fsck which found (and reportedly fixed) many errors. Back in first distro I had the same problem (and e2fsck would not find any new errors).
I ended up copying most of my data/config files from the first to the second distro (Ubuntu 22.04 as well), re-installed what I needed, and deleted the first one. It was simply broken.
I've used the second distro every day for over a week and it's working totally fine under WSL 2.0.3. I strongly suspect --set-sparse true to be the cause, and I didn't take the chance of enabling it on the second distro. I didn't have time to try to reproduce the issue though.
This also marks areas of the C: drive as dirty! In my case, setting sparse not only caused read-only errors everywhere on the distro filesystem but also caused minor filesystem corruption, which chkdsk (in Windows RE) reports as free space that cannot be properly freed up.
I did manage, however, to fix the read-only file system errors by running e2fsck from the WSL system distro (wsl --system) after mounting the vhdx with wsl --mount --vhd .\ext4.vhdx --bare and then running e2fsck /dev/sdc -f -y. It works, though it can corrupt some files (in my case, Oh My Zsh now prints a lot of errors).
I experimented with pre-release versions of WSL2 and sparseVhd option too. And I run fstrim in wsl2 too. And it seems that my host Windows 11 23H2 NTFS system is quite corrupted now (and vhd ext4 too, btw), so sfc and dism cannot repair it. Quite dangerous stuff...
I enabled sparseVhd a few months ago, and over the last month my filesystem in Ubuntu started getting corrupted every other programming session, especially when running Docker Desktop. Prior to enabling this option, I had been using this system for over a year and had no problems. Also, the Windows file system started to get corrupted when using WSL; when I don't use WSL, this behaviour is not observed.
I love WSL as a concept and for the amazing utility it offers for free, and I truly appreciate the work being poured into it. But honestly, I'm staying as far away from sparseVhd as humanly possible, at least for the time being.
Since I opened this issue a few months ago, every one of the 3-4 times I changed my mind and decided to give sparseVhd one more try, it has always ended with me straight up deleting the test distro's vhdx file within the first hour of use and having to create a new one from scratch (after deactivating sparseVhd of course) because it's impossible to fix its corruption issues.
There surely must be some kind of interaction between sparseVhd and something on my part, but I can't figure out what it is, after multiple tries. Luckily the host system doesn't seem to have been corrupted so far, but I don't dare use my main Windows PC for my tries.
Just chiming in here, that I've observed exactly what's being described here as well. Any distro with sparseVhd enabled will eventually suffer fs corruption. And it also resulted in FS corruption of the Host NTFS, which I was almost about to throw out my SSD for.
Same problem, both with WSL2 corruption and the host drive.
I am also experiencing the same WSL and host corruption as everyone else since switching to --set-sparse true.
Unfortunately, my vhdx file gets corrupted after hours when I enable --set-sparse true.
> unfortunately, my vhdx file gets corrupted after hours when I enable --set-sparse true.
run a chkdsk on your host fs while you still can, and delete any vhds that were in sparse mode.
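To find which disk images were in sparse mode, something like the following Python sketch might help. It assumes that --set-sparse true marks the .vhdx with the NTFS sparse-file attribute, which I have not verified against the WSL source. The function names has_sparse_flag and find_sparse_vhdx are made up for illustration, and st_file_attributes is only populated when Python runs on Windows.

```python
import stat
from pathlib import Path


def has_sparse_flag(file_attributes: int) -> bool:
    """True if the Windows file-attribute bitmask includes the sparse-file bit."""
    return bool(file_attributes & stat.FILE_ATTRIBUTE_SPARSE_FILE)


def find_sparse_vhdx(root: str) -> list[Path]:
    """List .vhdx files under root that carry the sparse attribute."""
    hits = []
    for path in Path(root).rglob("*.vhdx"):
        try:
            # st_file_attributes exists only on Windows; treat it as 0 elsewhere
            attrs = getattr(path.stat(), "st_file_attributes", 0)
        except OSError:
            continue  # skip files that cannot be inspected
        if has_sparse_flag(attrs):
            hits.append(path)
    return hits
```

Point find_sparse_vhdx at whichever folder holds your distro disks; it only lists candidates and deletes nothing.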
So I've managed to somewhat curb the corruption
My VHDXs are showing 0 bytes also, can the data be recovered?
It has been almost a year since this feature was released, and from what it looks like we have no fix for the corruption problem yet?
The recommendation at https://github.com/MicrosoftDocs/WSL/issues/1855 should at least be changed.
> It has been almost a year since this feature was released, and from what it looks like we have no fix for the corruption problem yet?
> The recommendation at MicrosoftDocs/WSL#1855 should at least be changed.
Ever since I opened this thread, I've been consistently and periodically getting notifications of a new post here, so I'm really curious what the cause might be. Still, we're a small minority that's getting the corruption issue, so I assume there must be something specific in our PCs that interacts with WSL in such a catastrophic way. There would be way more open issues if this was a widespread problem.
At this point I've just given up on the sparseVhd option completely, and if I'm being honest, even if the issue is fixed in a future update, I still won't be activating the option. The consequences are way too annoying and disruptive, and I don't have any spare PCs to test.
So as long as sparseVhd is not implemented as a default option, I'm fine with it taking time to resolve.
This is still an experimental feature for which you need to go out of your way to set a flag. And then when your disk corrupts, you also need to notice it, and make the connection that this is the cause.
For me, on multiple systems, turning on the sparseVhd feature very reliably corrupts the filesystem of both guest and host, so I highly doubt it's system dependent.
Just FYI.
I was also facing this issue and gave up using sparseVhd. To me, it seemed that using Docker made the corruption more frequent. Docker may not be the direct cause, but it could be a key to reproducing this issue.
Yes, I've noticed this issue while playing with docker and fstrim commands.
docker probably just creates and deletes A LOT of files, so it gives the sparse stuff a lot more work to do and a lot more chances to make a mess.
Same issue here.
I set the VHD to sparse, got filesystem read-only errors in WSL, fixed them with e2fsck. Then Win11 reported a hard drive issue, and after the automatic repair, I couldn’t boot into it anymore🥲.
+1
After setting sparseVhd=true and wsl --manage <distro> --set-sparse true, my WSL became read-only and eventually corrupted my Windows system. Despite clean installing Windows and rebuilding the WSL environment, it became read-only and corrupted again. I'm also using Docker Desktop.
While WSL has saved me a lot of time, this option has cost me a lot of time again 😇
This issue has been open, and fairly heated, for almost a year, and none of the WSL developers seem to be looking after it.
Reproduced it again on Win 11 WSL2 setup with host corruption. Critically dangerous feature!
Windows Version
Microsoft Windows [Version 10.0.22621.2361]
WSL Version
2.0.4.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.123.1-1
Distro Version
Ubuntu 22.04
Other Software
No response
Repro Steps
I can't really reproduce this, I was mostly wondering if anyone else has had their virtual HDs gradually becoming corrupted with the pre-release version of WSL2. I installed v2.0.4.0 two days ago and had an Ubuntu 22.04 (the standard distro from the Store) already installed. After some time I suddenly started getting filesystem read-only errors. I've had this distro installed (same vhdx file) for almost a year now and never had a similar issue (or any other for that matter). The only things that had changed are the pre-release version and 2 settings activated in .wslconfig, autoMemoryReclaim=gradual and sparseVhd=true (I also ran --manage --set-sparse true for my already existing image). Google said it's probably a corrupted disk image; e2fsck found errors and supposedly repaired them, but the read-only errors persisted. I loaded a copy of the vhdx file that I had from before the pre-release install, and soon enough it also started throwing read-only errors. The debug console showed the root filesystem was being mounted with errors; I would correct them with e2fsck but they persisted. I ended up nuking both files and did a fresh distro install. It went fine, but at some point it started throwing corruption errors too; this time dpkg wouldn't run because of corrupted files. I've now spent the last two days uninstalling and reinstalling WSL and testing out distros, but have been having the same issue. I wonder if it could be the sparseVhd option. Has anyone else had any similar issues? I've deactivated the option for now and am watching to see if I get a corrupted file again; if I do, I'll probably revert to the previous release version.
Expected Behavior
Mainly I'd expect my vhdx files not becoming corrupted 😅
Actual Behavior
They became corrupted.
Diagnostic Logs
No response