livius-ungureanu commented 3 years ago

# Environment

Windows build number: 10.0.19041.0 (`[Environment]::OSVersion` reports Platform Win32NT, ServicePack empty, Version 10.0.19041.0, VersionString Microsoft Windows NT 10.0.19041.0)

Your Distribution version: Release 20.04 (from `lsb_release -r`)

Whether the issue is on WSL 2 and/or WSL 1: WSL 2 (`cat /proc/version` reports Linux version 4.19.104-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Wed Feb 19 06:37:35 UTC 2020)

# Steps to reproduce

I am running the Linux version of IntelliJ in WSL2, connected to an X410 server for the GUI. While IntelliJ is running some apps, WSL2 suddenly stops. After starting WSL2 again, I see that the root fs is mounted read-only:

/dev/sdb on / type ext4 (ro,relatime,discard,errors=remount-ro,data=ordered)

dmesg also shows:


lun@LUN:~$ dmesg
[    0.000000] Linux version 4.19.104-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Wed Feb 19 06:37:35 UTC 2020
[    0.000000] Command line: initrd=\initrd.img panic=-1 pty.legacy_count=0 nr_cpus=4
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000e0fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000001fffff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000000200000-0x00000000f7ffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000003feffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: Microsoft Hyper-V
[    0.000000] Hyper-V: features 0x2e7f, hints 0x22c2c
[    0.000000] Hyper-V Host Build:19041-10.0-0-0.329
[    0.000000] Hyper-V: LAPIC Timer Frequency: 0x1e8480
[    0.000000] tsc: Marking TSC unstable due to running on Hyper-V
[    0.000000] Hyper-V: Using hypercall for remote TLB flush
[    0.000000] tsc: Detected 2808.000 MHz processor
[    0.000012] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000014] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000018] last_pfn = 0x3ff000 max_arch_pfn = 0x400000000
[    0.000041] MTRR default type: uncachable
[    0.000041] MTRR fixed ranges disabled:
[    0.000043]   00000-FFFFF uncachable
[    0.000043] MTRR variable ranges disabled:
[    0.000044]   0 disabled
[    0.000045]   1 disabled
[    0.000045]   2 disabled
[    0.000046]   3 disabled
[    0.000046]   4 disabled
[    0.000047]   5 disabled
[    0.000047]   6 disabled
[    0.000048]   7 disabled
[    0.000048] Disabled
[    0.000049] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.000058] CPU MTRRs all blank - virtualized system.
[    0.000062] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
[    0.000064] last_pfn = 0xf8000 max_arch_pfn = 0x400000000
[    0.000105] Using GB pages for direct mapping
[    0.000108] BRK [0x02e00000, 0x02e00fff] PGTABLE
[    0.000109] BRK [0x02e01000, 0x02e01fff] PGTABLE
[    0.000109] BRK [0x02e02000, 0x02e02fff] PGTABLE
[    0.000173] BRK [0x02e03000, 0x02e03fff] PGTABLE
[    0.000709] RAMDISK: [mem 0x02e35000-0x02e44fff]
[    0.000713] ACPI: Early table checksum verification disabled
[    0.000742] ACPI: RSDP 0x00000000000E0000 000024 (v02 VRTUAL)
[    0.000746] ACPI: XSDT 0x0000000000100000 000044 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000752] ACPI: FACP 0x0000000000101000 000114 (v06 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000758] ACPI: DSDT 0x00000000001011B8 01E184 (v02 MSFTVM DSDT01   00000001 MSFT 05000000)
[    0.000760] ACPI: FACS 0x0000000000101114 000040
[    0.000761] ACPI: OEM0 0x0000000000101154 000064 (v01 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000763] ACPI: SRAT 0x000000000011F33C 000250 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000765] ACPI: APIC 0x000000000011F58C 000068 (v04 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000769] ACPI: Local APIC address 0xfee00000
[    0.001027] Zone ranges:
[    0.001028]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.001029]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.001030]   Normal   [mem 0x0000000100000000-0x00000003feffffff]
[    0.001031] Movable zone start for each node
[    0.001031] Early memory node ranges
[    0.001032]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.001033]   node   0: [mem 0x0000000000200000-0x00000000f7ffffff]
[    0.001034]   node   0: [mem 0x0000000100000000-0x00000003feffffff]
[    0.001647] Zeroed struct page in unavailable ranges: 4449 pages
[    0.001649] Initmem setup node 0 [mem 0x0000000000001000-0x00000003feffffff]
[    0.001651] On node 0 totalpages: 4157087
[    0.001652]   DMA zone: 59 pages used for memmap
[    0.001653]   DMA zone: 22 pages reserved
[    0.001654]   DMA zone: 3743 pages, LIFO batch:0
[    0.001778]   DMA32 zone: 16320 pages used for memmap
[    0.001779]   DMA32 zone: 1011712 pages, LIFO batch:63
[    0.029917]   Normal zone: 49088 pages used for memmap
[    0.029919]   Normal zone: 3141632 pages, LIFO batch:63
[    0.030660] ACPI: Local APIC address 0xfee00000
[    0.030666] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[    0.030993] IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
[    0.030996] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.030998] ACPI: IRQ9 used by override.
[    0.030999] Using ACPI (MADT) for SMP configuration information
[    0.031004] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[    0.031011] [mem 0xf8000000-0xffffffff] available for PCI devices
[    0.031012] Booting paravirtualized kernel on bare hardware
[    0.031014] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.152805] random: get_random_bytes called from start_kernel+0x93/0x4bb with crng_init=0
[    0.152810] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4 nr_node_ids:1
[    0.153228] percpu: Embedded 42 pages/cpu s133400 r8192 d30440 u524288
[    0.153231] pcpu-alloc: s133400 r8192 d30440 u524288 alloc=1*2097152
[    0.153232] pcpu-alloc: [0] 0 1 2 3
[    0.153242] Built 1 zonelists, mobility grouping on.  Total pages: 4091598
[    0.153243] Kernel command line: initrd=\initrd.img panic=-1 pty.legacy_count=0 nr_cpus=4
[    0.156368] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    0.157861] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.186163] Memory: 4096404K/16628348K available (14360K kernel code, 1575K rwdata, 2836K rodata, 1504K init, 2792K bss, 382300K reserved, 0K cma-reserved)
[    0.186450] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.186455] Kernel/User page tables isolation: enabled
[    0.186493] ftrace: allocating 41537 entries in 163 pages
[    0.200005] rcu: Hierarchical RCU implementation.
[    0.200007] rcu:     RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[    0.200008]  All grace periods are expedited (rcu_expedited).
[    0.200008] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.201980] Using NULL legacy PIC
[    0.201981] NR_IRQS: 16640, nr_irqs: 456, preallocated irqs: 0
[    0.202392] Console: colour dummy device 80x25
[    0.202395] console [tty0] enabled
[    0.202399] ACPI: Core revision 20180810
[    0.202484] Failed to register legacy timer interrupt
[    0.202489] APIC: Switch to symmetric I/O mode setup
[    0.202501] Hyper-V: Using IPI hypercalls
[    0.202501] Hyper-V: Using MSR based APIC access
[    0.202505] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[    0.202593] Calibrating delay loop (skipped), value calculated using timer frequency.. 5616.00 BogoMIPS (lpj=28080000)
[    0.202595] pid_max: default: 32768 minimum: 301
[    0.202620] Security Framework initialized
[    0.202653] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.202679] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.202877] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.202878] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.202879] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.202880] Spectre V2 : Mitigation: Full generic retpoline
[    0.202881] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.202881] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    0.202885] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.202886] Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl
[    0.202886] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.202907] TAA: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.202907] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.203114] Freeing SMP alternatives memory: 44K
[    0.204093] smpboot: CPU0: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz (family: 0x6, model: 0x4e, stepping: 0x3)
[    0.204158] Performance Events: unsupported p6 CPU model 78 no PMU driver, software events only.
[    0.204193] rcu: Hierarchical SRCU implementation.
[    0.204271] random: crng done (trusting CPU's manufacturer)
[    0.204307] smp: Bringing up secondary CPUs ...
[    0.204345] x86: Booting SMP configuration:
[    0.204345] .... node  #0, CPUs:      #1
[    0.204782] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.204782] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[    0.204782]  #2 #3
[    0.204782] smp: Brought up 1 node, 4 CPUs
[    0.204782] smpboot: Max logical packages: 1
[    0.204782] smpboot: Total of 4 processors activated (22464.00 BogoMIPS)
[    0.302587] node 0 initialised, 3037411 pages in 100ms
[    0.303739] devtmpfs: initialized
[    0.303739] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.303739] futex hash table entries: 1024 (order: 4, 65536 bytes)
[    0.303739] xor: automatically using best checksumming function   avx
[    0.303739] NET: Registered protocol family 16
[    0.303739] ACPI: bus type PCI registered
[    0.303739] PCI: Fatal: No config space access function found
[    0.303739] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.303739] raid6: Forced to use recovery algorithm avx2x2
[    0.303739] raid6: Forced gen() algo avx2x4
[    0.303739] ACPI: Added _OSI(Module Device)
[    0.303739] ACPI: Added _OSI(Processor Device)
[    0.303739] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.303739] ACPI: Added _OSI(Processor Aggregator Device)
[    0.303739] ACPI: Added _OSI(Linux-Dell-Video)
[    0.303739] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    0.314487] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    0.315300] ACPI: Interpreter enabled
[    0.315300] ACPI: (supports S0 S5)
[    0.315300] ACPI: Using IOAPIC for interrupt routing
[    0.315300] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.315300] ACPI: Enabled 2 GPEs in block 00 to 0F
[    0.315319] SCSI subsystem initialized
[    0.315319] hv_vmbus: Vmbus version:5.0
[    0.315319] PCI: Using ACPI for IRQ routing
[    0.315319] PCI: System does not support PCI
[    0.315319] hv_vmbus: Unknown GUID: c376c1c3-d276-48d2-90a9-c04748072c60
[    0.315319] clocksource: Switched to clocksource hyperv_clocksource_tsc_page
[    0.327849] VFS: Disk quotas dquot_6.6.0
[    0.327858] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.327907] FS-Cache: Loaded
[    0.327931] pnp: PnP ACPI init
[    0.328131] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.328144] pnp: PnP ACPI: found 1 devices
[    0.337686] NET: Registered protocol family 2
[    0.337915] tcp_listen_portaddr_hash hash table entries: 8192 (order: 5, 131072 bytes)
[    0.337943] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.338274] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.338395] TCP: Hash tables configured (established 131072 bind 65536)
[    0.338414] UDP hash table entries: 8192 (order: 6, 262144 bytes)
[    0.338640] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes)
[    0.338690] NET: Registered protocol family 1
[    0.339421] RPC: Registered named UNIX socket transport module.
[    0.339422] RPC: Registered udp transport module.
[    0.339422] RPC: Registered tcp transport module.
[    0.339423] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.339424] PCI: CLS 0 bytes, default 64
[    0.339461] Trying to unpack rootfs image as initramfs...
[    0.339593] Freeing initrd memory: 64K
[    0.339595] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.339596] software IO TLB: mapped [mem 0xf4000000-0xf8000000] (64MB)
[    0.339678] kvm: no hardware support
[    0.339679] has_svm: not amd
[    0.339679] kvm: no hardware support
[    0.340106] Initialise system trusted keyrings
[    0.340346] workingset: timestamp_bits=46 max_order=22 bucket_order=0
[    0.341303] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.341679] NFS: Registering the id_resolver key type
[    0.341684] Key type id_resolver registered
[    0.341685] Key type id_legacy registered
[    0.341687] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    0.342554] Key type cifs.idmap registered
[    0.342650] fuse init (API version 7.27)
[    0.342951] SGI XFS with ACLs, security attributes, realtime, scrub, no debug enabled
[    0.343913] 9p: Installing v9fs 9p2000 file system support
[    0.343923] FS-Cache: Netfs '9p' registered for caching
[    0.343958] FS-Cache: Netfs 'ceph' registered for caching
[    0.343960] ceph: loaded (mds proto 32)
[    0.350598] NET: Registered protocol family 38
[    0.350600] Key type asymmetric registered
[    0.350616] Asymmetric key parser 'x509' registered
[    0.350625] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[    0.350626] io scheduler noop registered (default)
[    0.350910] hv_vmbus: registering driver hv_pci
[    0.351092] ACPI: AC Adapter [AC1] (on-line)
[    0.351396] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.351869] Non-volatile memory driver v1.3
[    0.352175] battery: ACPI: Battery Slot [BAT1] (battery present)
[    0.354310] brd: module loaded
[    0.355523] loop: module loaded
[    0.355653] hv_vmbus: registering driver hv_storvsc
[    0.355698] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[    0.356314] tun: Universal TUN/TAP device driver, 1.6
[    0.357284] PPP generic driver version 2.4.2
[    0.357374] PPP BSD Compression module registered
[    0.357375] PPP Deflate Compression module registered
[    0.357378] PPP MPPE Compression module registered
[    0.357379] NET: Registered protocol family 24
[    0.357382] hv_vmbus: registering driver hv_netvsc
[    0.358665] scsi host0: storvsc_host_t
[    0.397937] VFIO - User Level meta-driver version: 0.3
[    0.398090] hv_vmbus: registering driver hyperv_keyboard
[    0.398416] rtc_cmos 00:00: RTC can wake from S4
[    0.400412] rtc_cmos 00:00: registered as rtc0
[    0.400431] rtc_cmos 00:00: alarms up to one month, 114 bytes nvram
[    0.400681] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[    0.401020] hv_utils: Registering HyperV Utility Driver
[    0.401021] hv_vmbus: registering driver hv_util
[    0.401068] hv_vmbus: registering driver hv_balloon
[    0.401146] drop_monitor: Initializing network drop monitor service
[    0.401167] hv_utils: cannot register PTP clock: 0
[    0.401546] hv_utils: TimeSync IC version 4.0
[    0.401687] Mirror/redirect action on
[    0.401809] hv_balloon: Using Dynamic Memory protocol version 2.0
[    0.402276] IPVS: Registered protocols (TCP, UDP)
[    0.402303] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[    0.402374] hv_balloon: cold memory discard enabled
[    0.403645] IPVS: ipvs loaded.
[    0.403646] IPVS: [rr] scheduler registered.
[    0.403646] IPVS: [wrr] scheduler registered.
[    0.403647] IPVS: [sh] scheduler registered.
[    0.405563] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
[    0.405803] Initializing XFRM netlink socket
[    0.405860] NET: Registered protocol family 10
[    0.406299] Segment Routing with IPv6
[    0.408182] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.408298] NET: Registered protocol family 17
[    0.408319] Bridge firewalling registered
[    0.408327] 8021q: 802.1Q VLAN Support v1.8
[    0.408350] sctp: Hash tables configured (bind 256/256)
[    0.408405] 9pnet: Installing 9P2000 support
[    0.408414] Key type dns_resolver registered
[    0.408420] Key type ceph registered
[    0.408614] libceph: loaded (mon/osd proto 15/24)
[    0.408616] hv_vmbus: registering driver hv_sock
[    0.408780] NET: Registered protocol family 40
[    0.409290] registered taskstats version 1
[    0.409295] Loading compiled-in X.509 certificates
[    0.409583] Btrfs loaded, crc32c=crc32c-generic
[    0.410450] rtc_cmos 00:00: setting system clock to 2020-09-10 09:01:37 UTC (1599728497)
[    0.410470] Unstable clock detected, switching default tracing clock to "global"
           If you want to keep using the local clock, then add:
             "trace_clock=local"
           on the kernel command line
[    0.412001] Freeing unused kernel image memory: 1504K
[    0.482854] Write protecting the kernel read-only data: 20480k
[    0.483464] Freeing unused kernel image memory: 1988K
[    0.483789] Freeing unused kernel image memory: 1260K
[    0.483968] Run /init as init process
[    0.762971] scsi 0:0:0:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI: 5
[    0.763397] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    0.766362] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    0.771106] sd 0:0:0:0: [sda] 536870912 512-byte logical blocks: (275 GB/256 GiB)
[    0.771108] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    0.771354] sd 0:0:0:0: [sda] Write Protect is off
[    0.771356] sd 0:0:0:0: [sda] Mode Sense: 0f 00 00 00
[    0.771675] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.776927] sd 0:0:0:0: [sda] Attached SCSI disk
[    0.797063] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered
[    0.885766] Adding 4194304k swap on /swap/file.  Priority:-2 extents:2 across:4202496k
[    0.993732] scsi 0:0:0:1: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI: 5
[    0.994667] sd 0:0:0:1: Attached scsi generic sg1 type 0
[    0.996547] sd 0:0:0:1: [sdb] 536870912 512-byte logical blocks: (275 GB/256 GiB)
[    0.996550] sd 0:0:0:1: [sdb] 4096-byte physical blocks
[    0.996689] sd 0:0:0:1: [sdb] Write Protect is off
[    0.996691] sd 0:0:0:1: [sdb] Mode Sense: 0f 00 00 00
[    0.997240] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.005273] sd 0:0:0:1: [sdb] Attached SCSI disk
[    1.362677] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    1.956835] JBD2: Invalid checksum recovering block 97441 in log
[    1.956925] JBD2: recovery failed
[    1.956927] EXT4-fs (sdb): error loading journal
[    3.484294] JBD2: Invalid checksum recovering data block 12382 in log
[    3.512820] JBD2: Invalid checksum recovering data block 12382 in log
[    3.512824] JBD2: Invalid checksum recovering data block 54017 in log
[    3.780618] JBD2: Invalid checksum recovering data block 45544 in log
[    4.154220] JBD2: Invalid checksum recovering data block 50449 in log
[    4.210671] JBD2: Invalid checksum recovering data block 1131 in log
[    4.439616] JBD2: Invalid checksum recovering data block 0 in log
[    4.536846] JBD2: Invalid checksum recovering data block 45539 in log
[    5.358244] JBD2: recovery failed
[    5.358250] EXT4-fs (sdb): error loading journal
[    7.443027] JBD2: Invalid checksum recovering data block 50447 in log
[    7.445502] JBD2: Invalid checksum recovering data block 2121390 in log
[    7.491514] JBD2: Invalid checksum recovering data block 50449 in log
[    8.092075] JBD2: Invalid checksum recovering data block 51343 in log
[    8.417900] JBD2: recovery failed
[    8.417907] EXT4-fs (sdb): error loading journal
[    8.471695] ERROR: MountExt4:1659: mount(/dev/sdb) failed 5
[   17.738797] JBD2: journal transaction 570368 on sdb-8 is corrupt.
[   17.738803] EXT4-fs (sdb): error loading journal
[   18.711310] JBD2: Invalid checksum recovering block 88396 in log
[   18.713604] JBD2: recovery failed
[   18.713606] EXT4-fs (sdb): error loading journal
[   19.194568] JBD2: Invalid checksum recovering data block 54017 in log
[   19.499629] JBD2: recovery failed
[   19.499636] EXT4-fs (sdb): error loading journal
[   19.672459] JBD2: Invalid checksum recovering block 68714 in log
[   19.672564] JBD2: recovery failed
[   19.672611] EXT4-fs (sdb): error loading journal
[   20.472126] JBD2: Invalid checksum recovering data block 9776 in log
[   20.507902] JBD2: Invalid checksum recovering data block 46131 in log
[   20.783018] JBD2: Invalid checksum recovering data block 0 in log
[   20.918815] JBD2: Invalid checksum recovering data block 1790955 in log
[   21.129783] JBD2: recovery failed
[   21.129791] EXT4-fs (sdb): error loading journal
[   21.151232] ERROR: MountExt4:1659: mount(/dev/sdb) failed 5
[   28.658479] EXT4-fs (sdb): 3 orphan inodes deleted
[   28.658480] EXT4-fs (sdb): recovery complete
[   28.684942] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered
[   28.797403] init: (1) WARNING: /etc/resolv.conf updating disabled in /etc/wsl.conf
[   28.809432] init: (1) WARNING: /etc/resolv.conf updating disabled in /etc/wsl.conf
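
To make logs like the one above easier to compare between occurrences, the journal and ext4 errors can be counted per type. This is only a minimal sketch: the patterns are copied from messages that appear in this issue, and the category names and the `summarize_dmesg` helper are made up for illustration.

```python
import re
from collections import Counter

# Patterns taken from log lines quoted in this issue; the category names are hypothetical.
PATTERNS = {
    "jbd2_invalid_checksum": re.compile(r"JBD2: Invalid checksum recovering"),
    "jbd2_recovery_failed": re.compile(r"JBD2: recovery failed"),
    "jbd2_corrupt_transaction": re.compile(r"JBD2: journal transaction \d+ on \S+ is corrupt"),
    "ext4_journal_load_error": re.compile(r"EXT4-fs \(\w+\): error loading journal"),
    "ext4_error": re.compile(r"EXT4-fs error \(device \w+\)"),
    "remounted_read_only": re.compile(r"Remounting filesystem read-only"),
    "mount_failed": re.compile(r"mount\(/dev/\w+\) failed"),
}


def summarize_dmesg(text):
    """Count how many lines of a dmesg dump match each known error pattern."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = """\
[    1.956835] JBD2: Invalid checksum recovering block 97441 in log
[    1.956925] JBD2: recovery failed
[    1.956927] EXT4-fs (sdb): error loading journal
[    8.471695] ERROR: MountExt4:1659: mount(/dev/sdb) failed 5
[   28.658480] EXT4-fs (sdb): recovery complete
"""

for name, count in sorted(summarize_dmesg(SAMPLE).items()):
    print(name, count)
```

Feeding it the saved output of `dmesg` from each incident would show whether the mix of errors changes over time.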

The file system corruption also damaged some of my files:

cat someProjectFile 2f��)��l�l�.;�K{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/��~Gh{�7��/....




# Expected behavior

WSL2 should not corrupt the ext4 file system. This makes WSL2 quite unreliable; it would be good to have this fixed as soon as possible.

# Actual behavior
Every 2-3 days the ext4 file system gets corrupted.

Jessidhia commented 3 years ago

I have also encountered this when running yarn install on large dependency trees, especially when large native library builds are involved (e.g. when puppeteer is a dependency). After the build finishes, / is usually already remounted ro and e2fsck finds errors to correct. Even after restarting WSL (a wsl --shutdown cycle), several files, usually my shell history file and some files inside node_modules, are corrupted. The latter forces the install to be redone, which leads to non-deterministic loops of manually installing, rechecking, and restarting until everything works.
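
For anyone who wants to check for the "remounted ro" state right after a build, here is a small sketch that looks for read-only entries in the output of `mount`. The regular expression and the `read_only_mounts` helper are assumptions for illustration, and it does not handle mount points that contain spaces.

```python
import re

# Matches lines in the format printed by `mount`, e.g.
# "/dev/sdb on / type ext4 (ro,relatime,...)".
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)"
)


def read_only_mounts(mount_output):
    """Return {mount point: device} for every entry whose options include "ro"."""
    result = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        # Split on commas so that "errors=remount-ro" is not mistaken for "ro".
        if match and "ro" in match.group("options").split(","):
            result[match.group("target")] = match.group("device")
    return result


# The mount line quoted in the original report.
SAMPLE = "/dev/sdb on / type ext4 (ro,relatime,discard,errors=remount-ro,data=ordered)"
print(read_only_mounts(SAMPLE))
```

Passing it the text printed by `mount` inside the distro would list any file system that ext4 has already flipped to read-only.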

onomatopellan commented 3 years ago

The only corruption problem I have had so far with WSL2 was when my vhdx expanded while there was little disk space available on the system drive. By default WSL2 uses a dynamic vhdx that can grow up to 256 GB, but it seems the Linux distro never knows what is really happening on the host disk. If that was the real problem, then at least in the latest Insider build you can mount an external disk to avoid this happening again, for example when compiling a big project.

Having little disk space available on C: could be one reason. Another could be that the vhdx was fully expanded and trying to write more to it caused the corruption. To test that, try expanding the vhdx disk size and see if it lasts more days before the corruption shows up. https://docs.microsoft.com/en-us/windows/wsl/compare-versions#expanding-the-size-of-your-wsl-2-virtual-hardware-disk
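
To put numbers on the two theories above, this sketch compares how far the dynamic vhdx could still grow with how much free space the host drive has. The `vhdx_growth_headroom` helper and the example figures are hypothetical; only the 256 GB default limit comes from this thread.

```python
GIB = 1024 ** 3


def vhdx_growth_headroom(vhdx_size, vhdx_max_size, host_free):
    """Return (room_left_in_vhdx, shortfall_on_host), both in bytes.

    room_left_in_vhdx: how much the guest may still write before the vhdx is full.
    shortfall_on_host: how much of that growth the host drive cannot hold.
    """
    room_left = max(vhdx_max_size - vhdx_size, 0)
    shortfall = max(room_left - host_free, 0)
    return room_left, shortfall


# Example figures (assumptions): an 8 GiB vhdx, the 256 GiB default limit,
# and a host drive with only 1 GiB free.
room_left, shortfall = vhdx_growth_headroom(8 * GIB, 256 * GIB, 1 * GIB)
print("guest can still write (GiB):", room_left // GIB)
print("host cannot store (GiB):", shortfall // GIB)
```

On Windows, `os.path.getsize` on the ext4.vhdx file and `shutil.disk_usage("C:\\").free` could supply real inputs. A non-zero shortfall means the guest believes it has space that the host drive cannot actually provide.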

livius-ungureanu commented 3 years ago

Some relevant facts before running into this:

- WSL's ext4 partition was on C:, which was running out of space (e.g. 1 GB left)
- I have merged C and D with diskpart.

I cannot think of other relevant facts.

Today I ran into this again:

[    0.874250] Adding 4194304k swap on /swap/file.  Priority:-2 extents:2 across:4202496k
[    1.157347] JBD2: Invalid checksum recovering block 71532 in log
[    1.159743] JBD2: recovery failed
[    1.159745] EXT4-fs (sdb): error loading journal
[    1.300933] JBD2: Invalid checksum recovering block 67930 in log
[    1.301045] JBD2: recovery failed
[    1.301047] EXT4-fs (sdb): error loading journal
[    1.407173] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    2.843982] JBD2: Invalid checksum recovering data block 51785 in log
[    3.058614] JBD2: recovery failed
[    3.058620] EXT4-fs (sdb): error loading journal
[    4.419174] JBD2: Invalid checksum recovering data block 9776 in log
[    4.613561] JBD2: Invalid checksum recovering data block 1879416 in log
[    4.709278] JBD2: recovery failed
[    4.709284] EXT4-fs (sdb): error loading journal
[    4.884286] JBD2: journal transaction 586032 on sdb-8 is corrupt.
[    4.884288] EXT4-fs (sdb): error loading journal
[    5.444378] JBD2: journal transaction 588851 on sdb-8 is corrupt.
[    5.444380] EXT4-fs (sdb): error loading journal
[    6.768213] JBD2: Invalid checksum recovering data block 51785 in log
[    7.118482] JBD2: recovery failed
[    7.118488] EXT4-fs (sdb): error loading journal
[    7.141381] ERROR: MountExt4:1659: mount(/dev/sdb) failed 5
[   10.670841] JBD2: Invalid checksum recovering block 74456 in log
[   10.671840] JBD2: recovery failed
[   10.671842] EXT4-fs (sdb): error loading journal
[   11.460877] JBD2: Invalid checksum recovering block 79887 in log
[   11.462274] JBD2: recovery failed
[   11.462276] EXT4-fs (sdb): error loading journal
[   12.885092] JBD2: Invalid checksum recovering data block 51785 in log
[   12.887995] JBD2: Invalid checksum recovering data block 51785 in log
[   12.895065] JBD2: Invalid checksum recovering data block 51785 in log
[   12.913122] JBD2: Invalid checksum recovering data block 51785 in log
[   13.198654] JBD2: recovery failed
[   13.198657] EXT4-fs (sdb): error loading journal
[   14.461139] EXT4-fs (sdb): 1 orphan inode deleted
[   14.461141] EXT4-fs (sdb): recovery complete
[   14.472229] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: discard,errors=remount-ro,data=ordered
[   14.574547] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 237: bad block bitmap checksum
[   14.576137] Aborting journal on device sdb-8.
[   14.577767] EXT4-fs (sdb): Remounting filesystem read-only
[   14.580144] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 243: bad block bitmap checksum
[   14.584150] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 244: bad block bitmap checksum
[   14.586089] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 246: bad block bitmap checksum
[   14.589313] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 248: bad block bitmap checksum
[   14.591079] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 249: bad block bitmap checksum
[   14.593645] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 250: bad block bitmap checksum
[   14.595744] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 252: bad block bitmap checksum
[   14.597753] EXT4-fs error (device sdb): ext4_validate_block_bitmap:376: comm init: bg 254: bad block bitmap checksum

I will have to re-install wsl2 as it has become painful.

onomatopellan commented 3 years ago

The WSL2 vhdx disk sure needed more space to breathe.

Were C and D contiguous partitions? What do you see now in Disk Management? Dynamic (olive) or basic disk (blue)?

livius-ungureanu commented 3 years ago

The bad news is that I have:

- uninstalled Ubuntu
- deleted the old vhdx disk to make sure it is never used again
- installed Ubuntu 20.04 again
- installed IntelliJ (ideaIU-2020.2.1.tar.gz) again in WSL2 (i.e. just untarred the archive)
- built two projects in parallel in two IntelliJ instances

I ran into the same file system corruption again:

[   50.597844] EXT4-fs (sdb): error loading journal
[   50.651008] ERROR: MountExt4:1659: mount(/dev/sdb) failed 5
[   53.884701] JBD2: Invalid checksum recovering data block 7804 in log
[   53.884725] JBD2: Invalid checksum recovering data block 1197491 in log
[   53.887520] JBD2: Invalid checksum recovering data block 7819 in log
[   54.480457] JBD2: Invalid checksum recovering data block 1714631 in log
[   54.609942] JBD2: Invalid checksum recovering data block 9768 in log
[   54.609958] JBD2: Invalid checksum recovering data block 1970778 in log
[   54.673843] JBD2: Invalid checksum recovering data block 9532 in log
[   54.924479] JBD2: Invalid checksum recovering data block 1069865 in log
[   54.926929] JBD2: Invalid checksum recovering data block 10919 in log
[   55.021543] JBD2: Invalid checksum recovering data block 1 in log
[   55.171721] JBD2: Invalid checksum recovering data block 11024 in log
[   55.594706] JBD2: Invalid checksum recovering data block 1 in log
[   56.095269] JBD2: recovery failed
[   56.095275] EXT4-fs (sdb): error loading journal
[   60.111181] EXT4-fs (sdb): 1 orphan inode deleted

Looks like a WSL2 bug. An application should not be able to corrupt the file system.

onomatopellan commented 3 years ago

When that happens, is the ext4.vhdx disk fully expanded? (file size over 256 GB)

livius-ungureanu commented 3 years ago

No, the current size on disk is 7.90 GB

livius-ungureanu commented 3 years ago

Weird. It does not allow me to resize.

DISKPART> Select vdisk file="C:\Users\liviu.ungureanu\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu20.04onWindows_79rhkp1fndgsc\LocalState\ext4.vhdx"

DiskPart successfully selected the virtual disk file.

DISKPART> expand vdisk maximum=100000

DiskPart has encountered an error: The parameter is incorrect. See the System Event Log for more information.

Note: the distro was stopped at the time (wsl -l -v shows Ubuntu-20.04 Stopped 2)

onomatopellan commented 3 years ago

Too weird. Remember you can always export a distro to another partition/disk with the wsl.exe --export option. I suspect that C and D weren't merged correctly so I'd move the distro to another disk, if possible.

livius-ungureanu commented 3 years ago

onomatopellan commented 3 years ago

Is the disk an HDD, SSD, or NVMe?

livius-ungureanu commented 3 years ago

SSD

onomatopellan commented 3 years ago

Ok, thanks. If you have another disk with enough space try to export the distro there.

About the DiskPart error, did you try something like expand vdisk maximum=300000? (300 GB)

livius-ungureanu commented 3 years ago

I do not have another disk :-) 300000 worked as it is a valid value indeed.

I think I've narrowed the problem down.

1. Install a fresh Ubuntu 20.04.
2. Expand the vdisk to 300000 (this step can probably be skipped).
3. Copy a large file (e.g. 5 GB) onto the ext4 fs.
4. Copy this large file in parallel within the ext4 fs.

WSL2 should then suddenly stop:

[process exited with code 1]

PS C:\Users\liviu.ungureanu> wsl -l -v
  NAME            STATE           VERSION
  Ubuntu-20.04    Stopped         2

Though in this case it looks like the file system is not corrupted, even though WSL2 dies suddenly.
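For anyone who wants to try the copy steps above without doing them by hand, here is a minimal sketch. It is not an official repro: the function names, the choice of 4 parallel copies, and the use of the home directory as the target are all assumptions for illustration. It writes one large file, copies it in parallel, and compares SHA-256 hashes so that silent corruption would show up as a mismatch.

```python
import hashlib
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parallel_copy_check(workdir: Path, size_bytes: int, copies: int) -> bool:
    """Write one source file, copy it `copies` times in parallel, compare hashes."""
    source = workdir / "source.bin"
    block = b"\xa5" * (1024 * 1024)
    with source.open("wb") as f:
        remaining = size_bytes
        while remaining > 0:
            f.write(block[:remaining])
            remaining -= min(remaining, len(block))
    expected = sha256_of(source)
    targets = [workdir / f"copy_{i}.bin" for i in range(copies)]
    # One worker thread per copy so all copies run concurrently.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        list(pool.map(lambda target: shutil.copyfile(source, target), targets))
    return all(sha256_of(target) == expected for target in targets)


if __name__ == "__main__":
    # Assumed values: 5 GiB as in the steps above; 4 copies is a guess.
    with tempfile.TemporaryDirectory(dir=Path.home()) as tmp:
        print(parallel_copy_check(Path(tmp), 5 * 1024**3, copies=4))
```

If WSL2 dies mid-run the script never gets to print anything, which is itself the symptom described here; a printed False would instead point at data corruption.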

Question: my SSD is encrypted. Could this contribute to the problem in any way?

onomatopellan commented 3 years ago

A large file? Sounds similar to #5410. It's still open because it's hard to reproduce. Like in that thread, try using a .wslconfig file to reduce the processors and memory available to WSL2.
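As a rough illustration of that suggestion, the sketch below renders a .wslconfig body with a [wsl2] section. The `memory` and `processors` keys are the documented settings; the values 4GB and 2 are only example numbers, and the helper name `render_wslconfig` is made up for this sketch. The file is expected at %UserProfile%\.wslconfig on the Windows side, and WSL has to be restarted (wsl --shutdown) before the limits apply.

```python
import configparser
import io


def render_wslconfig(memory: str, processors: int) -> str:
    """Render a [wsl2] section that limits memory and CPU count for the WSL2 VM."""
    config = configparser.ConfigParser()
    config["wsl2"] = {"memory": memory, "processors": str(processors)}
    buffer = io.StringIO()
    # Write key=value without spaces around the delimiter.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example values only; pick limits that suit the machine.
    print(render_wslconfig("4GB", 2))
```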

livius-ungureanu commented 3 years ago

ok, I'll give it a try with .wslconfig

craigloewen-msft commented 3 years ago

We took a look at this problem but weren't able to diagnose any obvious WSL related problems. It's definitely hard for us to repro this problem. As @onomatopellan mentioned, we're also wondering if this is disk related. If you are able to, could you please try running the same repro steps on another disk?

anaisbetts commented 3 years ago

@craigloewen-msft I hit this too, and #5026 seems to be related. It seems to be more reliably triggered if you run shutdown /r /t 0 to reboot the machine. I don't believe this is related to the host disk being corrupted. I think that under certain Windows shutdown commands, the WSL2 VM is getting its power yanked instead of being shut down properly.

luigimannoni commented 3 years ago

To add to the above, a long machine suspend or hibernation also triggers the corruption.

livius-ungureanu commented 3 years ago

I decrypted my disk just to rule that out. I still get corruption when I/O suddenly becomes intensive (e.g. IntelliJ is re-indexing a large project after a dependency update).

The corruption can be more or less virulent. In this case dmesg reported a small one:

[ 2.691613] EXT4-fs (sdb): 1 orphan inode deleted

@craigloewen-msft I haven't had time to try another disk, as I am working on an office laptop. But as the picture a bit above shows, the disk looks healthy.

craigloewen-msft commented 3 years ago

Thanks for the additional info! We'll keep trying on the dev team to see if we can get a repro for this (I've tried the shutdown trick @anaisbetts mentioned a few times but haven't had any 'luck' yet). If anyone else can find a way to repro this consistently please comment it and tag me in it.

anaisbetts commented 3 years ago

@craigloewen-msft That's surprising, on my computer it hits nearly 100% of the time. Use the VM for something real, wait a bit, run shutdown /r /t 0, and your .zhistory file's last line is garbage.

craigloewen-msft commented 3 years ago

We've identified the issue that has been causing this, and have put in a fix! I'll leave this issue open as our landing zone for the WSL repo, and will be posting updates here and on the popular issue on the WSL 2 kernel repo: https://github.com/microsoft/WSL2-Linux-Kernel/issues/168

craigloewen-msft commented 3 years ago

This is fixed in Insiders preview build 21292. Can any folks here who see this issue install this build and let us know if that resolves it for you? Thank you!

livius-ungureanu commented 3 years ago

@craigloewen-msft Good news!

Unfortunately I will have to wait for the official roll-out, since I am not allowed to run Windows Insider builds on my laptop. Is there any other way to install/upgrade WSL2 to try out the new fix?

craigloewen-msft commented 3 years ago

Unfortunately not yet. I will update these threads with details if this fix becomes available on more versions.

benhillis commented 3 years ago

Reopening while fix is being confirmed.

gmargari commented 3 years ago

Not sure if this is related or if I should open another issue, but I just got my git repo corrupted. My disk is an SSD.

$ lsb_release -r
Release:        20.04
$ cat /proc/version
Linux version 4.4.0-18362-Microsoft (Microsoft@Microsoft.com) (gcc version 5.4.0 (GCC) ) #1049-Microsoft Thu Aug 14 12:01:00 PST 2020
$ dmesg
[    0.011281]  Microsoft 4.4.0-18362.1049-Microsoft 4.4.35
[    0.218674] <3>init: (1) ERROR: UtilCreateProcessAndWait:489: /bin/mount failed with status
[    0.218678] 2000

PovarovDenis commented 3 years ago

I'm still having this problem - my git is being corrupted from time to time in WSL2.

denis@DESKTOP-ANM2KR6:~/Projects/ui-admin$ git status
error: object file .git/objects/dd/45712db7f3718ac2c1eb512898eba180c241a9 is empty
error: object file .git/objects/dd/45712db7f3718ac2c1eb512898eba180c241a9 is empty
error: object file .git/objects/dd/45712db7f3718ac2c1eb512898eba180c241a9 is empty
fatal: loose object dd45712db7f3718ac2c1eb512898eba180c241a9 (stored in .git/objects/dd/45712db7f3718ac2c1eb512898eba180c241a9) is corrupt
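A quick way to look for the same symptom (zero-byte loose objects) in other repositories is sketched below. It only covers this one failure mode and assumes a regular, non-bare repository layout with a .git/objects directory; `find_empty_loose_objects` is a name made up for this sketch. git fsck --full remains the thorough check.

```python
from pathlib import Path
from typing import List

HEX_DIGITS = set("0123456789abcdef")


def find_empty_loose_objects(repo: Path) -> List[str]:
    """Return the IDs of zero-byte loose objects under repo/.git/objects."""
    objects_dir = repo / ".git" / "objects"
    empty = []
    for fan_out in sorted(objects_dir.iterdir()):
        # Loose objects live in two-hex-digit directories; skip pack/ and info/.
        if not (fan_out.is_dir() and len(fan_out.name) == 2
                and set(fan_out.name) <= HEX_DIGITS):
            continue
        for obj in sorted(fan_out.iterdir()):
            if obj.is_file() and obj.stat().st_size == 0:
                empty.append(fan_out.name + obj.name)
    return empty


if __name__ == "__main__":
    # Run from the top level of a working tree.
    for object_id in find_empty_loose_objects(Path(".")):
        print("empty loose object:", object_id)
```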

craigloewen-msft commented 3 years ago

@PovarovDenis , what version of Windows are you using? Are you on Windows Insiders using the latest version? (If you're not sure please just paste the output of running ver in CMD).

kimkwanka commented 3 years ago

Is this fix already included in the non-Insiders version? I ask because my .zhistory file got corrupted twice in the last 2 days. I'm running Windows 10 Pro [Version 10.0.19042.928].

craigloewen-msft commented 3 years ago

No, this isn't fixed in non-Insider builds yet. We would need to verify that this is working as expected on Insider builds.

We also haven't had much luck getting a repro for this. If you are able to get a consistent repro at all, that would be very helpful to us! For what it's worth, I run the latest Insider builds on my main productivity machine, use WSL daily, and run ZSH, and I haven't seen this corruption issue. To me that indicates that when the next major version of Windows is available your issue should be fixed! :)

livius-ungureanu commented 3 years ago

@craigloewen-msft

This has become a way of life :-) as it happens at least once every 2 days. Imagine your daily setup is roughly 6 Docker containers (~800 MB each) and 3-4 IntelliJ instances eager to update their indexes... and suddenly [process exited with code 1]. And then the whole setup needs to be brought back up.

Literally everything runs within WSL2, like on a Linux box:

Linux Docker, not Windows Docker Desktop
The Linux version of IntelliJ, which just connects to the X server running on Windows and edits files only on the WSL partition.

If you can somehow provide me with a fixed WSL2, I am keen to install it. Unfortunately I only have this environment on my office laptop, where Windows Insider is not possible.

craigloewen-msft commented 3 years ago

@livius-ungureanu as of right now the only way to get the latest changes is to be on Windows Insiders.

Do you have a separate machine that you can put on Insiders and run the same workflow on it?

I tried to repro this on the latest builds by upgrading intellij and building some containers simultaneously but wasn't able to hit this issue on my machine.

mbwhite commented 3 years ago

Not sure if this helps or not, but I've encountered this issue on my home machine and not on my work laptop. Both are Windows 10 Pro, and I've set up WSL2 the same way on both. The work laptop gets significantly more usage and hasn't hit this issue.

The home machine has hit this. The only difference I can see is that Windows Docker Desktop was installed on the home machine (it isn't now). I've seen other reports of this error that also mention Docker Desktop.

Might spark an idea?

mthorning commented 3 years ago

Yes! This is exactly my situation, I hadn't considered Docker Desktop could be the problem, thanks.

luigimannoni commented 3 years ago

I am somewhat inclined to point at Docker as well. In fact, the times I've had data loss were on different Docker projects with containers running. I've made a habit of stopping Docker containers gracefully, exiting Docker Desktop and shutting down WSL before rebooting or shutting down.

However there are people mentioning bash history becoming corrupt too, which does not sound correlated to docker and never happened in my case.

Anuiran commented 3 years ago

I don’t run docker, just Ubuntu 20.04 in WSL2 and PhpStorm and had git corrupted today after rebooting my pc.

craigloewen-msft commented 3 years ago

As of right now it seems fixed on the latest Windows Insider builds. If you're seeing this issue, please comment with your Windows build number, ensure that you're on the latest Windows Insider build, and include as many repro steps as you can! Thanks!

ksze commented 2 years ago

Any news on whether the fix is in release/stable build yet?

sarim commented 2 years ago

@jjaaccoobb you are running WSL version 1, which doesn't use ext4. This thread is about the ext4 filesystem in WSL2.

sozercan commented 2 years ago

> As of right now it seems fixed on the latest Windows Insider builds.

@craigloewen-msft do we need to update to Windows 11 to get this fix? Will this fix be available in future Windows 10 builds?

mhsdesign commented 2 years ago

I started getting these problems with corrupted git and a corrupt zsh history two days ago (https://github.com/microsoft/WSL/issues/5026)... weird, because I can't recall changing anything meaningful in my setup. Before that everything worked fine. I read something about shutting WSL down carefully with wsl --shutdown, but while it was working I wasn't following any rules.

I frequently use WSL2 with: mariadb 10.3, php 7.4, git, and vscode in WSL mode.

(Oh, and yes, I should update my Windows, maybe that fixes something. Using: Microsoft Windows [Version 10.0.19042.1110])

f-liva commented 2 years ago

Same here

f-liva commented 2 years ago

> > As of right now it seems fixed on the latest Windows Insider builds.
>
> @craigloewen-msft do we need to update to Windows 11 to get this fix? Will this fix be available in future Windows 10 builds?

Windows 11 suffers the same issue

craigloewen-msft commented 2 years ago

@f-liva when are you seeing this on Windows 11? Do you have any repro steps that we could use to help diagnose this problem?

f-liva commented 2 years ago

I'm using WSLg to run GitKraken from the Debian Linux subsystem.

I have some Git repositories weighing about 400 MB each, with many files. When I work on one of these repositories with GitKraken and perform normal stash, commit, push or pull operations, the subsystem often crashes due to an EXT4-related error. GitKraken suddenly closes, PhpStorm tells me that it can no longer read files from WSL, and Docker crashes. When I restart Docker I'm often notified of an EXT4-related problem.

There is no fixed procedure to follow to replicate this problem. It occurs very often and in different operations. For example, if a first crash is caused by a stash on the repository, the stash may work the next time and the crash may come on commit or push instead. In short, it all happens with Git repositories that have thousands of versioned files.

As I understand it, the more operations that are active on the WSL filesystem, the more likely it is that it will crash.

The PC is new and so is the SSD, so I rule out any kind of hardware problem.

Is there any way to record these crashes in any WSL logs? If yes I could collect some and send them to you.

craigloewen-msft commented 2 years ago

Do you have an HDD or SSD?

And could you please enable logging using the instructions found here and try to reproduce a crash and then send it to me? That might give us more clues on what's going on.

I'll also try grabbing a large git repository and doing the operations you just listed.

EDIT: I just tried reproing this for a while. I installed and ran GitKraken, had Docker Desktop installed and running, git cloned the VS Code repo, made hundreds of thousands of small files and some larger files (2 GB each), committed them, stashed them, made new branches, switched branches, and then set that all on auto repeat for 30 minutes. I was on my HDD and wasn't able to detect any crashes or corruption. We aren't able to repro this issue, so if you are seeing it, could you please list the exact steps that lead up to it so we can try to replicate the problem? Thank you!

f-liva commented 2 years ago

I have a brand new SSD

Check out this feedback https://aka.ms/AAdncc7

microsoft / WSL

WSl2 corrupts ext4 filesystem #5895
