microsoft / WSL

gzip from Ubuntu Jammy doesn't execute #8219

Open iBug opened 2 years ago

iBug commented 2 years ago

Version

Microsoft Windows [Version 10.0.19044.1586]

WSL Version

Kernel Version

4.4.0-19041-Microsoft

Distro Version

Ubuntu 22.04 "Jammy Jellyfish"

Other Software

GZip version 1.10-4ubuntu3 and 1.10-4ubuntu4 amd64.

Repro Steps

Expected Behavior

No error shows up.

Actual Behavior

sh: 1: gzip: Exec format error

The binary doesn't execute, so no strace.

Diagnostic Logs

Similar to that one, the same binary runs perfectly OK on a native Ubuntu Jammy machine. However, this time the binary is 97520 bytes and no section points outside this range.

Dissected binary using Wireshark: https://paste.ubuntu.com/p/nc2v6ZSRHW/

The previous version, gzip 1.10-4ubuntu1, is fine, so I've installed that one instead and set apt-mark hold on it for now.

It's very interesting that only gzip is found problematic. And it's the same program as 3 years ago. Time to wonder if gzip has any magic to break on WSL1.

hzqmwne commented 2 years ago

The old version gzip 1.10-4ubuntu1 works well, and the new version gzip 1.10-4ubuntu3 triggers this issue.

I compared their program headers with the readelf -l gzip command. Here are the LOAD segments.

For 1.10-4ubuntu1 (which has no bug):

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  ...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000001fc0 0x0000000000001fc0  R      0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x000000000000e405 0x000000000000e405  R E    0x1000
  LOAD           0x0000000000011000 0x0000000000011000 0x0000000000011000
                 0x00000000000035d0 0x00000000000035d0  R      0x1000
  LOAD           0x0000000000014690 0x0000000000015690 0x0000000000015690
                 0x0000000000000d88 0x0000000000000d88  RW     0x1000
  LOAD           0x0000000000000000 0x0000000000018000 0x0000000000018000
                 0x0000000000000000 0x00000000000ca810  RW     0x1000
  ...

and for 1.10-4ubuntu3 (which has the bug):

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  ...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000001fa8 0x0000000000001fa8  R      0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x000000000000e319 0x000000000000e319  R E    0x1000
  LOAD           0x0000000000011000 0x0000000000011000 0x0000000000011000
                 0x00000000000036a8 0x00000000000036a8  R      0x1000
  LOAD           0x0000000000016690 0x0000000000016690 0x0000000000016690
                 0x0000000000000d88 0x00000000000cc180  RW     0x2000
  ...

Notice that the buggy version has a strange LOAD segment with Align 0x2000. After patching the 0x2000 to 0x1000 (by changing a single byte of the gzip binary, at offset 0x189, from 0x20 to 0x10), the bug disappears and the patched binary works well!

So, maybe WSL1 makes a wrong assumption that the p_align value is 0x1000. It is just a bug in WSL1 rather than Ubuntu, and seems quite easy to fix.
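The readelf comparison above can also be scripted. The following is a minimal sketch, assuming a 64-bit little-endian ELF file (which is what the amd64 gzip binary is); the function name load_alignments is made up for this illustration. It reads the program header table directly and returns the p_align value of every PT_LOAD header.

```python
import struct

PT_LOAD = 1  # p_type value of a loadable segment


def load_alignments(data: bytes) -> list:
    """Return p_align of every PT_LOAD header in a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    aligns = []
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_LOAD:
            # p_align is the last field of Elf64_Phdr, at offset 0x30.
            (p_align,) = struct.unpack_from("<Q", data, base + 0x30)
            aligns.append(p_align)
    return aligns
```

Calling load_alignments(open('/usr/bin/gzip', 'rb').read()) and checking whether the returned list contains more than one distinct value is one way to spot binaries that would trip the behaviour described here.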

iBug commented 2 years ago

Gzip 1.10-4ubuntu4 is out and is also affected. The binary is identical to 1.10-4ubuntu3 except for some "ID" part, namely:

File length remains 97520 (0x17CF0)

The new binary, as you'd imagine, works well after manually patching byte 0x189 from 0x20 to 0x10.

hzqmwne commented 2 years ago

Reverse-engineering lxcore.sys shows that the real cause of this bug is that WSL1 assumes the p_align members of all PT_LOAD program headers have the same value, which is not correct. (See the elf(5) Linux manual page: there is no such assumption for p_align.)

Part of the decompiled pseudo-C code of lxcore.sys (version 10.0.22000.1 from Windows 11 21H2 22000.556): lxcore.sys.zip

__int64 __fastcall LxpElfInfoParse(__int128 *a1, unsigned __int64 a2, _OWORD *a3)    // RVA 0x1C004DC60
{
...
    if ( (_DWORD)v61 == 0x464C457F )    // "\x7fELF"
    {
...
      v46 = 0i64;
...
        v19 = *(__m128i *)v18;    // typeof(v18) is "Elf64_PHdr *"
        v58 = v19;
        v51 = v19;
        v20 = *((_OWORD *)v18 + 1);    // Elf64_PHdr.p_vaddr
        v60 = v20;
        v52 = v20;
        v21 = *((__m128i *)v18 + 2);
        v59 = v21;
        v53 = v21;
        v22 = *((_QWORD *)v18 + 6);    // Elf64_PHdr.p_align
        v54 = v22;
        v23 = _mm_cvtsi128_si32(v19);    // Elf64_PHdr.p_type
        if ( v23 == 3 )    // PT_INTERP
        {
...
        if ( v23 == 1 )    // PT_LOAD
        {
...
            if ( !v54 || (v54 & 0xFFF) != 0 )    // check p_align is multiple of page size
            {
              v5 = "LxpElfInfoParse: LocalProgramHeader.Align\n";
              v6 = 541;
              goto LABEL_5;
            }
            if ( (unsigned __int64)v52 % v54 != v24 % v54 )    // check "p_vaddr % p_align == p_offset % p_align"
            {
              v5 = "LxpElfInfoParse: LocalProgramHeader.VirtualAddress\n";
              v6 = 554;
              goto LABEL_5;
            }
            if ( v46 )
            {
              if ( v46 != v54 )    // Bug here! WSL1 assumes all `p_align` member in `PT_LOAD` program headers must be the same value, which is not correct. 
              {
                v5 = "LxpElfInfoParse: LoadHeaderAlignment\n";
                v6 = 567;
                goto LABEL_5;
              }
            }
            else
            {
              v46 = v54;
            }
...
}

The LxpElfInfoParse function (RVA 0x1C004DC60) in lxcore.sys parses the ELF file. In the pseudocode above, 0x464C457F is the ELF magic number ("\x7fELF"), v23 is the p_type member of Elf64_Phdr (v23 == 1 means PT_LOAD, see here), and v54 is the p_align member of Elf64_Phdr. v46 is initialized to 0; at the first PT_LOAD header it is set to that header's v54. At every later PT_LOAD header, the code requires v54 to equal the saved v46 value, which causes this issue.
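To make the control flow easier to follow, here is a small Python model of the three PT_LOAD checks visible in the decompiled excerpt. It is only a sketch of that excerpt, not of the whole loader: the function name wsl1_style_check is hypothetical, the returned strings are shortened versions of the messages in the decompilation, and the elided parts of LxpElfInfoParse are not modelled.

```python
PAGE_SIZE = 0x1000


def wsl1_style_check(load_segments):
    """Model of the PT_LOAD checks in the decompiled excerpt.

    load_segments: iterable of (p_offset, p_vaddr, p_align) tuples.
    Returns the name of the first failing check, or None if all pass.
    """
    first_align = 0  # plays the role of v46
    for p_offset, p_vaddr, p_align in load_segments:
        # p_align must be a non-zero multiple of the page size.
        if p_align == 0 or p_align & (PAGE_SIZE - 1):
            return "LocalProgramHeader.Align"
        # p_vaddr and p_offset must be congruent modulo p_align.
        if p_vaddr % p_align != p_offset % p_align:
            return "LocalProgramHeader.VirtualAddress"
        # The problematic check: every PT_LOAD must repeat the first p_align.
        if first_align and first_align != p_align:
            return "LoadHeaderAlignment"
        first_align = first_align or p_align
    return None
```

Feeding it the LOAD segments listed earlier in this thread gives None for 1.10-4ubuntu1 (all Align values are 0x1000) and "LoadHeaderAlignment" for 1.10-4ubuntu3, whose last segment has Align 0x2000.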

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Zenderable commented 2 years ago

Same issue here (Ubuntu 22.04 & WSL1). I can't start VS Code Server due to this issue:

Installing VS Code Server for x64 (dfd34e8260c270da74b5c2d86d61aee4b6d56977)
Downloading: 100%
/usr/bin/gzip: 1: ELF: Permission denied
/usr/bin/gzip: 3: : Permission denied
/usr/bin/gzip: 4: Syntax error: "(" unexpected
tar: Child returned status 2
tar: Error is not recoverable: exiting now

tar is unable to read /home/stumski/.vscode-server/bin/dfd34e8260c270da74b5c2d86d61aee4b6d56977-1650739878.tar.gz. Either the file is corrupt or tar has an issue.
There's a known WSL issue with tar on Ubuntu 19.10.
See workaround in https://github.com/microsoft/vscode-remote-release/issues/1856.
Reload the window to initiate a new server download.

stumski@C-H50K6G3:~$ gzip
-bash: /usr/bin/gzip: cannot execute binary file: Exec format error

Older version works well:

sudo dpkg -i ./gzip_1.10-4ubuntu1_amd64.deb
stumski@C-H50K6G3:~$ sudo dpkg -i ./gzip_1.10-4ubuntu1_amd64.deb
dpkg: warning: downgrading gzip from 1.10-4ubuntu4 to 1.10-4ubuntu1
(Reading database ... 33879 files and directories currently installed.)
Preparing to unpack ./gzip_1.10-4ubuntu1_amd64.deb ...
Unpacking gzip (1.10-4ubuntu1) over (1.10-4ubuntu4) ...
Setting up gzip (1.10-4ubuntu1) ...
Processing triggers for install-info (6.8-4build1) ...
Processing triggers for man-db (2.10.2-1) ...

stumski@C-H50K6G3:~$ sudo apt-mark hold gzip
gzip set on hold.

stumski@C-H50K6G3:~$ gzip
gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: gzip -h
stumski@C-H50K6G3:~$ code
Updating VS Code Server to version dfd34e8260c270da74b5c2d86d61aee4b6d56977
Removing previous installation...
Installing VS Code Server for x64 (dfd34e8260c270da74b5c2d86d61aee4b6d56977)
Downloading: 100%
Unpacking: 100%
Unpacked 2341 files and folders to /home/stumski/.vscode-server/bin/dfd34e8260c270da74b5c2d86d61aee4b6d56977.
stumski@C-H50K6G3:~$

dreamlayers commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using:

echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))
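The dd one-liner writes the byte unconditionally. As a more cautious variant, here is a minimal Python sketch that refuses to write unless the byte currently has the expected value. The helper name patch_byte is hypothetical. The comment on the offset is an inference, assuming the standard ELF64 layout (program header table at 0x40, 0x38 bytes per entry) and that the affected LOAD header is the sixth entry; both are consistent with the offsets quoted in this thread but were not stated by the posters.

```python
# 0x189 = 0x40 (start of program headers) + 5 * 0x38 (sixth entry)
#         + 0x30 (p_align field) + 1 (second byte of the little-endian value)
GZIP_P_ALIGN_BYTE = 0x40 + 5 * 0x38 + 0x30 + 1


def patch_byte(path, offset, expected, new):
    """Replace one byte in place, but only if it currently equals `expected`."""
    with open(path, "r+b") as fp:
        fp.seek(offset)
        current = fp.read(1)
        if current != bytes([expected]):
            raise ValueError(
                f"byte at {offset:#x} is {current.hex() or 'missing'}, "
                f"expected {expected:02x}; file left unchanged"
            )
        fp.seek(offset)
        fp.write(bytes([new]))
```

For the binary discussed here, the call would be patch_byte('/usr/bin/gzip', 0x189, 0x20, 0x10), run with sufficient privileges. Working on a copy first is the safer choice.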

LostInBrittany commented 2 years ago

It also happens in WSL2; I just ran into it (unable to start VS Code because of it).

Patch by @dreamlayers worked beautifully, thanks!

Zenderable commented 2 years ago

@LostInBrittany Are you absolutely certain that your distribution is running under WSL2? You can list all distros with wsl -l -v. I ask because it didn't happen to me after switching to WSL2. I tried the patch by @dreamlayers on the newest gzip and it works as it should!

LostInBrittany commented 2 years ago

@Zenderable OMG, you're right, my bad... I have two Windows 11 computers, both in the Windows Insider program, both with WSL, and I was sure both of them used the same WSL version... But no, this one has WSL 1 and the other has WSL 2 (and that one doesn't have the problem, you're right). Sorry about that!

landall commented 2 years ago

I hit the same issue. WSL 2 is OK; WSL 1 fails.

stong commented 2 years ago

By reversing lxcore.sys, the real reason of this bug is that WSL1 assumes all p_align member in PT_LOAD program headers must be the same value, which is not correct. (See elf(5) — Linux manual page, there is not such assumption for p_align) ...

IDA Pro and Hexrays to the rescue again!

Uzume commented 2 years ago

@benhillis It seems like your ELF loader still has some serious issues.

For anyone else trying to run Ubuntu 22.04 LTS in WSL (especially WSL1): you will likely also want the workaround for #7054.

wangqr commented 2 years ago

I came here from #8151: nodejs from Arch could not run on WSL1.

In my case, node also has different p_align values in its program headers, but only if nodejs was built after 2022/02/14, when Arch Linux upgraded glibc from 2.33 to 2.35. After some git bisect, I found that GNU ld changed how it sets p_align in binutils 2.38, or more specifically in binutils-gdb@74e315dbfe5. Previously, a section requiring alignment higher than max-page-size did not affect p_align of the corresponding segment. After this commit, the required alignment is used as p_align of that segment and later ones. In nodejs's case, there is an lpstub section aligned to 2 MiB, which caused several LOAD segments to have p_align=0x200000.

Here's the Python script I used to patch p_align values:
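The before/after behaviour described above can be pictured with a toy model. This is only an illustration of the description in this comment, not of ld's actual code: the function name, the max_page_size value of 0x1000 and the example section alignments are all assumptions made for the sketch.

```python
def p_align_per_segment(max_page_size, segment_section_aligns, new_behaviour):
    """Toy model of the p_align choice described in the comment above.

    segment_section_aligns: one list per LOAD segment, holding the alignment
    requested by each section placed in that segment.
    """
    result = []
    current = max_page_size
    for aligns in segment_section_aligns:
        if new_behaviour:
            # A section alignment above max-page-size raises p_align, and the
            # raised value carries over to the later segments as well.
            current = max([current, *aligns])
        result.append(current)
    return result


# Hypothetical layout: the third segment holds a section aligned to 2 MiB.
segments = [[16], [64], [0x200000, 32], [8]]
old = p_align_per_segment(0x1000, segments, new_behaviour=False)
new = p_align_per_segment(0x1000, segments, new_behaviour=True)
```

In this model, old holds the same value for every segment, while new mixes 0x1000 and 0x200000, which is exactly the kind of mixture that WSL1 rejects.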

from elftools.elf.elffile import ELFFile  # pip install pyelftools

target_p_align = 0x1000
in_file = '/usr/bin/node'
out_file = '/usr/local/bin/node'

with open(in_file, 'rb') as fp:
    # Keep a mutable copy of the whole file; ELFFile seeks on fp by itself.
    bdata = bytearray(fp.read())
    elf = ELFFile(fp)
    header_size = elf.structs.Elf_Phdr.sizeof()
    for i in range(elf.num_segments()):
        header = elf.get_segment(i).header
        # Only PT_LOAD headers are subject to the WSL1 alignment check.
        if header.p_type == 'PT_LOAD' and header.p_align != target_p_align:
            print(f'changing alignment of program header {i} from {header.p_align:#x} to {target_p_align:#x}')
            header.p_align = target_p_align
            # _segment_offset is a private pyelftools helper: file offset of header i.
            header_offset = elf._segment_offset(i)
            # Re-serialize the modified header over the original bytes.
            bdata[header_offset:header_offset+header_size] = elf.structs.Elf_Phdr.build(header)
# Write the patched copy; the original binary is left untouched.
with open(out_file, 'wb') as fp:
    fp.write(bdata)

sherlockchou86 commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))

work!

hane-junjun commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))

work!

work!

 hanejun ~  wsl --list --verbose NAME STATE VERSION

ReyasAli commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))

Thank you man It saved my time

the-moog commented 2 years ago

Here is a tool that makes use of the code snippet from @wangqr GIST

hbprotoss commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))

It's magic

rattfieldnz commented 2 years ago

So, for example, after patching all the p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary also works well.

Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))

Can confirm this works for me as well.

My build info:

OS Name: Microsoft Windows 10 Home

Version: 10.0.19044 Build 19044

System Type: x64-based PC

WSL Version: 2

Linux Distro: Ubuntu 22.04 (Jammy)

gavenkoa commented 2 years ago

WSL Version: 2

@rattfieldnz This issue is only for WSL 1!!!

@all

Another workaround is to run via ld.so:

/lib64/ld-linux-x86-64.so.2 /usr/bin/node --version

for example the "fix" for node:

sudo mv /usr/bin/node /usr/bin/node-orig
printf '#!/bin/sh\nexec /lib64/ld-linux-x86-64.so.2 /usr/bin/node-orig "$@"' | sudo tee /usr/bin/node
sudo chmod a+x /usr/bin/node

seanthegeek commented 2 years ago

FWIW, gzip 1.10-4+deb11u1 on Debian Bullseye and 1.12-1 on Debian Bookworm do not have this issue.

NotTheDr01ds commented 2 years ago

Also a workaround without actually patching the binary:

printf '#!/bin/sh\nexec /lib64/ld-linux-x86-64.so.2 /usr/bin/gzip "$@"' | sudo tee /usr/local/bin/gzip
sudo chmod +x /usr/local/bin/gzip

From this comment on the related Node issue.
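The same loader trick can be applied from a script instead of a wrapper file. Below is a minimal Python sketch, assuming the x86-64 loader path used in the commands above; the function names via_loader and run_with_loader_fallback are hypothetical. It first tries to execute the program normally and only goes through ld.so when the kernel reports "Exec format error" (ENOEXEC).

```python
import errno
import subprocess

# Loader path taken from the workaround above; other architectures differ.
LOADER = "/lib64/ld-linux-x86-64.so.2"


def via_loader(argv, loader=LOADER):
    """Build the argument list that runs argv through the dynamic loader."""
    return [loader, *argv]


def run_with_loader_fallback(argv):
    """Run argv directly; on ENOEXEC, retry through the dynamic loader.

    argv[0] should be a full path, because the loader is given a file to
    load rather than a command name to look up.
    """
    try:
        return subprocess.run(argv, check=False)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
        return subprocess.run(via_loader(argv), check=False)
```

For example, run_with_loader_fallback(['/usr/bin/gzip', '--version']) would behave like a plain call on systems where gzip executes normally and fall back to the loader on an affected WSL1 install.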

thisconnected commented 2 years ago

Has anyone escalated this to the Windows division? (I haven't used Windows in ages, so I don't really know how.) This really ought to be fixed, as their ELF loader is faulty and will continue to cause problems with newer builds in the future.

gavenkoa commented 2 years ago

Microsoft thinks WSL 1 is dead )) Their evangelists use WSL 2 exclusively.

NotTheDr01ds commented 2 years ago

A new gzip package is available in the jammy-proposed repo that appears to fix the issue (well, at least revert the optimizations that cause WSL1 to choke). If you have the ability, please follow the instructions in the Launchpad report to test and report your findings.

My understanding is that only one person needs to confirm, but I'd love more eyes on it than just mine.

FloxD commented 2 years ago

My workaround was to install a newer gzip version manually.

I opened http://archive.ubuntu.com/ubuntu/pool/main/g/gzip/, copied the link to gzip_1.12-1ubuntu1_amd64.deb, then downloaded the file via curl:

curl -fsSL -o gzip_1.12-1ubuntu1_amd64.deb http://archive.ubuntu.com/ubuntu/pool/main/g/gzip/gzip_1.12-1ubuntu1_amd64.deb

and installed it

sudo dpkg -i gzip_1.12-1ubuntu1_amd64.deb

dreamlayers commented 2 years ago

In WSL1 Ubuntu 22.04.1 LTS in Windows 10 22H2 (OS Build 19045.2251), new gzip 1.10-4ubuntu4.1 works fine. I cancelled the apt hold I had and simply allowed it to install automatically.

ZupoLlask commented 1 year ago

In WSL1 Ubuntu 22.04.1 LTS in Windows 10 22H2 (OS Build 19045.2251), new gzip 1.10-4ubuntu4.1 works fine. I cancelled the apt hold I had and simply allowed it to install automatically.

I can also confirm this.

emmanuelattia commented 1 year ago

It still doesn't work when we install the minifs https://cdimage.ubuntu.com/ubuntu-base/releases/22.04/release/ubuntu-base-22.04-base-amd64.tar.gz directly under WSL1.

Workaround is to install the gzip package, but it would be great to have the one in the minifs already fixed.

osalbahr commented 1 year ago

Should the fixed-in-wsl2 label be added?

mohag commented 1 year ago

It still doesn't work when we install the minifs https://cdimage.ubuntu.com/ubuntu-base/releases/22.04/release/ubuntu-base-22.04-base-amd64.tar.gz directly under WSL1.

Workaround is to install the gzip package, but it would be great to have the one in the minifs already fixed.

WSL should be able to run all valid binaries; changing the binaries to work around the bug in WSL doesn't fix WSL...

kotenok2000 commented 1 year ago

Is it possible to update lxcore.sys to fix the bug?

maurogarioni commented 5 months ago

Please, does any shell work under Windows 11?