Open iBug opened 2 years ago
The old version gzip 1.10-4ubuntu1 works well, and the new version gzip 1.10-4ubuntu3 triggers this issue.
I compared their program headers with the readelf -l gzip command; here are the LOAD segments:
For 1.10-4ubuntu1 (which has no bug):
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000001fc0 0x0000000000001fc0 R 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x000000000000e405 0x000000000000e405 R E 0x1000
LOAD 0x0000000000011000 0x0000000000011000 0x0000000000011000
0x00000000000035d0 0x00000000000035d0 R 0x1000
LOAD 0x0000000000014690 0x0000000000015690 0x0000000000015690
0x0000000000000d88 0x0000000000000d88 RW 0x1000
LOAD 0x0000000000000000 0x0000000000018000 0x0000000000018000
0x0000000000000000 0x00000000000ca810 RW 0x1000
...
and for 1.10-4ubuntu3 (which has the bug):
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000001fa8 0x0000000000001fa8 R 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x000000000000e319 0x000000000000e319 R E 0x1000
LOAD 0x0000000000011000 0x0000000000011000 0x0000000000011000
0x00000000000036a8 0x00000000000036a8 R 0x1000
LOAD 0x0000000000016690 0x0000000000016690 0x0000000000016690
0x0000000000000d88 0x00000000000cc180 RW 0x2000
...
Notice that the buggy version has an unusual LOAD segment with Align 0x2000. After patching the 0x2000 to 0x1000 (by modifying only one byte of the gzip binary, at offset 0x189, from 0x20 to 0x10), the bug disappears and the patched binary works well!
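For anyone wondering where offset 0x189 comes from, here is a small sketch of the arithmetic. It assumes the standard ELF64 layout (64-byte file header, 56-byte program headers, p_align at offset 0x30 inside each header), that the program header table starts right after the file header, and that the four LOAD segments are program headers 2 to 5. The last two points are assumptions inferred from the offsets quoted in this thread, and the helper name is hypothetical.

```python
ELF64_EHDR_SIZE = 0x40  # size of the ELF64 file header
ELF64_PHDR_SIZE = 0x38  # size of one Elf64_Phdr entry
P_ALIGN_OFFSET = 0x30   # offset of p_align inside Elf64_Phdr


def p_align_file_offset(index, e_phoff=ELF64_EHDR_SIZE):
    """File offset of the 8-byte p_align field of program header `index`.

    Assumes the program header table starts at e_phoff (0x40 by default).
    """
    return e_phoff + index * ELF64_PHDR_SIZE + P_ALIGN_OFFSET


# p_align is little-endian, so 0x2000 is stored as 00 20 00 ... and
# 0x1000 as 00 10 00 ...; the byte that differs is the second one.
for index in range(2, 6):  # assumed indices of the four LOAD headers
    field = p_align_file_offset(index)
    print(f"phdr[{index}]: p_align at {field:#x}, second byte at {field + 1:#x}")
```

Under these assumptions the last LOAD header's p_align field starts at 0x188, so the byte holding 0x20 sits at 0x189.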
So maybe WSL1 wrongly assumes that the p_align value is always 0x1000. This is a bug in WSL1 rather than in Ubuntu, and it seems quite easy to fix.
Gzip 1.10-4ubuntu4 is out and is also affected. The binary is identical to 1.10-4ubuntu3 except for some "ID" part, namely:
File length remains 97520 (0x17CF0)
The new binary, as you'd imagine, works well after manually patching byte 0x189 from 0x20 to 0x10.
By reversing lxcore.sys, I found the real cause of this bug: WSL1 assumes that the p_align members of all PT_LOAD program headers have the same value, which is not correct. (See the elf(5) Linux manual page; it states no such requirement for p_align.)
Part of the decompiled pseudo-C code of lxcore.sys (version 10.0.22000.1 from Windows 11 21H2 22000.556):
lxcore.sys.zip
__int64 __fastcall LxpElfInfoParse(__int128 *a1, unsigned __int64 a2, _OWORD *a3) // RVA 0x1C004DC60
{
...
if ( (_DWORD)v61 == 0x464C457F ) // "\x7fELF"
{
...
v46 = 0i64;
...
v19 = *(__m128i *)v18; // typeof(v18) is "Elf64_PHdr *"
v58 = v19;
v51 = v19;
v20 = *((_OWORD *)v18 + 1); // Elf64_PHdr.p_vaddr
v60 = v20;
v52 = v20;
v21 = *((__m128i *)v18 + 2);
v59 = v21;
v53 = v21;
v22 = *((_QWORD *)v18 + 6); // Elf64_PHdr.p_align
v54 = v22;
v23 = _mm_cvtsi128_si32(v19); // Elf64_PHdr.p_type
if ( v23 == 3 ) // PT_INTERP
{
...
if ( v23 == 1 ) // PT_LOAD
{
...
if ( !v54 || (v54 & 0xFFF) != 0 ) // check p_align is multiple of page size
{
v5 = "LxpElfInfoParse: LocalProgramHeader.Align\n";
v6 = 541;
goto LABEL_5;
}
if ( (unsigned __int64)v52 % v54 != v24 % v54 ) // check "p_vaddr % p_align == p_offset % p_align"
{
v5 = "LxpElfInfoParse: LocalProgramHeader.VirtualAddress\n";
v6 = 554;
goto LABEL_5;
}
if ( v46 )
{
if ( v46 != v54 ) // Bug here! WSL1 assumes all `p_align` member in `PT_LOAD` program headers must be the same value, which is not correct.
{
v5 = "LxpElfInfoParse: LoadHeaderAlignment\n";
v6 = 567;
goto LABEL_5;
}
}
else
{
v46 = v54;
}
...
}
The LxpElfInfoParse function (RVA 0x1C004DC60) in lxcore.sys parses ELF files. In the pseudocode above, 0x464C457F is the ELF magic number ("\x7fELF"), and v23 is the p_type member of Elf64_Phdr. v23 == 1 means PT_LOAD, and v54 is the p_align member of Elf64_Phdr.
v46 is initialized to 0. When the first PT_LOAD header is parsed, v46 is set to that header's p_align (v54). For every later PT_LOAD header, the code checks that v54 equals the saved v46 value and rejects the file if it does not, which causes this issue.
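To make the logic easier to follow, here is a rough Python model of the three PT_LOAD checks shown in the decompilation. It is only a sketch: the function name and return values are hypothetical, and the segment tuples are copied from the readelf output earlier in this thread.

```python
PAGE_MASK = 0xFFF


def wsl1_load_check(load_segments):
    """Model of the PT_LOAD checks; returns the failing check or None.

    load_segments: iterable of (p_offset, p_vaddr, p_align) tuples.
    """
    first_align = 0  # plays the role of v46 in the decompilation
    for p_offset, p_vaddr, p_align in load_segments:
        # p_align must be a non-zero multiple of the page size
        if p_align == 0 or (p_align & PAGE_MASK) != 0:
            return "Align"
        # p_vaddr and p_offset must be congruent modulo p_align
        if p_vaddr % p_align != p_offset % p_align:
            return "VirtualAddress"
        # the questionable check: every PT_LOAD must share one p_align
        if first_align:
            if first_align != p_align:
                return "LoadHeaderAlignment"
        else:
            first_align = p_align
    return None


# (p_offset, p_vaddr, p_align) from readelf -l, gzip 1.10-4ubuntu1
OLD = [
    (0x0, 0x0, 0x1000),
    (0x2000, 0x2000, 0x1000),
    (0x11000, 0x11000, 0x1000),
    (0x14690, 0x15690, 0x1000),
    (0x0, 0x18000, 0x1000),
]
# gzip 1.10-4ubuntu3
NEW = [
    (0x0, 0x0, 0x1000),
    (0x2000, 0x2000, 0x1000),
    (0x11000, 0x11000, 0x1000),
    (0x16690, 0x16690, 0x2000),
]

print(wsl1_load_check(OLD))
print(wsl1_load_check(NEW))
```

In this model the old binary passes every check, while the new one only trips the equal-alignment check; its last segment satisfies the two checks before it.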
So, for example, after patching all the other p_align values from 0x1000 to 0x2000 (offsets 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), so that every PT_LOAD header has the same alignment, the new binary also works well.
Same issue here (Ubuntu 22.04 & WSL1). I can't start VS Code Server due to this issue:
Installing VS Code Server for x64 (dfd34e8260c270da74b5c2d86d61aee4b6d56977)
Downloading: 100%
/usr/bin/gzip: 1: ELF: Permission denied
/usr/bin/gzip: 3: : Permission denied
/usr/bin/gzip: 4: Syntax error: "(" unexpected
tar: Child returned status 2
tar: Error is not recoverable: exiting now
tar is unable to read /home/stumski/.vscode-server/bin/dfd34e8260c270da74b5c2d86d61aee4b6d56977-1650739878.tar.gz. Either the file is corrupt or tar has an issue.
There's a known WSL issue with tar on Ubuntu 19.10.
See workaround in https://github.com/microsoft/vscode-remote-release/issues/1856.
Reload the window to initiate a new server download.
stumski@C-H50K6G3:~$ gzip
-bash: /usr/bin/gzip: cannot execute binary file: Exec format error
Older version works well:
sudo dpkg -i ./gzip_1.10-4ubuntu1_amd64.deb
stumski@C-H50K6G3:~$ sudo dpkg -i ./gzip_1.10-4ubuntu1_amd64.deb
dpkg: warning: downgrading gzip from 1.10-4ubuntu4 to 1.10-4ubuntu1
(Reading database ... 33879 files and directories currently installed.)
Preparing to unpack ./gzip_1.10-4ubuntu1_amd64.deb ...
Unpacking gzip (1.10-4ubuntu1) over (1.10-4ubuntu4) ...
Setting up gzip (1.10-4ubuntu1) ...
Processing triggers for install-info (6.8-4build1) ...
Processing triggers for man-db (2.10.2-1) ...
stumski@C-H50K6G3:~$ sudo apt-mark hold gzip
gzip set on hold.
stumski@C-H50K6G3:~$ gzip
gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: gzip -h
stumski@C-H50K6G3:~$ code
Updating VS Code Server to version dfd34e8260c270da74b5c2d86d61aee4b6d56977
Removing previous installation...
Installing VS Code Server for x64 (dfd34e8260c270da74b5c2d86d61aee4b6d56977)
Downloading: 100%
Unpacking: 100%
Unpacked 2341 files and folders to /home/stumski/.vscode-server/bin/dfd34e8260c270da74b5c2d86d61aee4b6d56977.
stumski@C-H50K6G3:~$
So, for example, after patching all the p_align value from 0x1000 to 0x2000 (offset 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary can also works well.
Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189, using:
echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))
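The same one-byte change can be sketched in Python, with a safety check that the byte currently holds the expected value before overwriting it. The helper name and the example path are hypothetical; it is probably wise to try it on a copy of the binary first.

```python
def patch_byte(path, offset, expected_old, new):
    """Overwrite one byte in place, but only if it has the expected value."""
    with open(path, "r+b") as fp:
        fp.seek(offset)
        current = fp.read(1)
        if current != bytes([expected_old]):
            found = current.hex() or "missing"
            raise ValueError(
                f"byte at {offset:#x} is {found}, expected {expected_old:02x}"
            )
        fp.seek(offset)
        fp.write(bytes([new]))


# Example (hypothetical path to a copy of the affected binary):
# patch_byte("./gzip.patched", 0x189, 0x20, 0x10)
```

Unlike the dd one-liner, this refuses to write if the byte is not 0x20, which guards against patching a different gzip build by mistake.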
It also happens in WSL2, I have just found it (unable to start VScode because of it)
Patch by @dreamlayers worked beautifully, thanks!
@LostInBrittany Are you absolutely certain that your distribution is running under WSL2? You can list all distros with wsl -l -v. I ask because it didn't happen to me after switching to WSL2.
I tried patch by @dreamlayers on newest gzip and it's working as it should!
@Zenderable OMG, you're right, my bad... I have two Windows 11 computers, both with Windows Insiders, both with WSL, and I was sure both of them used the same WSL version... But no, this one has WSL 1 and the other has WSL 2 (which doesn't have the problem, you're right). Sorry about that!
I trigger the same issue. WSL 2 is ok, WSL 1 is wrong.
By reversing lxcore.sys, the real reason of this bug is that WSL1 assumes all p_align member in PT_LOAD program headers must be the same value, which is not correct. (See elf(5) Linux manual page, there is not such assumption for p_align) ...
IDA Pro and Hexrays to the rescue again!
@benhillis It seems like your ELF loader still has some serious issues.
For anyone else trying to run Ubuntu 22.04 LTS in WSL (especially WSL1): you will likely also want the workaround for #7054.
I came here from #8151: nodejs from Arch could not run on WSL1.
In my case, node also has different p_align values in its program headers, but this only happens if nodejs was built after 2022/02/14, when Arch Linux upgraded glibc from 2.33 to 2.35. After some git bisect, I found out that GNU ld changed its p_align handling in binutils 2.38, or more specifically in binutils-gdb@74e315dbfe5. Previously, if a section required an alignment higher than max-page-size, this did not affect p_align in the corresponding segment. After this commit, the required alignment is set as the p_align of that segment and of later segments. In nodejs's case, there is an lpstub section that is aligned to 2MiB, which caused several LOAD segments to have p_align=0x200000.
Here's the Python script I used to patch p_align values:
    from elftools.elf.elffile import ELFFile  # pip install pyelftools
    target_p_align = 0x1000
    in_file = '/usr/bin/node'
    out_file = '/usr/local/bin/node'
    with open(in_file, 'rb') as fp:
        bdata = bytearray(fp.read())
        elf = ELFFile(fp)
        header_size = elf.structs.Elf_Phdr.sizeof()
        for i in range(elf.num_segments()):
            header = elf.get_segment(i).header
            if header.p_type == 'PT_LOAD' and header.p_align != target_p_align:
                print(f'changing alignment of program header {i} from {header.p_align} to {target_p_align}')
                header.p_align = target_p_align
                header_offset = elf._segment_offset(i)
                bdata[header_offset:header_offset+header_size] = elf.structs.Elf_Phdr.build(header)
    with open(out_file, 'wb') as fp:
        fp.write(bdata)
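To check whether a binary is affected before patching anything, a read-only helper along these lines could list the PT_LOAD alignments using only the standard library. This is a sketch that assumes a little-endian ELF64 file; the function names are hypothetical and it does no validation beyond the ELF magic, class and byte order.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr (56 bytes)
PT_LOAD = 1


def load_alignments(data):
    """Return the p_align value of every PT_LOAD program header in `data`."""
    fields = EHDR.unpack_from(data, 0)
    ident = fields[0]
    # e_ident: magic, then EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    aligns = []
    for i in range(e_phnum):
        phdr = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        if phdr[0] == PT_LOAD:       # p_type
            aligns.append(phdr[7])   # p_align
    return aligns


def has_mixed_load_alignment(data):
    """True if PT_LOAD headers disagree on p_align (what WSL1 rejects)."""
    return len(set(load_alignments(data))) > 1


# Example usage:
# with open("/usr/bin/node", "rb") as fp:
#     print([hex(a) for a in load_alignments(fp.read())])
```

If has_mixed_load_alignment returns True, the binary would be expected to hit the LoadHeaderAlignment check described earlier in this thread.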
So, for example, after patching all the p_align value from 0x1000 to 0x2000 (offset 0xE1, 0x119 and 0x151 of gzip 1.10-4ubuntu4), the new binary can also works well.
Actually those were all 0x1000 already. In gzip 1.10-4ubuntu4 I only had to change the value at offset 0x189 using
echo -en '\x10' | sudo dd of=/usr/bin/gzip count=1 bs=1 conv=notrunc seek=$((0x189))
work!
work!
hanejun ~ wsl --list --verbose
NAME STATE VERSION
u22_w1 Running 1
hanejun ~ wsl
hanejun@kenbishi:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
Thank you man It saved my time
It's magic
Can confirm this works for me as well.
My build info:
OS Name: Microsoft Windows 10 Home
Version: 10.0.19044 Build 19044
System Type: x64-based PC
WSL Version: 2
Linux Distro: Ubuntu 22.04 (Jammy)
@rattfieldnz This issue is only for WSL 1!!!
@all
Another workaround is to run via ld.so:
/lib64/ld-linux-x86-64.so.2 /usr/bin/node --version
For example, the "fix" for node:
sudo mv /usr/bin/node /usr/bin/node-orig
printf '#!/bin/sh\nexec /lib64/ld-linux-x86-64.so.2 /usr/bin/node-orig "$@"' | sudo tee /usr/bin/node
sudo chmod a+x /usr/bin/node
FWIW, gzip 1.10-4+deb11u1 on Debian Bullseye and 1.12-1 on Debian Bookworm do not have this issue.
Also a workaround without actually patching the binary:
printf '#!/bin/sh\nexec /lib64/ld-linux-x86-64.so.2 /usr/bin/gzip "$@"' | sudo tee /usr/local/bin/gzip
sudo chmod +x /usr/local/bin/gzip
From this comment on the related Node issue.
Has anyone bumped it to the Windows division? (I haven't used Windows in ages, so I don't really know how.) This really ought to be fixed, as their ELF loader is faulty and will continue to cause problems with newer builds in the future.
Microsoft thinks WSL 1 is dead )) Their evangelists use WSL 2 exclusively.
A new gzip package is available in the jammy-proposed repo that appears to fix the issue (well, at least revert the optimizations that cause WSL1 to choke). If you have the ability, please follow the instructions in the Launchpad report to test and report your findings.
My understanding is that only one person needs to confirm, but I'd love more eyes on it than just mine.
My workaround was to install a newer gzip version manually.
I opened http://archive.ubuntu.com/ubuntu/pool/main/g/gzip/ and copied the link to gzip_1.12-1ubuntu1_amd64.deb, then downloaded the file via curl:
curl -fsSL -o gzip_1.12-1ubuntu1_amd64.deb http://archive.ubuntu.com/ubuntu/pool/main/g/gzip/gzip_1.12-1ubuntu1_amd64.deb
and installed it:
sudo dpkg -i gzip_1.12-1ubuntu1_amd64.deb
In WSL1 Ubuntu 22.04.1 LTS in Windows 10 22H2 (OS Build 19045.2251), new gzip 1.10-4ubuntu4.1 works fine. I cancelled the apt hold I had and simply allowed it to install automatically.
I can also confirm this.
It still doesn't work when installing the minifs https://cdimage.ubuntu.com/ubuntu-base/releases/22.04/release/ubuntu-base-22.04-base-amd64.tar.gz directly under WSL1.
The workaround is to install the gzip package, but it would be great if the one in the minifs were already fixed.
Should the fixed-in-wsl2 label be added?
It still doesn't work when installing the minifs https://cdimage.ubuntu.com/ubuntu-base/releases/22.04/release/ubuntu-base-22.04-base-amd64.tar.gz directly under WSL1.
The workaround is to install the gzip package, but it would be great if the one in the minifs were already fixed.
WSL should be able to run all valid binaries, changing the binaries to work around the bug in WSL doesn't fix WSL...
Is it possible to update lxcore.sys to fix the bug?
Please, does any shell work under Windows 11?
Version
Microsoft Windows [Version 10.0.19044.1586]
WSL Version
Kernel Version
4.4.0-19041-Microsoft
Distro Version
Ubuntu 22.04 "Jammy Jellyfish"
Other Software
GZip version 1.10-4ubuntu3 and 1.10-4ubuntu4 amd64.
Repro Steps
apt install gzip=1.10-4ubuntu3 then run gzip.
apt install gzip=1.10-4ubuntu4 then run gzip.
Expected Behavior
No error shows up.
Actual Behavior
The binary doesn't execute, so no strace.
Diagnostic Logs
Similar to that one, the same binary runs perfectly OK on a native Ubuntu Jammy machine. However, this time the binary is 97520 bytes and no section points outside this range.
Dissected binary using Wireshark: https://paste.ubuntu.com/p/nc2v6ZSRHW/
The previous version gzip 1.10-4ubuntu1 is fine, so I've installed that one instead and set apt-mark hold for now.
It's very interesting that only gzip is found problematic. And it's the same program as 3 years ago. Time to wonder if gzip has any magic to break on WSL1.