Open grooverdan opened 2 years ago
Failed to reproduce on Windows 11 22000.795 and Windows 10 - 19044.1826
Our users are reporting this is back with Windows 10 19044.2311.
Still a problem - MDEV-31486:
Did you try without Docker Desktop? Try to run the dockerd service with systemd or manually with dockerd
command in different console.
Did you try without Docker Desktop? Try to run the dockerd service with systemd or manually with
dockerd
command in different console.
Works on Windows 11 Pro 22H2 22621.1848 when running everything under WSL 2 with Ubuntu 22.04. The bug is still there when running latest version of Docker Desktop on Windows itself (can't run Linux container when starting dockerd inside cmd.exe on Windows).
But even when running Docker inside WSL, there is still the rename table error "Tablespace is missing for a table" as mentioned in this issue which points to this one in MariaDB tracker which finally leads to the current issue of WSL.
Hi I am suffering from this, now. Here are my machine's details.
Please fix this. As we use docker contaioners in production, it is important to us that we duplicate the environment as best we can for local development work.
Edition Windows 11 Pro Version 23H2 Installed on 3/06/2023 OS build 22631.2506 Experience Windows Feature Experience Pack 1000.22677.1000.0
I'm using WSL 2
Please let me know if you need any further information from me.
I'm also facing this issue with WSL2 and MariaDB.
May I suggest you try this on a Ubuntu image running on Windows with VirtualBox? Use same version of Ubuntu and Docker, and see if it also fails.
Idea is to try to narrow where problem might be.
May I suggest you try this on a Ubuntu image running on Windows with VirtualBox? Use same version of Ubuntu and Docker, and see if it also fails.
Idea is to try to narrow where problem might be.
If its Virtualbox VM that wouldn't be using WSL. All other container users of MariaDB based on Ubuntu image and native and VM Ubuntu have not reported a problem and the simple ALTER TABLE is executed in CI on Ubuntu based tests daily.
However the following minimal container images have been made and the wsl8443.c file is in the top of the image. wsl8443.c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
static const char *dest="/tmp/movedtestfile";
int main(int argc, char **argv)
{
char *fn= "/tmp/testfile";
if (argc > 1)
fn= argv[1];
int fd= open(fn, O_RDWR|O_CREAT|O_EXCL, 0660);
if (fd >= 0)
fprintf(stderr, "file %s opened - fd %d\n", fn, fd);
else
fprintf(stderr, "file %s not opened errno %d", fn, errno);
if (rename(fn, dest))
fprintf(stderr, "file %s not renamed to %s errno %d", fn, dest, errno);
struct stat s;
if (fstat(fd, &s) == -1) {
fprintf(stderr, "error on fstat errno %d\n", errno);
return 1;
}
else
fprintf(stderr, "stat fd %d successful\n", fd);
close(fd);
unlink(fn);
unlink(dest);
return 0;
}
Occurs to me that tmpfs/tmp storage may not be enough hence volume.
https://quay.io/repository/danielgblack/wsl8443?tab=tags
docker volume storage
docker run --rm -v storage:/tmp quay.io/danielgblack/wsl8443:ubunu2204 ./wsl8443
docker run --rm -v storage:/tmp quay.io/danielgblack/wsl8443:ubunu2204 strace ./wsl8443
docker run --rm -v storage:/tmp quay.io/danielgblack/wsl8443:ubunu2004 ./wsl8443
docker run --rm -v storage:/tmp quay.io/danielgblack/wsl8443:ubunu2204 strace ./wsl8443
A reproduction on this would look like "error on fstat errno 2" on output.
This issue was openeed in May of 2022. Is there any chance of having this looked into / resolved?
Issue still persists as of today, 23.04.2024.
Using Docker Desktop v4.29.0, WSL 2 (incl. Windows Subsystem for Linux Update - 5.10.102.2) Edition Windows 11 Pro Version 23H2 Installed 18.06.2023 Build 22631.3447 Experience Windows Feature Experience Pack 1000.22688.1000.0
The issue only occurs when I use bind mounted volumes on an external SSD (bind mount /h/stuff/source to /data). Using regular docker volumes works fine.
Can we have someone look into this please?
Using Docker Desktop v4.29.0, WSL 2 (incl. Windows Subsystem for Linux Update - 5.10.102.2) Edition Windows 11 Pro Version 23H2
Got the similar error.
Ok, finally got around to look at this myself on a windows machine. Can confirm that this (whatever the root cause is) is still happening on
Edition Windows 11 Enterprise Version 23H2 Installed on 10.11.2022 OS build 22631.3672 Experience Windows Feature Experience Pack 1000.22700.1009.0
Last Update: KB5037853
I can confirm the observation from @holzerseb that it only happens if you mount a directory from the host file system into the container, a volume is not sufficient:
$ docker volume create storage
$ docker run --rm -v "storage:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 ./wsl8443
file /tmp/testfile opened - fd 3
stat fd 3 successful
This is likely because volumes are hosted on the "normal" filesystem of the WSL distro hosting docker engine (c.f. https://stackoverflow.com/a/64430683/4087068), which would use a "normal" block device filesystem driver like ext4
.
If using rancher desktop, which also uses moby (the docker container runtime), it's \\wsl$\rancher-desktop-data\var\lib\docker\volumes\storage\_data
. I checked the mountpoint for that and as expected:
$ docker run --rm -v "volume:/tmp" docker.io/bash sh -c "mount | grep /tmp"
/dev/sdf on /tmp type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)
If you compare that to a mounted path, you'll see that it's using the 9p
filesystem driver instead:
$ mkdir hostdir
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" docker.io/bash sh -c "mount | grep /tmp"
C:\ on /tmp type 9p (rw,dirsync,noatime,aname=drvfs;path=C:\;uid=0;gid=0;metadata;symlinkroot=/mnt/,mmap,access=client,msize=65536,trans=fd,rfd=4,wfd=4)
And running the reproducer above indeed results in the expected "error on fstat errno 2". The full strace I get there is:
$ mkdir hostdir
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 ./wsl8443
file /tmp/testfile opened - fd 3
error on fstat errno 2
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 strace ./wsl8443
execve("./wsl8443", ["./wsl8443"], 0x7ffe38267590 /* 3 vars */) = 0
brk(NULL) = 0x55b6b28d7000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fff8d629830) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f1586a000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=8735, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8735, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0f15867000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\244;\374\204(\337f#\315I\214\234\f\256\271\32"..., 68, 896) = 68
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2216304, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2260560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0f1563f000
mmap(0x7f0f15667000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f0f15667000
mmap(0x7f0f157fc000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f0f157fc000
mmap(0x7f0f15854000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x214000) = 0x7f0f15854000
mmap(0x7f0f1585a000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0f1585a000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f1563c000
arch_prctl(ARCH_SET_FS, 0x7f0f1563c740) = 0
set_tid_address(0x7f0f1563ca10) = 9
set_robust_list(0x7f0f1563ca20, 24) = 0
rseq(0x7f0f1563d0e0, 0x20, 0, 0x53053053) = 0
mprotect(0x7f0f15854000, 16384, PROT_READ) = 0
mprotect(0x55b6b1a7f000, 4096, PROT_READ) = 0
mprotect(0x7f0f158a4000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f0f15867000, 8735) = 0
openat(AT_FDCWD, "/tmp/testfile", O_RDWR|O_CREAT|O_EXCL, 0660) = 3
write(2, "file /tmp/testfile opened - fd 3"..., 33file /tmp/testfile opened - fd 3
) = 33
rename("/tmp/testfile", "/tmp/movedtestfile") = 0
newfstatat(3, "", 0x7fff8d629840, AT_EMPTY_PATH) = -1 ENOENT (No such file or directory)
write(2, "error on fstat errno 2\n", 23error on fstat errno 2
) = 23
exit_group(1) = ?
+++ exited with 1 +++
So given the reproducer works, there are two options:
Hopefully this helps @grooverdan and/or the WSL team to narrow down the issue.
Edit:
$ wsl -v
WSL-Version: 2.1.5.0
Kernelversion: 5.15.146.1-2
WSLg-Version: 1.0.60
MSRDC-Version: 1.2.5105
Direct3D-Version: 1.611.1-81528511
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-Version: 10.0.22631.3672
As for what the behavior should be. The man
page for rename(2)
is relatively clear on this:
Open file descriptors for oldpath are also unaffected.
So this should not happen and is indeed some sort of WSL bug. Interestingly, the POSIX definition of rename
makes no such commitment: https://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html
One more datapoint:
I was curious and tried the reproducer on a directory mounted with 9p
in WSL (i.e. same kernel version => same vfs implementation) against the https://github.com/chaos/diod 9P server and got
$ ./wsl8443
file /tmp/9c/testfile opened - fd 3
stat fd 3 successful
My takeaway from this is that it's probably a bug with the 9P server in windows.
EDIT: If someone wants to try:
sudo apt install diod
sudo mkdir /srv/9p
sudo diod -f -d 1 -n -e /srv/9p
sudo diodmount -n localhost:/srv/9p /tmp
wsl8443.c
cc wsl8443.c -o wsl8443
(obviously needs a C compiler, e.g. sudo apt install gcc
)./wsl8443
sudo umount /tmp
Since it seems to be specific to drvfs
and not 9p
directly, further testing reveals that just using /mnt/c/Users/{username}/testfile
and /mnt/c/Users/{username}/movedtestfile
with the reproducer fails with the (un)expected:
$ ./wsl8443
file /mnt/c/Users/{username}/testfile opened - fd 3
error on fstat errno 2
This is using a "normal" Ubuntu WSL distro opened with wsl -d Ubuntu
, no Docker mounts needed.
EDIT: Sorry for spamming y'all with emails. Thought I was done with this, but there's always something more to look at.
EDIT 2: https://learn.microsoft.com/de-de/archive/blogs/wsl/wsl-file-system-support#interoperability-with-windows-1 feels relevant for this case. The reason I'm talking about drvfs at all is that the aname
of the 9p mount for both the docker case and /mnt/c
is something like aname=drvfs;path=C:\;uid=0;gid=0;metadata;symlinkroot=/mnt/
Version
Windows 11 Pro 22H2 22621.1848 Windows 11 Pro 22631.2506
Microsoft Windows [Version 10.0.19044.1706] Windows 10 19044.2311
WSL Version
Kernel Version
5.10.102.1
Distro Version
Ubuntu 20.04
Other Software
Docker Desktop (Windows), version 4.8.2
Repro Steps
At a high level:
The calling of the system calls in this sequence:
As the fstat is on a open file descriptor it should return successful and the information about the file opened in the first step.
This sequence works fine on all Linux, BSD, Solaris systems that MariaDB has been running on. An
ALTER TABLE
is all that's required to trigger this and there's no non-WSL systems affected.This was reproduced with:
docker run --env MARIADB_ROOT_PASSWORD=bob --env MARIADB_DATABASE=test --env MARIADB_USER=test --env MARIADB_PASSWORD=test --name m107 -d mariadb:10.7.3
docker exec -ti m107 mariadb -u test -ptest test
Past the following SQL on the prompt
To get a strace:
--cap-add SYS_PTRACE
todocker run
command linedocker exec m107 sh -c 'apt-get update && apt-get install -y strace'
docker exec m107 strace -f -p1 -s 99 -o /var/lib/mysql/m.strace
Expected Behavior
All SQL successful and no server crashes.
show create table player_report
should show the result of the table alterations and created indexes.Actual Behavior
mariadb-crash-1.log
(evidently wollysocial was the original database name)
Diagnostic Logs
Original bug report: https://jira.mariadb.org/browse/MDEV-28580
extract from mariadb-1.strace
Note the preallocation ERROR isn't fatal (
fallocate
not implemented in WSL, and we do a write based fallback`).My analysis:
In the strace there is no closing of file descriptor 50.
So its looking like the final fstat on an open file descriptor is loosing track of the file because it was renamed earlier. Without this fstat there is a resulting in a -1 filesize (from the error) and the assertion.
Procmon trace from same time as strace: LOGFile3.PML.