microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License
17.46k stars 822 forks source link

mmap(NULL, ...) on zero-length file fails with ENOEXEC #3451

Closed thorsteneb closed 9 months ago

thorsteneb commented 6 years ago

April Update, 10.0.17134.191 Also tested on 19H1, build 18204

The package "liblmdb" does not work on WSL. This impacts any projects that use lmdb. I've narrowed this down to mmap(NULL,...) on a 0-length file failing with ENOEXEC. This is likely related to #658 , yet distinct enough that the @therealkenc suggested I file a new issue, to avoid confusion.

To reproduce with lmdb:

git clone https://github.com/LMDB/lmdb.git cd lmdb/libraries/liblmdb make test

mtest.c:50: mdb_env_open(env, "./testdb", MDB_FIXEDMAP , 0664): Exec format error Aborted (core dumped) Makefile:61: recipe for target 'test' failed make: *** [test] Error 134

strace shows:

mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)

Just before it cores.

Sample code showing just the ENOEXEC, but not going on long enough after that to core:

#include <stddef.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main (void) {
  void *map;
  int fd;

// mmap an empty file, fails with ENOEXEC
// Test case to recreate an issue with lmdb and WSL
// mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)
  remove("./testfile");
  fd = open("./testfile", O_RDWR | O_CREAT, 0664);

  map = mmap(NULL, 1024, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    printf("mmap(NULL,...) of empty file failed, strace shows ENOEXEC\n");
  else {
    printf("mmap(NULL,...) of empty file succeeded, issue fixed\n");
    munmap(map, 1024);
  }

  close(fd);
  remove("./testfile");

// By contrast, with a file that contains some data
  fd = open("./testfile", O_RDWR | O_CREAT, 0664);
  write(fd,"Content",7);

  map = mmap(NULL, 1024, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    printf("mmap(NULL,...) of non-empty file failed, that's unexpected\n");
  else {
    printf("mmap(NULL,...) of non-empty file succeeded, as expected\n");
    munmap(map, 1024);
  }

  close(fd);
  remove("./testfile");

  return 0;
}

strace of the sample code: mmap-example.strace.txt

strace of liblmdb itself: liblmdb.strace.txt

User voice for this issue lives here: https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/35053207-support-oracle-berkeley-db-and-lmdb

benhillis commented 6 years ago

I'm testing a fix for this now, but it looks like lmdb is hitting another error after this: mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found

I'm trying to determine if this is related to my "fix" or another issue.

therealkenc commented 6 years ago

The major use case for "mmap's problems" is BerkeleyDB. Instead of deep diving lmdb (or boltdb #2304 #3162 or foo-db) you may consider:

sudo apt install tcl tcl-dev   # plus the usual suspects
mkdir bdb && cd bdb
wget http://download.oracle.com/berkeley-db/db-5.3.21.tar.gz
tar xf db-5.3.21.tar.gz
cd db-5.3.21/build_unix
../dist/configure --prefix=/usr/local --enable-compat185 --enable-dbm \
    --disable-static --enable-test --enable-debug --enable-cxx \
    --enable-tcl --with-tcl=/usr/lib
make -j4
tclsh  # in tickle shell after this, most unfortunately
source ../test/tcl/test.tcl  
# run_std     # worthy goal but it's a big test
# r btree     # more reasonable but still a big ask
test042 btree # first random fail (segfault) I hit on WSL, fine on Real Linux

There is some information on running the individual tests here. If you get the BerkeleyDB tests to pass it will squash #3149 #2627 #2852 #2871 (and everything else in the BDB universe). The other foo-db will almost certainly follow suit.

drujd commented 6 years ago

I guess this is related to #902 ? It sure would be nice to see the mmap issues finally fixed...

thorsteneb commented 6 years ago

@drujd Vote on user voice, get your friends to vote. Right now, there's not exactly an outcry to get BerkeleyDB to work, or any other project relying on these mmap features.

https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/35053207-support-oracle-berkeley-db-and-lmdb

drujd commented 6 years ago

The issue has much broader implications, I have voted, but I do not think this is the right thing to vote on. The dysfunct mmap functionality breaks everything that depends on the correct mmap behavior, including bup or building Android from source. Personally I don't really care about BerkeleyDB.

drujd commented 6 years ago

Also, I really think this should be considered a bug, not a feature request.

thorsteneb commented 6 years ago

Could be a combination of bugs and missing features. I don’t understand WSL even nearly well enough to have an informed opinion on that.

I think it’s very likely that supporting Berkeley DB will also “knock down” a range of other related issues. Any DB with the same pedigree will work.

For Android source development - does that still require 32-bit elf support? If so, the uservoice for that is here: https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/13377507-please-enable-wsl-to-run-32-bit-elf-binaries

benhillis commented 6 years ago

@yorickdowne - The distinction of bug vs feature request is fairly arbitrary, we call things bugs that are simple fixes and a feature request would take a lot longer to implement.

I've looked into this and #902 and they share the commonality that both rely on functionality that the Windows kernel does not support which makes the scope of this fix a bit larger.

therealkenc commented 6 years ago

The dysfunct mmap functionality breaks everything that depends on the correct mmap behavior, including bup or building Android from source.

At base the problem is stated in the SetEndOfFile function....

If CreateFileMapping is called to create a file mapping object for hFile, UnmapViewOfFile must be called first to unmap all views and call CloseHandle to close the file mapping object before you can call SetEndOfFile.

This has been ~dysfunct~ differently abled for 25 years. Repeating the sentiment here, and reiterating that this guy was right (he just framed the ask incorrectly), fixing this In Windows means you don't need patches like this for lmdb on Windows (and only for Windows).

The reason Lineage builds don't work is because no one does Lineage builds on Windows. Then someone does an upstream patch like this (which note has no #ifdef _WIN32 because it doesn't need one) and you never hear of it again. Academically, if you look closely at the lmdb work-around, it could be done in a way that doesn't require an #ifdef too. The Linux side of the #ifdef could set the file size to sizehi instead of 0 and close the handle too. Maybe someone will.

Also, I really think this should be considered a bug, not a feature request.

Alright. But by the chosen definition of not-dysfunct, it is a Windows bug contrast lack of Windows feature too. In extremis, ditto ha ha but serious.

thorsteneb commented 6 years ago

Wow. That rabbit hole went deep fast. Thanks, @therealkenc .

fixing this In Windows

That'd be amazing. Your theory that WSL works because 23 years of Cygwin workarounds means not all code triggers https://github.com/Microsoft/WSL/issues/1529 is -- oddly compelling.

You are saying that allowing SetEndOfFile with handles open would then lend itself to also allowing other actions with handles open, such as renaming, and deleting. Do I get that right?

@benhillis , thanks for looking into it! From your perspective, are allowing SetEndOfFile/delete/rename while handles are open and allowing mapping of a zero-length-file things to be tackled in Windows, or through some form of layer in WSL?

therealkenc commented 6 years ago

allowing SetEndOfFile with handles open would then lend itself to also allowing other actions with handles open, such as renaming, and deleting. Do I get that right?

Nah I didn't mean to imply that in any narrow sense. Better to call the two issues (#1529 and mmap's problems) analogous. No magic bullet here. [ed] To illustrate the bigger picture, see #2915 (message) which doesn't conflate with Windows' "pinned handles" xplat problem. You could fix it in WSL, but that doesn't help win32 NodeJS being broken. How do you eat an elephant? One bite at a time.

thorsteneb commented 5 years ago

I've looked into this and #902 and they share the commonality that both rely on functionality that the Windows kernel does not support which makes the scope of this fix a bit larger.

@benhillis Is it realistic to tackle this and #902 as part of 20H1?

daviesalex commented 5 years ago

Edit: this statement is not correct. I can not reproduce this with https://github.com/WhitewaterFoundry/WLE. I'm going to try to do more bisecting here to see if we can figure out what the difference is between working RPM and not working RPM distributions are. My apologies (and thanks to @DarthSpock for pointing this out)

Edit 2: we found that different versions of RHEL have problems. What is interesting is that we were originally trying on an older version, but the folks at https://github.com/WhitewaterFoundry/WLE/issues/20 have reproduced it on a higher version. If I get some time i'll see if I can bisect out the OS versions to prove if it really is that (i.e. a change in RPM or one of its dependencies such as bdb),

--

This behavior has the effect of preventing people running most RPM based distributions such as RHEL/Fedora/CentOS/Scientific Linux under WSL.

therealkenc commented 5 years ago

Have you tried WSLFedoraRemix?

I would be mildly curious to know whether #2852 (rpm ‑‑initdb) reproduces on WSLFedoraRemix. That issue never had usable reproduction steps for whatever it was the OP was doing on their Fedora (for some value of Fedora), but rpm --initdb works on Ubuntu natch. Curious, but not curious enough to actually install WSLFedoraRemix and look. Maybe it's been patched like Alex suggests. Maybe not. Dunno.

sirredbeard commented 5 years ago

It affects WLinux Enterprise distributions (RHEL, SL, CentOS) as well. See https://github.com/WhitewaterFoundry/WLE/issues/20.

therealkenc commented 5 years ago

@benhillis

mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found I'm trying to determine if this is related to my "fix" or another issue.

Ben, at the time back in August 2018 did you manage to track that fail any? Understandable if you didn't, but whatever you found (if anything) might save Hayden some effort.

ytrezq commented 5 years ago

Hey ! It’s starting to makes really long for not being able to update Fedora or install any other packages…

therealkenc commented 5 years ago

Was hoping for some joy on the OP repro in 18890, but 'fraid not.

233   execve("/usr/bin/make", ["make", "test"], 0x7fffcece5ae0 /* 22 vars */) = 0
[blah blah...]
238   openat(AT_FDCWD, "./testdb/data.mdb", O_RDWR|O_CREAT, 0664) = 4
238   fstatfs(4, {f_type=0x53464846, f_bsize=4096, f_blocks=125688063, f_bfree=84551925, f_bavail=84551925, f_files=999, f_ffree=1000000, f_fsid={val=[1, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOATIME}) = 0
238   pread64(4, "", 152, 0)            = 0
238   mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)
238   munmap(0x7ffb5ddd0000, 2101248)   = 0

I'm testing a fix for this now, but it looks like lmdb is hitting another error after this: mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found

Didn't make it that far. If that initial Aug 2018 fix attempt is still in the can, and doesn't break anything obvious, it might be worth a git push to see if it gets any further.

microsoft-github-policy-service[bot] commented 9 months ago

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!