mstorsjo / llvm-mingw

An LLVM/Clang/LLD based mingw-w64 toolchain

Spurious issues with accessing files on network mounts #327

Open falhumai96 opened 1 year ago

falhumai96 commented 1 year ago

I have tested some code of my own with the following options:

and my project compiled successfully on all of the above targets except the last one. I am getting a number of errors like the following:

In file included from <some HPP file>.HPP:<some file line number>:
In file included from <some HPP file>.HPP:<some file line number>:
In file included from <some HPP file>.HPP:<some file line number>:
In file included from <some HPP file>.HPP:<some file line number>:
...
<some HPP file>.HPP:<some file line number>:<some column number>: error: #include nested too deeply.

My project is a CMake project. Sometimes running parallel builds many times results in a successful build. Also, I am using a Ninja generator, and I am building directly on Windows.

As this is private code, I cannot share code snippets here, but I am happy to discuss the matter privately with you over a different channel.

falhumai96 commented 1 year ago

Why doesn't it resolve the UID on every request instead of using a cached one?

On Mon., Mar. 13, 2023, 15:37 Martin Storsjö wrote:

btw, kudos to @longnguyen2004 https://github.com/longnguyen2004 for hinting at the shared drive problem above 😃!

You'll know that "network path = bad for LLVM" if you've seen #233 https://github.com/mstorsjo/llvm-mingw/issues/233 😁

Indeed, kudos for pointing that out!

Anyway, I've got some progress on figuring it out. The problem seems to be that LLVM trusts what Windows promises to us, but some filesystems don't seem to fulfill this.

I tested a couple of network file systems; when mounting a Samba mount from Linux, there's no problem. When mounting a shared folder in a VM in VMware Fusion, there's no problem. When mounting a shared folder via Microsoft Remote Desktop, there is a problem though - and apparently in shared folders mounted via VirtualBox too.

When LLD reads files, it checks the unique ID of the files, in order to deduplicate the files. On the problematic filesystems, the unique IDs aren't unique (or stable for that matter). See https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information

  • "The identifier (low and high parts) and the volume serial number uniquely identify a file on a single computer".
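To make the quoted guarantee concrete, here is a minimal sketch of reading such an identifier pair. It uses Python's `os.stat` as a stand-in and is not LLVM's `getUniqueID` implementation; the helper names `unique_id` and `demo` are made up for illustration, and the mapping of `st_dev`/`st_ino` to the volume serial number and file index on Windows is an assumption about how Python fills in those fields. On a well-behaved filesystem, two hard links to one file should report the same pair, and two different files should not.

```python
import os
import tempfile


def unique_id(path):
    """Return a (device, file index) pair for `path`.

    Illustrative stand-in for a "unique ID" lookup. Assumption: on Windows,
    Python reports the volume serial number as st_dev and the file index as
    st_ino; on Unix these are the device and inode numbers.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    """Create two files plus a hard link and return their three IDs."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        link = os.path.join(d, "a_link.txt")
        for p in (a, b):
            with open(p, "w") as f:
                f.write(p)
        os.link(a, link)  # hard link: same file under a second name
        return unique_id(a), unique_id(link), unique_id(b)


if __name__ == "__main__":
    id_a, id_link, id_b = demo()
    print("a and its hard link match:", id_a == id_link)
    print("a and b differ:", id_a != id_b)
```

Running the same check on a folder inside one of the problematic mounts would be one way to see whether the reported IDs collide or change between runs.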

In a test run with linking with the toolchain in a shared folder, I saw this:

ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/crt2.o returned 16777220/18446708398794096656
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/crtbegin.o returned 16777220/18446708398730637328
ld.lld: getUniqueID for hello.o returned 1510595989/18577348462919738
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libmingw32.a returned 16777220/18446708398706917392
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/lib/clang/16/lib/windows/libclang_rt.builtins-x86_64.a returned 16777220/18446708398932705296
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libunwind.dll.a returned 16777220/18446708398780104720
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libmoldname.a returned 16777220/18446708398594809872
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libmingwex.a returned 16777220/18446708398645280784
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libmsvcrt.a returned 16777220/18446708398594809872
ld.lld: skipping enqueueing path for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libmsvcrt.a
ld.lld: getUniqueID for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libadvapi32.a returned 16777220/18446708398594809872
ld.lld: skipping enqueueing path for z:/llvm-mingw-x86_64/x86_64-w64-mingw32/lib/libadvapi32.a

Note how it returned the same unique ID for libmsvcrt.a and libadvapi32.a as it already had returned for libmoldname.a. So LLVM concludes that libmsvcrt.a is a file that has already been processed, and ignores it.

When rerunning the same command, without changing anything, the unique IDs for the files have changed to other numbers again...

So the implementations of these remote filesystems come up with random, non-unique numbers for these files, and LLVM's implementation relies on these filesystem-provided unique ID numbers to disambiguate files from each other.
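The effect seen in the log can be reproduced with a small model. The sketch below is not LLD's code; `enqueue_inputs` is a hypothetical helper that deduplicates inputs purely by the reported ID, and the ID values are copied from the log lines above (with the paths shortened to file names).

```python
def enqueue_inputs(inputs):
    """Deduplicate inputs by unique ID, as the log output suggests.

    `inputs` is a list of (path, unique_id) pairs. The IDs are supplied by
    the caller so that a misbehaving filesystem can be simulated.
    Returns (enqueued, skipped) lists of paths.
    """
    seen = set()
    enqueued, skipped = [], []
    for path, uid in inputs:
        if uid in seen:
            # Same ID as an earlier input: treated as "already processed".
            skipped.append(path)
            continue
        seen.add(uid)
        enqueued.append(path)
    return enqueued, skipped


# (volume, file index) pairs taken from the log above.
LOG_INPUTS = [
    ("libmoldname.a", (16777220, 18446708398594809872)),
    ("libmingwex.a", (16777220, 18446708398645280784)),
    ("libmsvcrt.a", (16777220, 18446708398594809872)),
    ("libadvapi32.a", (16777220, 18446708398594809872)),
]

if __name__ == "__main__":
    enqueued, skipped = enqueue_inputs(LOG_INPUTS)
    print("enqueued:", enqueued)
    print("skipped: ", skipped)
```

With these inputs, `libmsvcrt.a` and `libadvapi32.a` end up in the skipped list because they carry the ID already seen for `libmoldname.a`, which matches the "skipping enqueueing path" lines in the log.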


mstorsjo commented 1 year ago

Why doesn't it resolve the UID on every request instead of using a cached one?

It does resolve the UID of the files every time it tries to open them. But since the UID feature of these filesystem mounts seems to be broken, we get essentially random numbers as UIDs.

LLD deduplicates opened files based on the UID - in systems with potential hardlinks and symlinks, the filesystem UID is meant to tell you exactly whether what you have is the exact same file as another one or not. I'm not sure if this is done for correctness or performance or both.

I believe that Clang uses this to track identities of files too - I wouldn't be surprised if Clang uses the file UIDs for handling #pragma once, for disambiguating which files actually are the same file, and possibly used for many other things too.

When two unrelated files on disk (on the same drive) return the exact same UID, bad things happen.
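If Clang does track `#pragma once` by file UID, as guessed above, then unstable IDs could plausibly produce the "#include nested too deeply" error from the original report. The toy model below illustrates that idea only; it is not Clang's preprocessor, every name in it is hypothetical, and `MAX_DEPTH` is an illustrative limit picked for the sketch.

```python
import itertools

MAX_DEPTH = 200  # illustrative nesting limit, chosen for this sketch


class IncludeTooDeep(Exception):
    pass


def preprocess(name, includes, get_uid, seen=None, depth=0):
    """Toy include walker where every header behaves like `#pragma once`.

    `includes` maps a header name to the headers it includes.
    `get_uid` maps a header name to the ID reported when it is opened.
    Returns the number of headers actually entered.
    """
    if seen is None:
        seen = set()
    if depth > MAX_DEPTH:
        raise IncludeTooDeep("#include nested too deeply")
    uid = get_uid(name)
    if uid in seen:
        return 0  # recognised as already processed, so it is skipped
    seen.add(uid)
    entered = 1
    for child in includes.get(name, []):
        entered += preprocess(child, includes, get_uid, seen, depth + 1)
    return entered


# Two headers that include each other; harmless when `#pragma once` works.
INCLUDES = {"a.hpp": ["b.hpp"], "b.hpp": ["a.hpp"]}

_stable_ids = {"a.hpp": 1, "b.hpp": 2}
_counter = itertools.count()


def stable(name):
    """A healthy filesystem: the same file always reports the same ID."""
    return _stable_ids[name]


def unstable(name):
    """A broken mount: a different ID every time the file is opened."""
    return next(_counter)


if __name__ == "__main__":
    print("stable IDs, headers entered:", preprocess("a.hpp", INCLUDES, stable))
    try:
        preprocess("a.hpp", INCLUDES, unstable)
    except IncludeTooDeep as e:
        print("unstable IDs:", e)
```

With stable IDs the walk enters each header once and stops. With IDs that change on every open, the mutual include is never recognised as a repeat, so the nesting grows until the limit is hit.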

mstorsjo commented 1 year ago

I filed an upstream bug report about this at https://github.com/llvm/llvm-project/issues/61401 now.

falhumai96 commented 1 year ago

Awesome, thank you @mstorsjo. I will track that issue until it gets resolved.

mstorsjo commented 1 year ago

I am also looking forward to a fix for issue #222 as well, as it is a critical problem in another project I am working on, which supports standard I/O with Unicode characters. This behaviour was not observed with the GCC MinGW toolchain or the MSVC Visual Studio build tools, only with the LLVM MinGW toolchain.

The fix for that particular issue landed in https://github.com/llvm/llvm-project/commit/fcbbd9649ac165aaf7fc7d60b8fef3b23755179a, and the latest nightly build at https://github.com/mstorsjo/llvm-mingw/releases/tag/nightly contains a version with that fix.

falhumai96 commented 1 year ago

Awesome news @mstorsjo !

mstorsjo commented 1 year ago

A fix for the spurious issues on network mounts was finally accepted in https://reviews.llvm.org/D155579, and landed in https://github.com/llvm/llvm-project/commit/02a37547834902ed05fa9c5d1dcc9e31c37e2182. The fix is way too disruptive to make it into the 17.x release series (where the 17.0.0 version is due to be released next week), but should be part of the upcoming 18.x release in 6 months unless some issue appears. The latest nightly build at https://github.com/mstorsjo/llvm-mingw/releases/nightly should contain this fix though, if you want to take it for a spin.

falhumai96 commented 12 months ago

Awesome news @mstorsjo ! Thanks for your help. Greatly appreciated.

falhumai96 commented 12 months ago

I'll close the issue once 18.0.0 is released (I'll also test whether the issue is fixed).

mstorsjo commented 11 months ago

I'll close the issue once 18.0.0 is released (I'll also test whether the issue is fixed).

If possible, could you try out a nightly build as well? Once 18.0.0 is out, in case it actually doesn't fix your issue, it's at least another 6 months before a fix can be out in a release...

falhumai96 commented 11 months ago

@mstorsjo I have just tested the nightly build against LLVM/Clang MinGW 16.0.6, and the issue is clearly fixed (I noticed it in the earlier build but not on the nightly build). Great work. Much appreciated!

Not sure if UCRT vs MSVCRT would matter (highly doubtful), but I noticed that the nightly has only a build for the former so I did not test the latter.

mstorsjo commented 11 months ago

@mstorsjo I have just tested the nightly build against LLVM/Clang MinGW 16.0.6, and the issue is clearly fixed (I noticed it in the earlier build but not on the nightly build). Great work. Much appreciated!

Thanks for testing!

Not sure if UCRT vs MSVCRT would matter (highly doubtful), but I noticed that the nightly has only a build for the former so I did not test the latter.

Yeah, the CRT choice shouldn’t make any difference here. (And I don’t include msvcrt based builds in the nightlies, as that requires quite a bit of extra compilation.)