Open snaury opened 10 years ago
Some time ago we have similar questions as I remember on forum sources.ru to our mingw-builds project. But no one solution was found. All that we can do here now is just remove configure parameter. But in this case you can't understand what build do you use.
Removing build parameter wouldn't help, it would simply change the exact string, nothing else.
Ok. Then maybe you can fill bug report for binutils?
echo *.o *.a | xargs -n 1 objcopy.exe -R '.rdata$zzz'
but bleh ...
This may be resolved by passing option -fno-ident
to the compiler.
These ident records not only add spam and increase binary sizes, but more importantly they make it more difficult to create reproducible binaries across platforms. The reproducibility issue could be mitigated by just leaving the ident value to the default in MSYS2 builds — this would help because in Linux builds (that I know of) and macOS (Homebrew) builds, it's also left to its default value (e.g. GCC: (GNU) 7.2.0
).
I really appreciate the work-around, @vszakats but man that is weird that this problem exists at all.
So, -fno-ident removes the .rdata$zzz section, but you need to recompile all the libraries separately in order to get it finally working.
It seems ridiculous that -fno-ident is not set as the default for compiling on Windows then. Can we do something about this ?
I made some quick use of objcopy to remove some of them from libgcc.a, but there are still many still there. (I suppose, I'd have to extract all of the .so files, objcopy them and then reassemble everything).
Well, libgcc.a went from 2.7Mb to 622Kb. So, I am pretty sure it removed way more than just those duplicates and I now have an unusable libgcc library.
Can't this get fixed by having gcc use -fno-ident by default ?
Before moving to MSYS2, I was using mingw-builds. It looks like it is no longer maintained, this is why I moved.
mingw-builds was using -fno-ident by default on every compilation : https://github.com/niXman/mingw-builds/blob/master/library/config.sh
It was kindly implemented by niXman as a follow-up to this report: https://github.com/niXman/mingw-builds/issues/412. Same solution would work here as well I'm guessing.
This request would be more compelling backed by some numbers. I just tested -fno-ident with a C++ project with 26 object files and the .dll size decreased about 1000 bytes... which is 0.1% of the initial size.
In theory, projects such as LLVM or Qt, which consists of thousands of object files, would benefit more from -fno-ident, but actual numbers are better than theory.
Finally, someone (TM) could file a bug report on binutils' tracker mentioning that strip
on Windows does not remove the duplicate ident strings as it does on Linux. Fixing strip
is the best solution, because those strings can be very useful when tracking mismatched libraries.
@oscarfv, actually, on windows, in the COFF format, instead of being in the .ident section, they are inside of the .rdata section (.rdata$zzz subsection) and strip can't actually safely strip it. There are sometimes very important data that is needed by the exe in the .rdata$zzz section which will break an executable if removed.
The problem is not binutils not stripping the section, the problem is gcc adding the .ident tag in the .rdata$zzz section for the COFF windows format (because there is no .ident section in that format).
Even if you told binutils to remove it, it would be technically impossible for it to safetly identify if the data is needed or not. .rdata is "Read-only initialized data" and is defined as needed by the executable. Only the .debug sections are safely removable in the PE COFF format.
https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format
In the case of a simple hello world executable, the string is already present about 20 times in the executable. It is present more than a hundred time in libgcc.a.
Yes, this is still just a few kilobytes, but it is still a few kilobytes too many that will actually be loaded in memory with the rest of the executable.
The -fno-ident is currently the only solution, except if gcc start adding its identifier inside of the .debug section.
Well, my point about reporting the issue upstream stands: is there a bug report on gcc's tracker about this problem?
I believe any data that serves no purpose is best to be eliminated from generated/shipping binaries, if there is way. It this case there is an easy way, with no known drawbacks. The fix is also fit as a temporary solution till slower moving upstream projects such as GCC decides to change the default and/or is able to replicate *nix behaviour on Windows (such that strip
actually removes these strings).
As for numbers it depends on the ident text, for example this text taken from an earlier MSYS2 version GCC: (Rev1, Built by MSYS2 project) 7.2.0
takes 40 bytes plus some, which should be multiplied by the number of object files (build without -fno-ident
) linked to the resulting executable. In case of libiconv2-2.dll
, which is a fairly small file (1 MB), the bloat is 1.6-2K. On my MSYS2 main bin
directory there are 470 files, so it adds up quickly. In the past the most common "complaint" from potential MinGW users (whom I tried to explain the merits of this great toolchain) was the larger output size compared to other C compilers (notably MSVC) on the platform, so I took every opportunity to remove unnecessary bloat. However large/small these may be.
(Since -fno-ident
fully resolved the issue on Windows, I haven't consulted GCC tracker about this.)
Those ident strings are there for a reason: to identify the compiler that built the object file and thus avoid or detect incompatible mixtures of libraries. On MinGW(-w64) this is specially important because there are so many variants out there, with different combinations of exception model, threading model and C++ ABI (too bad that the info does not include compiler options that modify the ABI).
Personally I'm not opposed to add -fno-ident
to the compiler defaults, but the 0.1% of size improvement plus removing the means for checking which compiler built a given library makes me not interested on creating a PR either, which is the best way to advance things on this type of projects.
In my (fairly long) experience with MinGW/GCC, the ident string never contains info such as exception/threading/ABI/build-options, only the packager project in some cases (like with MSYS2) and the version of the GCC compiler. It's highly doubtful this string is actively used by any part of the toolchain by actually reading that information, extracting the version number (or other data) from the freeform text and even acting upon it. I've never seen that happen and it'd be a rather flakey solution indeed, if that's the idea behind it.
LTO builds will reject linking objects with incompatible LTO metadata versions, but that's using an independent mechanism by reading this information from the object files (which information is not included in final executables!).
Even if the toolchain would offer a feature to warn about objects linked together with incompatible compiler properties, it makes no sense to include those properties in the final executables — it's well enough to include them in the object files only.
I can imagine that a human readable GCC version text may be useful for debugging/troubleshooting purposes in some cases, but for such purpose it'd be enough to contain the (or each, if there are multiple) ident text once, as opposed to dozens or hundreds of times. None of these solutions would answer which modules were built with which ident text identifier though, so it's quite a limited input.
That is why in its current form, it's merely and ad/spam text that serves limited if any purpose, and doing so overly verbosely.
If MSYS2 maintainers are interested to see the compiler rev/version in all executables built with their build of the toolchain, it'd be more elegant to include a single version string in one of the objects that is always present in the final executables. Using -fno-ident
on all but a few selected objects would also be an improvement.
But, I still fail to see a compelling reason why the inclusion of this is important on Windows, and not important at all on other OS platforms and with other popular Windows C compilers.
Now I see that the info about exception mechanism and threading model is missing. Well, as I said, make a PR that disables ident
by default and let's see what the maintainer thinks. But, again, 0.1-0.2% space savings is not terribly compelling.
To clarify on the 0.1-0.2% figure (which apparently came from the single example I pulled in above), this depends on how many objects formed the output executable and the total size of it, so it may be easily higher than that. To quote an actual example, BusyBox for Windows claimed to achieve 3% size reduction by enabling this option: https://github.com/rmyorston/busybox-w32/issues/51
Where should I go from there in order to get this fixed ? Should I make a pull request ?
@simondeschenes : That's what I would do. And, as a long term solution, a bug report on GCC bugzilla suggesting to move .ident
to .debug
section for targets that use COFF. Or, best of all, implement the later, send it as a patch to GCC and as a PR here :-)
"D:\src\gcc-5.5.0\gcc\toplev.c"
/* Attach a special .ident directive to the end of the file to identify
the version of GCC which compiled this code. The format of the .ident
string is patterned after the ones produced by native SVR4 compilers.
if (!flag_no_ident)
{
const char *pkg_version = "(GNU) ";
char *ident_str;
if (strcmp ("(GCC) ", pkgversion_string))
pkg_version = pkgversion_string;
ident_str = ACONCAT (("GCC: ", pkg_version, version_string, NULL));
targetm.asm_out.output_ident (ident_str);
} */
"D:\src\gcc-5.5.0\gcc\c-family\c-lex.c"
/* if (!flag_no_ident)
{
Convert escapes in the string.
cpp_string cstr = { 0, 0 };
if (cpp_interpret_string (pfile, str, 1, &cstr, CPP_STRING))
{
targetm.asm_out.output_ident ((const char *) cstr.text);
free (CONST_CAST (unsigned char *, cstr.text));
}
} */
So if all will be happy with currently adding -fno-ident
to build flags then i will do it globally in makepkg
With all the overtime I am currently doing at work, I just never got time to make a pull request. Thank you @Alexpux for taking care of it.
What @Alexpux is proposing is not the same as some people requested here. People complain about .ident
in general, not only in the binary packages distributed by MSYS2/Mingw-w64.
Adding -fno-ident
to makepkg
avoids having those strings in the binaries we download here, but not from the binaries we build. The real fix is to patch binutils for our target. A quick looks points to the function obj_coff_ident Completely untested patch follows:
modified gas/config/obj-coff.c
@@ -553,10 +553,7 @@ obj_coff_ident (int ignore ATTRIBUTE_UNUSED)
/* We could put it in .comment, but that creates an extra section
that shouldn't be loaded into memory, which requires linker
changes... For now, until proven otherwise, use .rdata. */
- sec = subseg_new (".rdata$zzz", 0);
- bfd_set_section_flags (stdoutput, sec,
- ((SEC_ALLOC | SEC_LOAD | SEC_READONLY | SEC_DATA)
- & bfd_applicable_section_flags (stdoutput)));
+ sec = fetch_coff_debug_section();
}
#else
subseg_new (".comment", 0);
MSYS2/Mingw-w64 (and related packages the form the toolchain) are a little bit "more special", because its .ident
strings are inevitably leaking into any project that use this compiler. Most trivially via the startup object, but also via other objects/libs that the C compiler links by default/automatically.
You're right though that the best would be to enable the option for every MSYS2 package build. If MSYS2 has a concept of a default/customisable CFLAGS
/CPPFLAGS
value, it may help solving it.
I also like the what @oscarfv code tries to do. It tries to put the .ident inside of a .debug section. Stripping will remove every .debug section and should also remove the .ident string. However, it keeps the .ident string as part of the debug information.
I think it is a nice idea to explore in order to get a real fix going upstream to binutils.
.debug
section would be a nice place for this information indeed. (Or, for that matter any other section that gets stripped by strip
.)
The patch disables "-fident", which is enabled by default, but leaves it available, along with "-Qy" fident_distable.patch
@simondeschenes
There are sometimes very important data that is needed by the exe in the .rdata$zzz section which will break an executable if removed
Do you know of any resource that might list or document other uses of .rdata$zzz
section besides .ident
?
@simondeschenes
There are sometimes very important data that is needed by the exe in the .rdata$zzz section which will break an executable if removed
Do you know of any resource that might list or document other uses of
.rdata$zzz
section besides.ident
?
IIRC, I used "strip -R .rdata$zzz myobjectfile.o" and it essentially completely broke the objects completely. I then tried to just manually set the complete "rdata$zzz" sections manually to all zeroes with an hex editor as a test and it also broke the object file.
This is where I stopped when I was trying to just strip that section from the object files.
edit: I think it is important to note what happened since I last commented here. We moved to a LLVM/clang custom toolchain instead of a GCC + binutils toolchain because of the much smaller exec size even when using -fno-ident.
I was very surprised when I received a mail with your message. This issue has been open for more than 9 years, almost 10 and is still not fixed...
I guess projects who have strict executable size requirements all moved away from msys2.
IIRC, I used "strip -R .rdata$zzz myobjectfile.o" and it essentially completely broke the objects completely. I then tried to just manually set the complete "rdata$zzz" sections manually to all zeroes with an hex editor as a test and it also broke the object file.
Thank-you for your response. We were thinking to discard the section with a custom linker script as part of our build process - which seemed to work in tests, but really don't know for certain that it is a valid workaround, so was hoping for a confirmation one way or the other.
We use other some other toolchains besides msys2 and seen a similar accumulation of .ident strings due to those toolchain's supporting libraries, plus in our own libraries not having ever thought about -f[no-]ident
until recently.
We use msys2 to help support our build environment but then use a variety of other toolchains (mingw-w64 variants) depending on what's to be targeted.
Any news on this front? I've just installed the latest version of MSYS2, struggled to get the MinGW64 toolchain working due to this issue, and finally managed to statically compile the three ffmpeg libraries I needed in a very lean build only to find that my <500kb DLL has accumulated a total of 17 duplicate .ident strings from them. I really want to prevent this, but it sounds like the only way to do so right now is to build the MSYS2 MinGW64 package from scratch with a patch? Is that correct, and if so will a fix be available soon? Sorry if that's a newb question, but I had to uninstall and reinstall MSYS2 four or five times to get it working at all so I’m hesitant to go through all that if an alternative exists.
It is almost correct.
There is no fix coming here because it there was one, it would already have been done. I mean, adding -fno-ident
to the options when building GCC isn't something that takes a decade.
The GNU Binutils project considers using -fno-ident
to be the correct solution as all sections in the PE COFF format except for the rdata
ones will be loaded in memory. There really is no valid space in the executable format for an ident string.
Using strip
to remove the section hasn't been sucessful for me, so do 't hope it will work.
However, a custom linker script seems to work fine (-T yourscript.lds
).
This is what they do on some projects like:
https://github.com/tomsons26/ts-patches/blob/326db731f7226d9e803feab475777c730688634e/tibsun.lds
And there are mingw-w64 toolchains compiled with -fno-ident
like:
https://github.com/niXman/mingw-builds-binaries/releases
Thank you for responding @simondeschenes. I'm sorry I took so long to reply -- I wanted to test building MingGW with some of the patches mentioned in this issue before responding so I could report back and help others looking for a fix, but unfortunately hot weather, antivirus stupidity, multiple crashes and someone cutting a hedge trimmer cord and shorting the power right when I was in the middle of building has meant that that hasn't happened yet.
It is almost correct.
There is no fix coming here because it there was one, it would already have been done. I mean, adding
-fno-ident
to the options when building GCC isn't something that takes a decade.The GNU Binutils project considers using
-fno-ident
to be the correct solution as all sections in the PE COFF format except for therdata
ones will be loaded in memory. There really is no valid space in the executable format for an ident string.
I'm aware, but -fno-ident
does not prevent libraries compiled with MinGW from filling up with the .ident spam -- especially when they produce a lot of object files like ffmpeg -- which is still a lingering issue. It certainly didn't seem to stop it when I used one of niXman's builds, as helpful as they otherwise are.
Using
strip
to remove the section hasn't been sucessful for me, so do 't hope it will work.However, a custom linker script seems to work fine (
-T yourscript.lds
). This is what they do on some projects like: https://github.com/tomsons26/ts-patches/blob/326db731f7226d9e803feab475777c730688634e/tibsun.ldsAnd there are mingw-w64 toolchains compiled with
-fno-ident
like: https://github.com/niXman/mingw-builds-binaries/releases
Thanks so much for this, this is exactly the kind of direction I needed. I wasn't sure if discarding the .rdata section entirely might cripple the object file, so that's very promising! If all else should fail, do the .ident strings also build up with Clang? My understanding is that Clang doesn't use GNU binutils, but I'm wondering if it suffers a different version of the problem thanks to the awkwardness of the COFF format.
Clang doesn't have this problem because even if it accepts #ident, the only thing it does with the string is throw it away. It supports #ident for compatibility with GCC, but it doesn't keep it in the compiled binary.
Edit: That clang behavior has even been the source of some problems for systems that depended on the ident string being in the compiled object: https://stackoverflow.com/questions/71315990/clang-compiler-and-ident-preprocessor-directive/71316153#71316153
Thank you, that's very helpful. It sounds like the string doesn't end up in the binary or the object code, which is useful to know since I need to supply both for licensing purposes. Interesting that it causes problems for some people. If nothing else, at least we can choose between compilers. It's a shame the COFF format can't handle .ident strings like how ELF can, but we can only work with what we're given, I guess.
Since gcc version 4.8 gcc started to include
.ident
directives in assembly files, which binutils gas converts into strings into.rdata$zzz
for COFF targets (i.e. windows). The problem however that basically every object ends up with a string likeGCC: (Rev6, Built by MSYS2 project) 4.8.2
, and when linking all of them are tucked at the end of the.rdata
section. The problem here is with section flags. On ELF (i.e. linux) the section is added with SEC_MERGE flag, so the resulting binary only contains a handful duplications (when different objects are compiled with different compilers or their versions). However for COFF no such deduplication occurs, which can easily result in long chains of hundreds of identical strings. While I support gcc's desire to advertise itself I worry this is way too excessive.Not sure how to fix this, it's obviously an upstream bug, and wasn't visible because there weren't many comments before. Now there are. And if gas is ever fixed, all libraries would have to be recompiled, since section flags are in those object files.
What do you think? See for example how many of those comments are in gcc.exe (which is simply a big executable with lots of linked object files).