taocpp / PEGTL

Parsing Expression Grammar Template Library
Boost Software License 1.0
1.94k stars 228 forks source link

/bigobj #164

Closed zhihaoy closed 5 years ago

zhihaoy commented 5 years ago

One project encountered a "regression" with MSVC when upgrading from 2.7.1 to 2.8.0:

https://github.com/Microsoft/vcpkg/pull/6024

The number of sections in a .obj exceed the 2^16 limit. This is totally possible since it's PEGTL, but can we do anything about it?

Option 1. Mention the issue in README and recommend users to turn on /bigobj (in CMake, maybe) Option 2. Turn it on in CMake directly, but this will affect anything that uses target_link_libraries with pegtl. Option 3. Consider this is a regression and to keep certain uses not to exceed the limit.

d-frey commented 5 years ago

We are not using MSVC ourselves, so we need some help here. What is it that MSVC counts as sections? What does the PEGTL do that leads to the problem? From a quick Google search I only saw that /bigobj is the "solution", but it causes some overhead - so I'd like to avoid enabling it for everyone.

zhihaoy commented 5 years ago

It grows proportional to the number of symbols, so it's not uncommon for a heavily template library to hit it in a translation unit that instantiates everything.

nshcat commented 5 years ago

A reason for the section count exploding when using PEGTL is that MSVC implements templates with a special type of section in the object file:

https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for

So when a lot of rules and specializations of them are created it's quite easy to reach the size limit. IMHO, a note in the readme to activate \bigobj should be enough here.

d-frey commented 5 years ago

OK, thanks for the info. And yeah, there probably isn't much we can do about it. Which leaves the question where to put the note about /bigobj. I'll discuss it with @ColinH and we'll update the docs.

ColinH commented 5 years ago

@zhihaoy Could you try using anonymous namespaces in your implementation (.cpp) file for things like the grammar and see whether that reduces the number of symbols on Windows, too?

zhihaoy commented 5 years ago

The relevant file is https://github.com/Microsoft/cppgraphqlgen/blob/master/GraphQLTree.cpp the grammar is https://github.com/Microsoft/cppgraphqlgen/blob/master/include/GraphQLGrammar.h Tried a bunch of things, such as enclosing the grammar in an anonymous namespace, replacing the error_message with const char*, explicit instantiate the parse overloads. Nothing helped. I guess one might have to split the different parse instantiations into different translation units lol.

ColinH commented 5 years ago

That's a bit of a bummer. I didn't expect to run into 16bit limits in 2019. @d-frey Should we mention /bigobj in https://github.com/taocpp/PEGTL/blob/master/doc/Installing-and-Using.md#requirements ?

d-frey commented 5 years ago

I did find a few over-complicated rules in the above linked grammar that can be simpified, I'd like to send a PR but I'm currently fighting UNIX vs. DOS line endings. sigh

Anyways, I guess those changes might not be enough to fix the issue and other users might have larger grammars, so the hint is probably necessary anyways. And your suggestion is a good place to put it, so please go ahead and add it there.

wravery commented 5 years ago

I encountered this in the wravery/cppgraphqlgen 3.0 fork when I took the PEGTL master branch dependency. I've already patched my own CMakeLists.txt to add it in the next version, and I'm planning on cherry picking the backwards compatible changes for a 2.2 release which will be compatible with PEGTL 2.8.

Looks like the vcpkg port for cppgraphqlgen has already been patched to deal with this, once I rev the vcpkg version to 3.0 or 2.2 that should not be necessary anymore.