lexxmark / winflexbison

Main winflexbison repository
GNU General Public License v3.0
416 stars 120 forks

Visual Studio - win_bison creates tab.h file exceeding compiler limits #67

Closed: us-er-name closed this issue 4 years ago

us-er-name commented 4 years ago

Tools: win_flex -V reports: win_flex 2.6.4; win_bison -V reports: bison (GNU Bison) 3.7.1; Microsoft Visual Studio Community 2019, Version 16.7.7.

I am new to flex/bison and am setting up parsing of a subset of SQL. I have around 830 tokens entered in my source file.

When I started out with simple C files and stdio, everything compiled and worked fine. I then decided to adapt it to C++. After a very painful migration (I couldn't find good docs), I finally got everything to almost work.

I've narrowed down the issue:

There is a point in the bison-generated *.tab.h file where it does a YY_ASSERT over all tokens. With over 830 tokens, including the ones used internally (like EOF), the assert line is rather long: over 21740 chars! This leads to a compiler error:

error C2026: string too big, trailing characters truncated

Ha, you just can't truncate a line of source code and get away with it!

Microsoft docs on C2026: "The string was longer than the limit of 16380 single-byte characters."

Bison-generated header file (including the YY_ASSERT line I had to comment out):

      /// Constructor for valueless symbols, and symbols from each type.
#if 201103L <= YY_CPLUSPLUS
      symbol_type (int tok, location_type l)
        : super_type(token_type (tok), std::move (l))
      {
// The next line had no compiler complaints. I guess I'm just not invoking this routine (yet).
        YY_ASSERT (tok == token::YYEOF || tok == token::YYerror || tok == token::YYUNDEF || tok == ..... (too long to paste here
      }
#else
      symbol_type (int tok, const location_type& l)
        : super_type(token_type (tok), l)
      {
// C2026 error on the next line: Over 21740 chars long. Commenting it out is a workaround.
//        YY_ASSERT (tok == token::YYEOF || tok == token::YYerror || tok == token::YYUNDEF || tok == ..... too long to paste here
      }
#endif

My only remedy, at this point, is to compile, edit the header file to comment out that line, and continue - each time.

Any other workarounds for this? It is a bit of a pain when building everything.

GitMensch commented 4 years ago

Hm, do you do something special? Just rechecked with that version, the definition of the assert is #define YY_ASSERT(E) ((void) (0 && (E))) and it is not used in the header at all (and defined in the cpp file).

... But it looks like I use a different command; mine is: `win_bison.exe -o "%1.cpp" "%1.y"` (running in a cmd script, getting the basename of the original file as first parameter).

us-er-name commented 4 years ago

It could be some of the "noise" I have in the source file. I used the same command as you, but I might be overriding something in the .y file.

This is my first attempt at C++ so I am not quite sure of all that is listed. Here is what I pulled out of my .y file.


%skeleton "lalr1.cc"

%require "3.2"
%language "c++"
%defines

%define api.token.constructor
%define api.value.type variant

%define parse.assert // Commenting out this line has no effect on the C2026 error

%code requires {
  # include <string>
  class Driver;
}

// The parsing context.
%param { Driver& drv }

/* call yylex with a location */
%locations

%define parse.trace
%define parse.error detailed
%define parse.lac full

%code {
# include "driver.h"
}

I searched for the definition found in the generated .tab.h file and found:

#ifndef YY_ASSERT
# include <cassert>
# define YY_ASSERT assert
#endif

This is the definition used, not the one you have shown. The biggest issue is not the YY_ASSERT definition. The problem is using any YY_ASSERT definition. Since the compiler ignores anything past 16380 chars, it never sees the ending ')' of any YY_ASSERT(). This means the macro is not terminated with the needed ')' and the compiler spits out garbage once that happens.

The only good solution (that I see) is to limit the maximum length of the source code lines.

Having said that, I wonder if I'm doing something wrong anyways (and that's why I'm getting this issue).

GitMensch commented 4 years ago

OK, you're using a different skeleton (that's not necessarily bad). The assert definition is then resolved to:

#define assert(_Expression) (void)( (!!(_Expression)) || (_wassert(_CRT_WIDE(#_Expression), _CRT_WIDE(__FILE__), __LINE__), 0) )

and therefore the expression to assert is always expanded to a string literal. (The first path is not taken, as YY_CPLUSPLUS is 199711L.)
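
To illustrate the stringization described above, here is a minimal sketch. DEMO_STRINGIZE and the two helper functions are hypothetical names made up for this example; they only mimic the `#_Expression` part of the CRT assert macro and are not taken from Bison or the Microsoft headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the stringizing part of the CRT assert macro:
// the # operator turns the whole argument into one string literal.
#define DEMO_STRINGIZE(expr) #expr

// Returns the literal the preprocessor builds for a small three-way check.
inline const char* demo_literal() {
    return DEMO_STRINGIZE(tok == 1 || tok == 2 || tok == 3);
}

// Length of that literal; it grows with every extra "|| tok == N" clause.
inline std::size_t demo_literal_length() {
    return std::strlen(demo_literal());
}
```

Each additional token adds another clause to the literal, so with several hundred tokens the generated string can plausibly pass the 16380-character limit quoted for C2026 even though the check itself is trivial.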

I suggest checking with the bison help mailing list (ideally with your parser source included) about "is it done right" and "how to circumvent this issue".

us-er-name commented 4 years ago

Thanks. I'll try the bug-bison forum.

I was thinking about it - It does not matter what is used. The source line is just too long for the compiler. There is no #define magic I can do to work around it. The only line the compiler won't choke on is a comment line.

With what I am throwing at bison, there are only two choices: Bison should either omit the line of code or split the line up when generating it.

Now, if I'm doing something stupid in my files to make it do that - that's another issue. But, bison shouldn't be generating long lines like that.

Considering the C version didn't have this issue, I'm assuming that the C++ code generating feature simply did not take this into account.
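
As a rough illustration of the "split the line up" option, the sketch below moves the token list into a table with one entry per line, so no single source line grows with the number of tokens. The token enum and is_valueless_token are hypothetical; this is not what Bison generates, only one way a generator could keep lines short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical token numbers; a real grammar would list hundreds of them.
enum demo_token { YYEOF = 0, YYerror = 256, YYUNDEF = 257, SELECT = 258, FROM = 259 };

// One table entry per line keeps every source line short,
// however many tokens the grammar declares.
inline bool is_valueless_token(int tok) {
    static const int valid[] = {
        YYEOF,
        YYerror,
        YYUNDEF,
        SELECT,
        FROM,
    };
    const std::size_t count = sizeof valid / sizeof valid[0];
    return std::find(valid, valid + count, tok) != valid + count;
}
```

An assertion such as `assert(is_valueless_token(tok))` would then stringize only the short call expression rather than the full list of comparisons.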

Thanks for your quick response and help

GitMensch commented 4 years ago

You're welcome. Please drop an update note when you know more, so this doesn't end up as a loose end. Thanks.

us-er-name commented 4 years ago

Semi-good news.

At the top of my listing I made a change:

%code requires {
  # include <string>
  class Driver;
#undef YY_ASSERT // Added
#define YY_ASSERT // Added
}

The killing of the YY_ASSERT is an obvious "rather not do this" trick but then the compiler does not have to buffer the entire contents of the parameter.

I posted on the bug-forum but this trick works for me right now - with obvious repercussions of course.

I'll repost if I hear more about this.

PS: the macro redefinition must go in the "%code requires" section and not simply "%code". Otherwise it didn't work for me.
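
A possibly less drastic variant of the same trick is sketched below: instead of defining YY_ASSERT to nothing, define it so that it still checks the condition but never stringizes it. This is an untested assumption, not something confirmed in this thread; it relies on the `#ifndef YY_ASSERT` guard shown earlier and on the string literal being what triggers C2026. checked_token is a hypothetical helper used only to exercise the macro.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical replacement: aborts when the condition is false, but does not
// use the # operator, so no string literal is built from the expression.
#define YY_ASSERT(E) ((void)((E) || (std::abort(), 0)))

// Small helper that exercises the macro the way the generated header would.
inline int checked_token(int tok) {
    YY_ASSERT(tok == 0 || tok == 256 || tok == 257);
    return tok;
}
```

In a grammar file the `#define` line would go in the "%code requires" block, as with the empty definition above. The trade-off is that a failed check aborts without printing the expression, file, or line.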

us-er-name commented 4 years ago

Ok. They responded with a solution:

Meanwhile, I recommend that you locate the file data/skeletons/variant.hh, and change:

      {
        YY_ASSERT (]m4_join([ || ], m4_map_sep([_b4_type_clause], [, ], [$@]))[);
      }

into

      {]b4_parse_assert_if([[
        YY_ASSERT (]m4_join([ || ], m4_map_sep([_b4_type_clause], [, ], [$@]))[);]])[
      }

twice.

That should eliminate the problem locally until we have something standard.

This has indeed eliminated the issue for the microsoft compiler. (The YY_ASSERT lines are eliminated.)

NOTE: you cannot use %define parse.assert. Using it will re-introduce the YY_ASSERT code, and thus the error will come back.

With the unmodified skeleton, that %define made no difference either way. You need this mod to make things work.

Thanks for pointing me in the right direction (bug-bison forum).

GitMensch commented 4 years ago

Bison 3.7.4 was released, solving the original issue.