yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

AST crashes on using optional param in grammar #62

Closed sapgan closed 5 years ago

sapgan commented 5 years ago

For the CSV grammar in the given example, the parser core dumps when building an AST:

#
# CSV grammar based on RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt)
#

file <- (header NL)? record (NL record)* NL?
header <- name (COMMA name)*
record <- field (COMMA field)*
name <- field
field <- escaped / non_escaped
escaped <- DQUOTE (TEXTDATA / COMMA / CR / LF / D_DQUOTE)* DQUOTE
non_escaped <- TEXTDATA*
COMMA <- ','
CR <- '\r'
DQUOTE <- '"'
LF <- '\n'
NL <- CR LF / CR / LF
TEXTDATA <- !([",] / NL) .
D_DQUOTE <- '"' '"'

#0 0x000000000043267a in std::_Hashtable<std::string, std::pair<std::string const, peg::Definition>, std::allocator<std::pair<std::string const, peg::Definition> >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_begin (this=0x0) at /opt/gcc/4.8.3/include/c++/4.8.3/bits/hashtable.h:369

#1 0x0000000000429eae in std::_Hashtable<std::string, std::pair<std::string const, peg::Definition>, std::allocator<std::pair<std::string const, peg::Definition> >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::begin (this=0x0) at /opt/gcc/4.8.3/include/c++/4.8.3/bits/hashtable.h:455

#2 0x00000000004216f6 in std::unordered_map<std::string, peg::Definition, std::hash, std::equal_to, std::allocator<std::pair<std::string const, peg::Definition> > >::begin (this=0x0) at /opt/gcc/4.8.3/include/c++/4.8.3/bits/unordered_map.h:249

#3 0x00000000004237e9 in peg::parser::enable_ast<peg::AstBase > (this=0x7ffe1c236ca0) at cpp-peglib/peglib.h:3254

If the `?` operator is not used, it doesn't crash. Is there any reason not to use the `?` operator?

yhirose commented 5 years ago

@sapgan, thanks for the report. I couldn't reproduce the problem in my test, though. Could you give me a minimal CSV file that reproduces the crash?

sapgan commented 5 years ago

The calc3 binary in the example folder crashes, but the other 2 binaries run fine. For the CSV example, try the following from the lint folder:

./peglint --ast --opt --source a,b --trace ../../grammar/csv.peg

The compiler version is gcc 6.1 and the distro is CentOS 7.2.

yhirose commented 5 years ago

Thanks for the report. Both the calc3 and peglint examples worked on my Mac.

Since I don't have a Linux machine right now, I'll try to set up a VirtualBox VM with CentOS to see if the problem occurs there when I have time.

The following are my results:

$ ./calc3 1+2*3
+ EXPRESSION
  - TERM/0[NUMBER] (1)
  - TERM_OPERATOR (+)
  + TERM
    - FACTOR/0[NUMBER] (2)
    - FACTOR_OPERATOR (*)
    - FACTOR/0[NUMBER] (3)
1+2*3 = 7
$ ./peglint --ast --opt --source a,b --trace ../../grammar/csv.peg | pbcopy
pos:lev rule/ope
------- --------
0:0 file
0:1   Sequence
0:2     Option
0:3       Sequence
0:4         header
0:5           Sequence
0:6             name
0:7               field
0:8                 PrioritizedChoice
0:9                   escaped
0:10                        Sequence
0:11                          DQUOTE
0:12                            LiteralString
0:9                   non_escaped
0:10                        ZeroOrMore
0:11                          TEXTDATA
0:12                            Sequence
0:13                              NotPredicate
0:14                                PrioritizedChoice
0:15                                  CharacterClass
0:15                                  NL
0:16                                    PrioritizedChoice
0:17                                      Sequence
0:18                                        CR
0:19                                          LiteralString
0:17                                      CR
0:18                                        LiteralString
0:17                                      LF
0:18                                        LiteralString
0:13                              AnyCharacter
1:11                          TEXTDATA
1:12                            Sequence
1:13                              NotPredicate
1:14                                PrioritizedChoice
1:15                                  CharacterClass
1:6             ZeroOrMore
1:7               Sequence
1:8                 COMMA
1:9                   LiteralString
2:8                 name
2:9                   field
2:10                        PrioritizedChoice
2:11                          escaped
2:12                            Sequence
2:13                              DQUOTE
2:14                                LiteralString
2:11                          non_escaped
2:12                            ZeroOrMore
2:13                              TEXTDATA
2:14                                Sequence
2:15                                  NotPredicate
2:16                                    PrioritizedChoice
2:17                                      CharacterClass
2:17                                      NL
2:18                                        PrioritizedChoice
2:19                                          Sequence
2:20                                            CR
2:21                                              LiteralString
2:19                                          CR
2:20                                            LiteralString
2:19                                          LF
2:20                                            LiteralString
2:15                                  AnyCharacter
3:4         NL
3:5           PrioritizedChoice
3:6             Sequence
3:7               CR
3:8                 LiteralString
3:6             CR
3:7               LiteralString
3:6             LF
3:7               LiteralString
0:2*        record
0:3       Sequence
0:4         field
0:5           PrioritizedChoice
0:6             escaped
0:7               Sequence
0:8                 DQUOTE
0:9                   LiteralString
0:6             non_escaped
0:7               ZeroOrMore
0:8                 TEXTDATA
0:9                   Sequence
0:10                        NotPredicate
0:11                          PrioritizedChoice
0:12                            CharacterClass
0:12                            NL
0:13                              PrioritizedChoice
0:14                                Sequence
0:15                                  CR
0:16                                    LiteralString
0:14                                CR
0:15                                  LiteralString
0:14                                LF
0:15                                  LiteralString
0:10                        AnyCharacter
1:8                 TEXTDATA
1:9                   Sequence
1:10                        NotPredicate
1:11                          PrioritizedChoice
1:12                            CharacterClass
1:4         ZeroOrMore
1:5           Sequence
1:6             COMMA
1:7               LiteralString
2:6             field
2:7               PrioritizedChoice
2:8                 escaped
2:9                   Sequence
2:10                        DQUOTE
2:11                          LiteralString
2:8                 non_escaped
2:9                   ZeroOrMore
2:10                        TEXTDATA
2:11                          Sequence
2:12                            NotPredicate
2:13                              PrioritizedChoice
2:14                                CharacterClass
2:14                                NL
2:15                                  PrioritizedChoice
2:16                                    Sequence
2:17                                      CR
2:18                                        LiteralString
2:16                                    CR
2:17                                      LiteralString
2:16                                    LF
2:17                                      LiteralString
2:12                            AnyCharacter
3:2     ZeroOrMore
3:2     Option
3:3       NL
3:4         PrioritizedChoice
3:5           Sequence
3:6             CR
3:7               LiteralString
3:5           CR
3:6             LiteralString
3:5           LF
3:6             LiteralString
+ file/0[record]
  - field/1[TEXTDATA] (a)
  - COMMA (,)
  - field/1[TEXTDATA] (b)

yhirose commented 5 years ago

@sapgan, I wonder whether this happens for the same reason as #23 and #46. Could you try linking with pthread to see if that fixes the crash? If it does, I'll update the 'IMPORTANT NOTE' section of the README to include 'CentOS' as well.

sapgan commented 5 years ago

Yes, that solved the issue. Thanks.
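
For readers who hit the same crash, the following is a minimal sketch of the kind of standard-library threading call that, on some older GCC/glibc setups, is known to misbehave at run time unless the program is linked with pthread. That cpp-peglib triggers the problem through `std::call_once` specifically is an assumption here, not something this thread confirms. The helper names `init_once_demo` and `check_init_once_demo` and the file name in the build comment are hypothetical.

```cpp
// Assumed build line for GCC on Linux (the file name is hypothetical);
// -pthread is the linker/compiler flag discussed above:
//   g++ -std=c++11 -pthread example.cc -o example
#include <cassert>
#include <mutex>

// Hypothetical helper, not part of cpp-peglib: performs one-time setup
// through std::call_once and returns how many times the setup has run.
inline int init_once_demo() {
  static std::once_flag flag;  // shared by every call to this function
  static int init_count = 0;   // incremented only inside the once-callback
  std::call_once(flag, [] { ++init_count; });
  return init_count;
}

// Calls the helper twice; the setup must have run exactly once.
inline void check_init_once_demo() {
  assert(init_once_demo() == 1);
  assert(init_once_demo() == 1);
}
```

Calling `check_init_once_demo()` from `main` in a binary built with and without `-pthread` is one way to check whether a given toolchain is affected before digging into the parser itself.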