libfirm / cparser

C99 parser and frontend for libfirm
http://pp.ipd.kit.edu/firm
GNU General Public License v2.0
339 stars 38 forks source link

Ambiguities not correctly resolved & yacc grammar #23

Open RockBrentwood opened 4 years ago

RockBrentwood commented 4 years ago

Under the "tests" directory in C11parser (on GitHub: https://github.com/jhjourdan/C11parser) you will find a range of programs that exercise some of the more arcane ambiguities that many C11/C18 implementations get wrong -- including yours.

False negatives (errors reported by cparser that are not errors) in: atomic.c, c11-noreturn.c, c1x-alignas.c, control-scope.c, function_parameter_scope.c, function_parameter_scope_extends.c, if_scopes.c, loop_scopes.c

False Positives (programs accepted by cparser that have errors) in: bitfield_declaration_ambiguity.c, dangling_else_lookahead.c, dangling_else_lookahead.if.c, parameter_declaration_ambiguity.c, parameter_declaration_ambiguity.test.c

The following also require further examination: aligned_struct_c18.c, atomic_parenthesis.c, bitfield_declaration_ambiguity.ok.c, bitfield_declaration_ambiguity.fail.c, dangling_else_misleading.fail.c

The C11parser contains a parser that correctly resolves the first 2 groups of programs. The lexer and parser are in CAML, but the lexer is readily convertible to C and a YACC version of the parser is also provided.

The companion article describing the situation more fully may be found here: http://gallium.inria.fr/~fpottier/publis/jourdan-fpottier-2016.pdf

The most important points made in it are that scopes are not always contiguous (as illustrated by the scope.c test files above) and resolving variable/typedef name ambiguity requires lookahead (as illustrated in dangling.c, declaration.c).

The easiest way to fix the problems -- in the process significantly simplifying cparser(!) - is to integrate the lexer and yacc grammar into cparser.

RockBrentwood commented 4 years ago

Further notes on the scope non-contiguity issue:

For non-contiguity: a situation such as this (₀ (₁ ₁) (₂ ₂) (₁ ₁) ₀) may occur, where scope #2 is on top of scope #0, not on top of scope #1. This should be contrasted to this situation (₀ (₁ (₂ ₂) ₁) ₀), where scope #2 is on top of scope #1.

This significantly affects the design of the implementation.

In particular, in the first case, scope #1 cannot be discarded after it closes, because it reopens at a later point. In effect, this is a case of (₀ (₁ ₁) (₂ ₂) (₃ ₃) ₀), where scope #3 inherits or is aliased to scope #1. The easiest way to make this possible might be to not delete a scope until its parent closes. Then scope #1 (= scope #3) and scope #2 would only be deleted when scope #0 is closed.

Alternatively, the first case corresponds to the second case (₀ (₁ (₂ ₂) ₁) ₀), where scope #2 is specially directed to skip scope #1 and link directly to scope #0. However, I am not sure if all the cases of non-contiguous scopes can be treated that way. There may be cases, 2 or more levels might have to be skipped, if this method is used to deal with non-contiguity.

michaelforney commented 4 years ago

I don't understand why any of the examples listed as "false positives" should be rejected. The comments in the tests and the paper both seem to indicate that these are valid.

Also, I don't think any of these examples have scopes that close and then re-open. The visibility of certain identifiers is non-contiguous, but that's just because a declaration in an inner scope is shadowing one in the outer scope. Can you provide a specific example that you think has this property?

RockBrentwood commented 4 years ago

I think the follow-up comment made it pretty clear that this is not "shadowing". It's not a case of (0 (1 (2 2) 1) 0), but of (0 (1 1) (2 2) (1 1) 0). (2 2). In the second case, (2 2) does not lie within (1 1) at all, but only in (0 0), while in the "shadowing" first case (2 2) lies within (1 1). That's already the case, for instance, in function_parameter_scope_extends.c.

I assume you've already read the linked reference PDF and saw the discussion and examples in section 2, which clarify the matter better and in much greater depth than I can here. The examples in the test directory are also discussed in greater depth in that section - including the one just mentioned at the end of section 2.1.4 on page 14:7.

As I think about it ... since you're using tree-walking, rather than processing the code sequentially, you should have a much easier time dealing with this than a translator would that tries to go in strict sequential order.

For function_parameter_scope_extends.c typedef long T, U; enum {V} (*f( T T, enum {U} y, int x[T+U]))(T t) { // The last T on the previous line denotes a type! // Here, V, T, U, y, x denote variables: long l = T+U+V+x[0]+y; return 0; }

This is legal C11 equivalent to:

typedef long T0, U0; enum {V} (f( T0 T1, enum {U1} y, int x[T1+U1]))(T0 t) { long l = T1+U1+V+x[0]+y; return 0; } i.e., enum {V} (f( long T, enum {U} y, int x[T+0]))(long t) { long l = T+0+V+x[0]+y; return 0; }

cparser incorrectly reports the function parameter T, enumeration value U in f(...) and fails to recognize the U in "long l = T+U+V+..." as the enumeration constant U. Similarly, if it had been f(...)(U t) instead of f(...)(T t), then that U would have been the typedef, not the enumeration constant, so it would have been equivalent to f(...)(long t).

On 12/28/19, Michael Forney notifications@github.com wrote:

I don't understand why any of the examples listed as "false positives" should be rejected. The comments in the tests and the paper both seem to indicate that these are valid.

Also, I don't think any of these examples have scopes that close and then re-open. The visibility of certain identifiers is non-contiguous, but that's just because a declaration in an inner scope is shadowing one in the outer scope. Can you provide a specific example that you think has this property?

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/libfirm/cparser/issues/23#issuecomment-569395927

RockBrentwood commented 4 years ago

On the second matter...

On 12/28/19, Michael Forney notifications@github.com wrote:

I don't understand why any of the examples listed as "false positives" should be rejected. The comments in the tests and the paper both seem to indicate that these are valid.

... I examined this more closely and re-examined their test files. I misread that. Only the two fail.c examples are cited as false positive test cases, not the whole lists beneath them; and cparser reports errors in both fail.c programs.

However, that means that aligned_struct_c18.c was not actually listed as a false positive test case, but as legal test case. cparser reports errors on it.

The C11 standard does not allow _Alignas to be used for structure members or types. So, this one might actually be an erroneous listing on the part of c11parser. However, I don't know what effect the C18 revision had on this; so it may actually be a false negative case, now.

For now, in my copy of the test case list, I will list the two fail.c examples and the aligned_struct_c18.c example as "fail" cases, and look for clarification from c11parser on c18.c example.

michaelforney commented 4 years ago

To be clear, I am not affiliated with the cparser project, but I do have my own C compiler project, so I am interested in edge cases like these.

I think the follow-up comment made it pretty clear that this is not "shadowing". It's not a case of (0 (1 (2 2) 1) 0), but of (0 (1 1) (2 2) (1 1) 0). (2 2). In the second case, (2 2) does not lie within (1 1) at all, but only in (0 0), while in the "shadowing" first case (2 2) lies within (1 1). That's already the case, for instance, in function_parameter_scope_extends.c.

I assume you've already read the linked reference PDF and saw the discussion and examples in section 2, which clarify the matter better and in much greater depth than I can here. The examples in the test directory are also discussed in greater depth in that section - including the one just mentioned at the end of section 2.1.4 on page 14:7.

Ah, okay. Your original message said that the non-contiguous scope problem was "illustrated by the scope.c test files above", and I did not see any instance of this in local_scope.c. The corresponding section in the paper for that example said that "the region where a declaration is visible is not necessarily contiguous", not that the scope was not contiguous. I had only skimmed the rest of the paper.

I do see the problem with function_parameter_scope_extends.c. The wording of the spec seems to be that the scope of T T is a block scope (so ends at the closing }), and starts at the end of the of the declarator for that identifier, so that would imply that it is still in scope for the T t declaration (i.e. making it invalid), but I could see it interpreted as a non-contiguous scope as well.

Thanks for explaining the issue!

Edit: I guess the point is that you don't know if the identifier has block scope or function parameter scope until you see the open brace, so it is necessarily non-contiguous.

michaelforney commented 4 years ago

However, that means that aligned_struct_c18.c was not actually listed as a false positive test case, but as legal test case. cparser reports errors on it. The C11 standard does not allow _Alignas to be used for structure members or types. So, this one might actually be an erroneous listing on the part of c11parser. However, I don't know what effect the C18 revision had on this; so it may actually be a false negative case, now.

It looks like http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm#dr_444 is the document that made _Alignas valid in structure declarations.