llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.32k stars 12.11k forks source link

C++ + altivec + modules + context-sensitive 'vector' token fails due to overenthusiastic typo correction #16997

Open ec04fc15-fa35-46f2-80e1-5d271f2ef708 opened 11 years ago

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 11 years ago
Bugzilla Link 16623
Version trunk
OS Linux
CC @DougGregor

Extended Description

"clang -x c++ -faltivec -target powerpc64-linux-gnu -fmodules -fcxx-modules" fails with:

:1:12: error: use of undeclared identifier 'vector'; did you mean 'vec_or'? void f() { vector int vi; } (Reduced from enabling modules in test/SemaCXX/altivec.cpp.) There are at least two bugs here. 1) We should not typo-correct 'vector' (or 'pixel' or 'bool') when in Altivec mode, because we may not have detected that it's a context-sensitive keyword yet (this happens for instance in ParseStatementOrDeclarationAfterAttributes). 2) This typo correction is triggered when we pull the identifier 'vec_or' from a modular implicit include of "altivec.h". It is not triggered when we pull the identifier 'vec_or' from a local lookup. So something is behaving differently in the presence of modules. This probably indicates the non-modular typo-correction codepath is missing something, since the modular codepath finds more correction opportunities. [3) We shouldn't be issuing this typo correction anyway: we shouldn't correct to a declaration name when the next token is a declaration specifier keyword (maybe any keyword?) or an identifier.]
llvmbot commented 11 years ago

2) My guess from looking at altivec.h is that typo correction is using vec_or as a viable correction in modules mode because it doesn't have enough information about vec_or to realize it is a function.

I'm not sure about that. We should know vec_or is a function with or without modules enabled. Note that typo correction does not perform the kind of filtering you describe in this case (without -faltivec):

$ echo 'void vec_or(); void g() { vector int n; } ' | ./build/bin/clang -x c++ -

:1:27: error: use of undeclared identifier 'vector'; did you mean 'vec_or'? void vec_or(); void g() { vector int n; } ^~~~~~ vec_or [snip] Under some circumstances, we replace the tok::identifier for 'vector' with a tok::kw___vector in altivec mode; it's possible that happens before typo-correction is applied in the non-modules case?

I'll have to dig into the code path, but I think I know the problem with the correction candidate: the code path that is typo-correcting 'vector' isn't using a CorrectionCandidateCallback that excludes declarations (or even function declarations in particular) and which doesn't try the correction after running Sema::CorrectTypo before suggesting it. Which IIRC means it's probably a relatively short call-path from the parser and likely via some generic function used along several paths that cannot assume a (function) declaration is not an acceptable correction.

As for why the non-modules case works, the tok::identifier being replaced with tok::kw___vector before typo correction is triggered sounds reasonable to me. Of course, even with typo correction fixed there is still the issue of why the error path is being hit with modules enabled. Sema::CorrectTypo doing something dumb when asked to typo correct the 'vector' in 'vector int vi;' in Altivec mode is a secondary problem. (I'll still be adding this bug to my queue, to track down and try to fix having function names being treated as acceptable corrections in a statement position/context that clearly can't accept a function name.)

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 11 years ago

2) My guess from looking at altivec.h is that typo correction is using vec_or as a viable correction in modules mode because it doesn't have enough information about vec_or to realize it is a function.

I'm not sure about that. We should know vec_or is a function with or without modules enabled. Note that typo correction does not perform the kind of filtering you describe in this case (without -faltivec):

$ echo 'void vec_or(); void g() { vector int n; } ' | ./build/bin/clang -x c++ -

:1:27: error: use of undeclared identifier 'vector'; did you mean 'vec_or'? void vec_or(); void g() { vector int n; } ^~~~~~ vec_or [snip] Under some circumstances, we replace the tok::identifier for 'vector' with a tok::kw___vector in altivec mode; it's possible that happens before typo-correction is applied in the non-modules case? > 3) [...] Given the differing behavior > between modules and non-modules, I'm not sure if vec_or being considered a > viable suggestion is a typo correction issue or a modules issue, but lean > towards the latter. As noted above, this is not a modules issue.
llvmbot commented 11 years ago

A few quick comments to your three items:

1) Typo correction is happening on 'vector' et. al. because Sema::CorrectTypo does not know that they are ever keywords; see for example AddKeywordsToConsumer in SemaLookup.cpp (starts around line 3602). AddKeywordsToConsumer should possibly be modified to as them as keywords if Altivec is enabled.

2) My guess from looking at altivec.h is that typo correction is using vec_or as a viable correction in modules mode because it doesn't have enough information about vec_or to realize it is a function. I haven't looked at the code paths being hit, but I'm 99% certain the non-modular typo correction is seeing that vec_or is a function and the CorrectionCandidateCallback given to Sema::CorrectTypo is excluding it, whereas in the modular code path it likely doesn't know that vec_or is a function so is not excluding it (typo correction is pretty smart nowadays about things like not suggesting identifiers it knows refer to functions when it needs a type name, modifier, or other keyword ;).

3) See #​2. Typo correction has become a lot better about trying to reject correction candidates that are inappropriate for the current context, by way of the CorrectionCandidateCallback derivatives. I'm not familiar with how modules are supposed to work, but my guess is that vec_or isn't known to be a FunctionDecl at the time of the correction... or if it is, then using modules is causing Sema::CorrectTypo to be called from a very different context that is passing a different CorrectionCandidateCallback (one that isn't ruling out function declarations). Given the differing behavior between modules and non-modules, I'm not sure if vec_or being considered a viable suggestion is a typo correction issue or a modules issue, but lean towards the latter.

You also only gave a partial invocation for reproducing the problem; it looks like the complete details for reproducing it are:

$ cat tmp.cpp void f() { vector int vi; }

$ ./bin/clang -x c++ -faltivec -target powerpc64-linux-gnu -fsyntax-only tmp.cpp tmp.cpp:1:12: error: use of undeclared identifier 'vector'; did you mean 'vec_or'? void f() { vector int vi; } ^~ vec_or

[note and cascaded errors here]