Closed GoogleCodeExporter closed 9 years ago
Thanks, this came up in issue #74, but it needs its own work item.
Original comment by kim.gras...@gmail.com
on 30 May 2014 at 8:17
I'm working on this.
Original comment by kim.gras...@gmail.com
on 31 May 2014 at 10:13
Here's a patch that doesn't solve the problem :-) I thought I'd start by
refactoring the affected code so as to make this easy to fix.
Previously, the elimination of duplicate includes happened implicitly because
each suggested headers/decls was mapped to a single corresponding include or
forward decl, even if there might be several providing the same symbol. This
had the side-effect that all but the last duplicated include disappeared.
I've now updated CalculateDesiredIncludesAndForwardDeclares to keep multimaps
for these reverse lookups from symbol use to symbol provider. The function now
maintains all providers for a symbol, and ends with explicitly clearing the
desired flag for all but the first one
(ClearDesiredForSurplusIncludesOrForwardDeclares.)
If this is an acceptable direction, I thought I'd modify ClearDesired... in a
separate change to leave pragma-keep-marked headers alone.
CalculateDesiredIncludesAndForwardDeclares is now more complicated, and I've
added support functions here and there. I'm not extremely happy with the
details of all this, but I think the general approach is correct, i.e. to clean
out duplicates as an explicit step.
Feedback welcome.
Original comment by kim.gras...@gmail.com
on 1 Jun 2014 at 7:01
Attachments:
How exactly do you want to indicate that a line should be kept due to IWYU
pragma: keep?
I had an idea to add OneIncludeOrForwardDeclareLine::is_explicitly_kept. Then
during emitting recommendations check line.is_desired() ||
line.is_explicitly_kept(). This way is_desired and is_explicitly_kept can
coexist independently unaware of each other presence. And in
CalculateDesiredIncludesAndForwardDeclares in include_map and fwd_decl_map just
keep an index of the first OneIncludeOrForwardDeclareLine. In this case we
don't need multimap. Unfortunately, this approach gets a little bit ugly in
case of
#include "foo.h"
#include "foo.h" // IWYU pragma: keep
My first idea was to store in include_map the index of the first explicitly
kept OneIncludeOrForwardDeclareLine or just the first line if none is
explicitly kept.
Cleaning duplicates as a separate step is a good approach, unless we need
specifically to collect these duplicates. Currently, introducing multimaps
only to keep the first include looks like an overkill. Can you explain your
next steps a little bit more?
Original comment by vsap...@gmail.com
on 5 Jun 2014 at 3:33
OK, so instead of marking all lines as desired, we just avoid removing
pragma-keep-marked lines irrespective of whether they're desired or not? That
sounds easier, but it spreads the policy decision around -- I think it's nice
to keep the rules for what's desired and why in this function. Then the diff
emitter can just operate based on desired vs present. I don't know if it's that
clear-cut at the moment, but I think it's a healthy direction.
> [...] unless we need specifically to collect these duplicates.
I don't understand what you mean by this.
> Can you explain your next steps a little bit more?
I thought I'd do the following:
- Like you suggest, add a boolean flag to OneIncludeOrForwardDeclareLine,
indicating whether it's marked with "IWYU pragma: keep".
- Check for the flag in the new (awkwardly named)
ClearDesiredForSurplusIncludesOrForwardDeclares method, and just not clear
desired for explicitly kept lines.
I don't love the multimaps either, but I think they express the relationship
correctly -- any use requires a header or a forward decl. We map each such used
header or decl back to any number of written includes or fwd-decls in the file.
Once we have that information, we can make policy decisions for each individual
include/fwd-decl.
We want to make sure we keep the right #include in a case like this, for
example:
#include "foo.h"
Foo from_foo_h;
#include "foo.h"
And actually, in the case below we need to keep both:
#include "foo.h"
Foo from_foo_h;
#include "foo.h" // IWYU pragma: keep
So it might be that we should leave any one #include with a location before all
uses + all explicitly kept ones.
Not sure this makes a good case for multimaps as such, but I find it hard to
separate the concerns of representing the state and applying the rules without
them.
Original comment by kim.gras...@gmail.com
on 5 Jun 2014 at 9:22
Now I see that picking a correct line requires all duplicate lines. And
collecting all lines in multimap is justified.
Patch comments:
- clang complains about missing 'typename' prior to dependent type at
iwyu_output.cc:1414, :1418. I.e. it should be "typename
ContainerType::iterator"
- have you considered using range-based for loops? I am not eager to convert
all for loops in the project to range-based loops right now. But we can start
using them in the new code.
Original comment by vsap...@gmail.com
on 7 Jun 2014 at 1:13
Oh, and maybe in Contains we can use any_of() from <algorithm>?
Original comment by vsap...@gmail.com
on 7 Jun 2014 at 6:33
Thanks for the feedback!
> - clang complains about missing 'typename' prior to dependent type at
> iwyu_output.cc:1414, :1418. I.e. it should be "typename
ContainerType::iterator"
Embarrassing. This has got to be one of MSVC's least helpful
incompatibilities/extensions. Done.
> - have you considered using range-based for loops? I am not eager
> to convert all for loops in the project to range-based loops right
> now. But we can start using them in the new code.
I hadn't but I tried in the new multimap traversal code. Unfortunately it seems
that multimaps and range-based for loops are not friendly because std::multimap
doesn't expose (as far as I can tell) any way to get a range for a specific
key. The equal_range method returns a pair of iterators, which in turn doesn't
qualify as a range :-/
> Oh, and maybe in Contains we can use any_of() from <algorithm>?
Nice, I didn't know about any_of. Done.
I'm not entirely happy with the matches() methods in
OneIncludeOrForwardDeclareLine. Is there a better name? contains? wraps? holds?
denotes?
Do you mind if I commit this refactoring first, and then work on the multiple
pragma-keep based on it?
Original comment by kim.gras...@gmail.com
on 8 Jun 2014 at 10:01
Attachments:
I was thinking about OneIncludeOrForwardDeclareLine::matches() name and noticed
that Contains has linear complexity. I don't know if it matters, but I have to
share this observation with you.
Range-based loops can be easily used instead of Each. And for multimaps some
simple custom ranges can be introduced. We can reuse llvm/ADT/iterator_range.h
or create something similar ourselves.
A few for loops can be modified so that end iterator is calculated only once.
For example, in ClearDesiredForSurplusIncludesOrForwardDeclares
"container.upper_bound(k->first)" is calculated at each cycle loop.
I suggest to rename ClearDesiredForSurplusIncludesOrForwardDeclares to
ClearDesiredForDuplicateIncludesOrForwardDeclares and move it inside namespace
"internal".
> Do you mind if I commit this refactoring first, and then work on the multiple
pragma-keep based on it?
I'll review the patch so it can be committed separately.
Original comment by vsap...@gmail.com
on 8 Jun 2014 at 4:57
Yes, Contains is linear. I think it needs to be, in a sense, because we want to
preserve the order of lines (i.e. we can't sort them and use a binary search).
It might turn into a problem, because Contains will be called as many times as
there are uses in a file, and operate on a vector of length (include lines +
fwd decls). The number of includes & fwd decls will probably stay <100, but the
number of uses feels like it could explode. We may want to revisit this.
I've updated most loops to be range-based instead, and the code is much clearer.
I didn't use range-based for for multimaps, because it never turned out nice.
Instead I use `auto` to get rid of the iterator type names, please take a look.
I now cache the upper_bound in ClearDesiredForSurplusIncludesOrForwardDeclares.
Moved ClearDesired... and Contains into namespace internal. I prefer "Surplus"
over "Duplicate", because duplicates will be allowed once we take pragma keep
into account.
New patch attached with these things fixed.
Original comment by kim.gras...@gmail.com
on 16 Aug 2014 at 9:33
Attachments:
The patch looks good to me. Storing pointers to OneIncludeOrForwardDeclareLine
in vector is quite fragile - adding a new item to the vector can invalidate all
those pointers. But we can live with that, just need to be careful.
You've convinced me to use "Surplus".
Original comment by vsap...@gmail.com
on 23 Aug 2014 at 7:50
The last patch is still the refactoring without solving the issue, correct?
Original comment by rol...@rschulz.eu
on 25 Aug 2014 at 3:44
Roland: yes, that's correct. It should be trivially fixable on top of the
latest patch, but I wanted to isolate the refactoring from the fix. I'll try
and get the refactoring in tonight, and then post a new patch for fixing this
issue.
Original comment by kim.gras...@gmail.com
on 25 Aug 2014 at 6:48
It turned out not to be trivial, but pretty easy. I only just got around to
finishing the patch today.
I'm not entirely happy with how many places this patch touches, but the updated
test passes fine here.
Please take a look.
Original comment by kim.gras...@gmail.com
on 2 Sep 2014 at 7:30
Attachments:
Fixes it for me both in the small test case and the original source file.
Original comment by rol...@rschulz.eu
on 4 Sep 2014 at 12:26
Overall the patch looks good, have some test case questions. How should we
handle the following cases?
#include "foo.h"
#include "foo.h" // IWYU pragma: keep
#include "bar.h" // IWYU pragma: keep
#include "bar.h"
#include "bar.h" // IWYU pragma: keep
Also I was thinking about verifying in tests which exactly line is suggested to
be deleted and I don't have a solution. difflib doesn't support custom line
comparators and it doesn't look like a trivial problem. If you have any ideas,
I'll be glad to here them.
Something to consider - we might have in the future other reasons to keep
surplus includes, not just "keep" pragma. For example, we should keep x-macros
includes. I am comfortable with the current approach, it shouldn't be hard to
extend it to support other reasons to keep surplus includes. Just wanted to
bring it to your attention.
Original comment by vsap...@gmail.com
on 7 Sep 2014 at 5:30
I'm not sure what the policy should be, actually.
Currently we always keep the first include (with or without pragma:keep) and
any subsequent ones with pragma:keep.
It creates slightly surprising behavior in your test cases above (all "foo.h"
are retained, the middle "bar.h" is removed.)
I supposed we could make sure to keep at most as many includes as there are
pragma:keep directives (i.e. remove both the unmarked "foo.h" and the unmarked
"bar.h"), but I'm a little worried about #include order. If there are uses from
"foo.h" between the naked and the kept include, things will start to break.
So I think the current behavior is reasonably conservative and likely to work,
if not optimal. Would you like me to add a separate test case for these?
I haven't given line number checking any thought. I'm not familiar with the
details of difflib and its use here. iwyu_test_util.py seems to pull some
tricks in _NormalizeSummaryLineNumbers, but that's only for a single line; we'd
like to express constraints between lines...
And yes, that's part of the reason for the fuzzy "surplus" term, that more
exceptions will accrue there. I'm way ahead of you :-)
Original comment by kim.gras...@gmail.com
on 7 Sep 2014 at 6:33
Here's a patch with additional tests for the different combinations of keep and
no keep. I'm not sure it adds any additional clarity.
You decide which patch you prefer, I don't have any strong opinions here.
I investigated the line numbers, but I couldn't find a way to specify more
precisely.
Original comment by kim.gras...@gmail.com
on 8 Sep 2014 at 8:02
Attachments:
I think we should keep the first include even if it doesn't have IWYU pragma:
keep. At first I've thought we should remove it, but doing so is confusing.
For example, user sees the second include is removed, adds pragma to keep it
and now the first include is removed. Why adding pragma to one line affects
another line? And it feels like playing whack-a-mole, which is not good. So
current behavior is reasonable.
OK, let's not bother with checking line numbers.
I'll review the test case on weekend, I think not all cases are valuable. For
instance, my earlier "bar.h" example is quite useless.
Original comment by vsap...@gmail.com
on 11 Sep 2014 at 4:13
Why do we keep the first "...-d21.h"? Looks like it is unused and unused
includes should be kept only if they are marked with IWYU pragma: keep.
I suggest the following test cases:
#include "d21.h"
#include "d21.h" // IWYU pragma: keep
#include "d22.h" // IWYU pragma: keep
#include "d22.h"
#include "d23.h"
#include "d23.h" // IWYU pragma: keep
#include "d24.h" // IWYU pragma: keep
#include "d24.h"
use symbol from d21.h
use symbol from d22.h
It's a cartesian product of {first include is kept, second include is kept} x
{include is used, include is not used}.
Original comment by vsap...@gmail.com
on 14 Sep 2014 at 5:30
Thanks for reviewing, excellent question! Yeah, that makes no sense, let me run
this through the debugger.
I'll add tests to cover your suggested matrix, good idea!
Original comment by kim.gras...@gmail.com
on 14 Sep 2014 at 6:19
So here's what happens:
When the preprocessor sees "// IWYU pragma: keep" in MaybeProtectInclude it
calls ReportIncludeFileUse to add a direct dependency between includer and kept
_header file_.
This means _all includes_ of said header are protected, not just the one that
was marked with a keep pragma.
Before my patch, as long as there was one keep pragma, all includes would be
marked desired, and then the duplicates would be stripped, leaving only one
(more or less random) include.
I'll have to think about what behavior we really want here and see if I can
achieve it.
Original comment by kim.gras...@gmail.com
on 14 Sep 2014 at 8:40
I have a patch now that allows keeping of individual include lines. It becomes
a little more convoluted, but could probably be made to work.
But seeing the test cases made me wonder whether this is actually what we want
to do.
The prior policy was that if "IWYU pragma: keep" was added, all includes of the
same header were kept. I'm starting to think that makes more sense.
For comparison, IWYU doesn't attribute uses to individual include lines but to
a header name. All includes of said header name are equally desired, and then
duplicate removal makes sure there's only one left.
I think the same thinking holds for keep-marked include lines, except we
shouldn't remove duplicates. So if an include name is marked "pragma: keep" in
one include line, all other include lines of the same name would be kept as
well, whether they're keep-marked or not.
What do you think?
Original comment by kim.gras...@gmail.com
on 27 Sep 2014 at 6:40
I've thought after your comment #22 that IWYU checks if it is necessary to
include a file, not if it is necessary to have an inclusion directive. And
trying to handle inclusion directives together with files might get ugly
because they are quite different concepts.
You confirm that the logic becomes convoluted. Let's not push in this
direction then and keep it simple. After all, my "d23.h" test case is
contrived. I think in real code when IWYU recommends to remove both #include
"d23.h" lines, user can figure out which one to keep and will add "pragma:
keep" if necessary. So fixing "d23.h" doesn't solve any real problem, only
hypothetical one.
Actionable response: let's use 140keep-additional-test.patch without
"...-d22.h" ("...-d23.h" will be renamed to "...-d22" to keep numbers
consecutive).
As the long-term solution I think we should start distinguishing between
include-as-use-a-module and include-as-paste-content-of-another-file. IWYU is
built to handle the former case and gets confused when encounters the latter.
And we should separate these cases as early as possible. For example, handling
include-as-paste-content-of-another-file when IWYU outputs recommendations is
most likely to be too late.
Original comment by vsap...@gmail.com
on 29 Sep 2014 at 3:36
Thank you for your thoughtful response.
I agree with your conclusion that IWYU reasons in terms of files, not
individual include directives, and I find your observation that we should
separate module-like includes from codegen-like includes very interesting for
the longer term.
I tried to revisit the 140keep-additional-test patch, but I still have some way
to go with it before I'm happy. I'll post back as soon as I have something
reasonable.
Original comment by kim.gras...@gmail.com
on 1 Oct 2014 at 6:45
OK, here goes the simplest diff I've been able to reduce that implements the
keep-all policy.
It introduces a new ReportPragmaKeep method on IwyuFileInfo, so we maintain the
files that want to be kept.
It's a little silly in that it first clears desired for all duplicates, and
then re-sets it, but with all the data structures and type mangling going on
here, this was the clearest I could come up with.
Let me know if this makes sense!
Original comment by kim.gras...@gmail.com
on 1 Oct 2014 at 8:47
Attachments:
I like how simple it is. Constantly changing desired state can look weird at
the first glance (set desired, clear desired, set desired again). But it is
simpler than alternative - trying to determine desired state from the first
attempt. So I am OK with your approach.
Original comment by vsap...@gmail.com
on 5 Oct 2014 at 7:16
r586, whew! Celebration is in order :-)
I'll merge this one to the 3.5 release branch too, right?
Original comment by kim.gras...@gmail.com
on 5 Oct 2014 at 7:40
Yes, sure.
Original comment by vsap...@gmail.com
on 5 Oct 2014 at 7:43
Original issue reported on code.google.com by
rol...@rschulz.eu
on 30 May 2014 at 8:07