icsharpcode / ILSpy

.NET Decompiler with support for PDB generation, ReadyToRun, Metadata (&more) - cross-platform!
20.76k stars 3.29k forks source link

Feature request: Support for checked/unchecked code #85

Closed KevinCathcart closed 13 years ago

KevinCathcart commented 13 years ago

This is one of the areas where ILSpy can be better than Redgate's .NET Reflector. Currently neither show checked or unchecked blocks. This feature would be hard to implement well.

The first thing is that is is reasonably important to offer an option to choose whether the decompiled code defaults to checked or unchecked. Most user code is entirely unchecked, or explicit checked blocks or expressions are used where desired. On the other hand most of Microsoft's libraries appear to be compiled with the switch that turns on checking by default. Therefore when viewing them, it is actually more helpful see explicit unchecked blocks or expressions in those places where overflow checking is not used.

Therefore, users mostly interested in Microsoft's code or who use the compiler switch would benefit from an option to default to checked.

Without loss of generality, the rest of this message will assume we are in the 'default unchecked' mode, so I will be talking about adding explicit checked blocks or expressions.

As you know, there are both checked blocks and checked expressions. When a single operation in a larger expression is checked, we would want to use the checked expression format, rather than introduce a temporary variable. However, for some operations like implicit conversions like found in short var = someInt32; using the conv.ovf.i2, we would want an unchecked block, rather than doing: short var = checked( (short) someInt32), since that although correct is rather ugly.

Lastly, in the original source, if there are multiple checked operations in a function, a programmer probably used one large checked block, rather than many small ones. Therefore in decompiled code, we would want to try to use as few checked blocks as possible, merging any checked blocks separated only by checked-neutral operations, and by inserting checked blocks at higher scopes within a function when legal and when doing so would result in fewer total checked blocks/expressions.

This is a very tricky thing to do well, which is likely why other existing decompilers do not support it.

dgrunwald commented 13 years ago

Yes, I've been thinking about how to do this best. I think the initial IL->C# transformation should only annotate the C# operators with the desired overflow checking mode; and a later step would then insert the minimally required checked/unchecked blocks. Finding the absolutely minimal set of required blocks sounds like an interesting problem - it reminds me of the stuff I did for the International Olympiad in Informatics ;) Just from looking over your problem description, it seems that this should be easily solvable in linear time using dynamic programming.

dgrunwald commented 13 years ago

Hmm interesting, that is a completely different solution than I was thinking of. But I don't see how it would work. It seems that you are introducing only 'checked' blocks/expressions, no 'unchecked' ones? That won't produce correct results, as sometimes it is required to use 'unchecked' expressions even if the compiler setting is to use unchecked by default (and sometimes 'unchecked' blocks help you use a lower total number of blocks than with only 'checked' blocks).

Take, for example, "M(a * (b + 1))", where the multiplication is checked, but the addition is unchecked. And it is possible that the optimal solution means you have to enclose a whole "for" statement in a checked block (e.g. because the initializer, condition and iterator all use checked arithmetic), but then use unchecked blocks within the loop body. Using checked+unchecked you need only two blocks, using checked only you would need at least 3 blocks.

My idea would be to treat this as an optimization problem with the following goals:

  1. Use minimum number of checked blocks+expressions
  2. Prefer checked expressions over checked blocks
  3. Make the scope of checked expressions as small as possible
  4. Make the scope of checked blocks as large as possible (where 1. has the highest priority)

So each assignment of checked/unchecked blocks/expressions has a cost of form (#b+#e), where #b is the number of blocks, and #e is the number of expressions. The optimization would look for the lowest cost, where the sorting criteria is (#b+#e, #b). The algorithm would recursively visit all AST nodes, and compute two versions for each node: one where the parent is expected to provide a closing 'checked' block, and one where it is expected to provide a closing 'unchecked' block.

Reasons for the optimization rules: Rule 1 is obvious, keep the number of blocks/expressions down. Rule 2 and 3 mean that 'M(checked(a * 1), checked(a * 2), a + 3);' is preferrable over 'checked { M(a * 1, a * 2, unchecked(a + 3); }'. This is done so that the checked/unchecked keywords are as close to the affected operation as possible. Rule 4 exists so that the checked blocks include any instructions that don't have any arithmetic at all. This is done so that 'checked { int x = a * 1; int y = a * 2; return y; }' is preferred over 'int y; checked { int x = a * 1; y = a * 2; } return y;'

dgrunwald commented 13 years ago

Implemented in 942131b06d30f80e16a71ed379510abe9ea055f6

KevinCathcart commented 13 years ago

I had indeed fogotten about cases where a checked expression would need an unchecked expression inside. If it were not for that, it would have worked just fine. I deleted it, since it would only confuse anybody who came back to read this bug report.