Optimization - Githubissues

DualBrain commented 5 years ago

I'm not sure if you are aware... so figured I'd bring it to your attention. I see that you are using a lot of CInt(…) calls. If you don't need the behavior (which you most likely don't) of the rounding that CInt() does; you might want to change all of these calls to CInt(Fix(…)). These will then be optimized by the compiler to a single IL operation (Conv.I4). This is a recent enhancement to the compiler. When you make this call using the CInt(Fix(value)) pattern, it will "truncate" and cast the numeric value (dropping the fractional) in the most efficient manner possible.

If you would like assistance in making these changes, let me know and I'd be happy to coordinate toward a pull request.

As a side note, I'd be very interested in how much of a difference this would make as I have found that significant calls to CInt() can indeed add up. This is (or was) something that was a serious performance impact for projects just like yours (emulation). That's why I've been championing for this change for over 5 years while simultaneously fighting against proposals to add new keywords just for this; that's a whole other story. ;-) This process went through three different VB language leads... so it was definitely a process. In the end, it was what I proposed all that time ago and everyone using this pattern (which is widely used) benefits. So now I'm very curious as to whether or not the fight was worth it in a real-world example. Assuming it is, I'd love to share the results with the members of the VB team that helped to gitterdone.

morphx666 commented 5 years ago

Thank you very much for the suggestion! Actually, I wasn't aware of this and I never would've though that adding a double conversion (CInt/Fix) could cause a positive effect.

I will try to apply the changes on some parts of the code to make sure this doesn't break anything and I'll post back with the results.

Of course, if you'd like to create a pull request, that would be awesome!

DualBrain commented 5 years ago

It was the only way that I could think of adding the feature without having to change the language; it helped that it was also a common pattern in wide use. So I was able to (finally) convince the team that this was a "simple" problem of optimization which falls under the category of a "bug fix" so they were able to slide it into the already overwhelmed workload (the reason it took so long was that others in the community were adamant that this required new keywords and other language changes to solve). So a HUGE thank you goes out to the compiler team for getting it done. It does seem counter intuitive; but it does definitely fall under the "it-just-works" category.

I noticed an old blog post on your site regarding .NET Core. As a side note; although this optimization is in the compiler... the current release of .NET Core doesn't have support for this as Fix() isn't part of the VB assembly (yet). They are aware of this and it's coming. With that said, I wouldn't bother (just yet) playing too much with .NET Core; the work involved is far greater than it should be... especially knowing that a solution to the "runtime" problem is just over the horizon. Don't get me wrong, there's a lot to like about .NET Core... it's just still a bit to early.

morphx666 commented 5 years ago

Just did a quick test (using a simple program) and you are absolutely right! Here are the results:

Using CInt Result: 500,000,504 Elapsed: 905 ms

Using CInt/Fix Result: 499,500,767 Elapsed: 755 ms

The only problem is that the results are not the same.

Here's the code for the CInt loop:

For j As Integer = 0 To 10
    k = 0
    For i As Integer = 0 To 1000000
        k += CInt(Math.Sqrt(i) ^ 2 / 1000)
    Next
Next

The CInt/Fix loop is just the same code but with k += CInt(Fix(Math.Sqrt(i) ^ 2 / 1000)).

Looking at the IL code I can see that the version with the CInt/Fix pattern is exactly the same, but it omits a call to [mscorlib]System.Math::Round(float64) after the div opcode:

This is the IL for the CInt version:

IL_0012: ldloc.1
IL_0013: ldloc.3
IL_0014: conv.r8
IL_0015: call float64 [mscorlib]System.Math::Sqrt(float64)
IL_001a: ldc.r8 2
IL_0023: call float64 [mscorlib]System.Math::Pow(float64,  float64)
IL_0028: ldc.r8 1000
IL_0031: div
IL_0032: call float64 [mscorlib]System.Math::Round(float64)
IL_0037: conv.i4
IL_0038: add
IL_0039: stloc.1
IL_003a: ldloc.3
IL_003b: ldc.i4.1
IL_003c: add
IL_003d: stloc.3
IL_003e: ldloc.3
IL_003f: ldc.i4 1000000
IL_0044: ble.s IL_0012

And this is the IL for the CInt/Fix version:

IL_00a1: ldloc.1
IL_00a2: ldloc.s i
IL_00a4: conv.r8
IL_00a5: call float64 [mscorlib]System.Math::Sqrt(float64)
IL_00aa: ldc.r8 2
IL_00b3: call float64 [mscorlib]System.Math::Pow(float64,  float64)
IL_00b8: ldc.r8 1000
IL_00c1: div
IL_00c2: conv.i4
IL_00c3: add
IL_00c4: stloc.1
IL_00c5: ldloc.s i
IL_00c7: ldc.i4.1
IL_00c8: add
IL_00c9: stloc.s i
IL_00cb: ldloc.s i
IL_00cd: ldc.i4 1000000
IL_00d2: ble.s IL_00a1

Does this look right to you?

morphx666 commented 5 years ago

I'm not sure which areas of the code will support this optimization, because discarding the fractional part will most definitely have an adverse impact on some calculations.

DualBrain commented 5 years ago

It might help to bring in another language (C#) to explain the optimization...

In C#:

float a; a = 3.49; int b = (int)a;

In VB:

Dim a = 3.49 Dim b = CInt(Fix(a))

The (int) cast operator literally translates to a Conv.I4 op when compiled; which is a truncate/cast.

To get the same result in VB, the CInt(Fix(value)) pattern needs to be done. The problem with this is that this requires a call to Fix, then to CInt and the overhead that is involved. Fix truncates and CInt casts. The problem with that is that this is not (typically) just two operations... it's several more beyond since CInt will/could also do rounding as well as bounds checking. Which is kind of silly since (when using this pattern) rounding isn't necessary since we've already truncated because of the call to Fix. ;-)

So what you are seeing with removal of the Round is correct as it is no longer necessary.

One of the other things you are seeing is that (as part of the compiler optimization), the team made a few other tweaks around CInt(). If you were to do this in Visual Studio 2012 (I didn't have - at the time - VS 2015 installed when I was testing this); you would get very different results. Instead of Conv.I4, it was something else... don't remember off the top of my head what that was.

With that said, you might try testing to see what would happen if you got rid of the CInt(Fix()) altogether (for this specific example) and switch the divide operator from a slash to a backslash. I'm honestly not sure what that would do... but it would seem to me that this would be optimized in a similar manner. According to the docs...

"Integer division is carried out using the \ Operator (Visual Basic). Integer division returns the quotient, that is, the integer that represents the number of times the divisor can divide into the dividend without consideration of any remainder. Both the divisor and the dividend must be integral types (SByte, Byte, Short, UShort, Integer, UInteger, Long, and ULong) for this operator. All other types must be converted to an integral type first."

That last sentence is interesting... so I'm not really sure what it would do. One would hope that it is optimized. ;-)

Another note is that part of your results is leaving out the bounds checking since you've enabled the optimization in your projects compiler settings. (Something we discussed a while back.)

As for which code will support this optimization... the question that has to be asked is "does the math require 'bankers' rounding". I suspect that this isn't the case for this type of a project... my suspicion is that most (if not all) of these casts should be done the "C" way (which is truncation). In other words, if fake86 was doing an (int) cast, it would translate to CInt(Fix()). This will be true of all C-style languages. Additionally, the optimization will be noticeable more in code that is called repeatedly... so that's where I would focus first.

morphx666 commented 5 years ago

With that said, you might try testing to see what would happen if you got rid of the CInt(Fix()) altogether (for this specific example) and switch the divide operator from a slash to a backslash.

I actually did try it and it did perform the exact same optimization as using CInt(Fix()).

my suspicion is that most (if not all) of these casts should be done the "C" way (which is truncation)

I agree, although there are very few instances (if any) where CInt() is used in the opcodes emulation, but I will run some checks on other areas and see if I notice any improvements, without breaking the code.

morphx666 commented 7 months ago

This has been improved by using UintXX data types. As expected, it has greatly improved performance.

morphx666 / x8086NetEmu

Optimization #13