jeffheaton / encog-dotnet-core

http://www.heatonresearch.com/encog
Other
430 stars 150 forks source link

Arithmetic overflow when running 32 CPUs -> Fix for it #90

Open vsalomaki opened 8 years ago

vsalomaki commented 8 years ago

When running the code on a system with 32 (virtual) CPUs, the Process.GetCurrentProcess().ProcessorAffinity overflows 32-bit integer. Example of such system is c4.8xlarge at Amazon,

XeonMan commented 8 years ago

I am getting the same on a dual X7550 system (32 threads)...

Don't think using a Long is the solution, as num is being used to get the processor count, as in the code comments just below:

             // if there is more than one processor, use processor count +1
              if (num != 1)

Suggest this instead:

https://github.com/encog/encog-dotnet-core/issues/89

identical to your

https://github.com/vsalomaki/encog-dotnet-core/commit/336f721adeffa2790249ef9eaec0dc9725c5f069

vsalomaki commented 8 years ago

Actually the "long-solution" does work, because it's casted back to int. Anyway, it seems that this was a duplicate solution +1 for commit

XeonMan commented 8 years ago

It works in that it does not throw an exception, but I'm getting 4294967295 on a 32 processor system if using Long, which I don't think is the intended value.

See here:

http://stackoverflow.com/questions/34780520/encog-dotnet-3-3-multithread-error

ProcessorCount returns 32 correctly, which seems to me is the intended value for the code logic afterwards.

Thoughts?

+1 for commit too

vsalomaki commented 8 years ago

Where do you read that value, from the variable 'num'? How do you read it? I'm not sure if you tested my 'original' fix as is, but your result seems like an overflow or an underflow.

Anyway, the large "number" returned is by MS-design. In fact, it's not really a number, but a bitmask. See: https://msdn.microsoft.com/en-us/library/system.diagnostics.process.processoraffinity(v=vs.110).aspx

ProcessorAffinity: A bitmask representing the processors that the threads in the associated process can run on. The default depends on the number of processors on the computer. The default value is 2^n -1, where n is the number of processors.

When using the Environment.ProcessorCount, we are actually overriding the (possible) processor-count intended for the process.

jeroldhaas commented 8 years ago

It works in that it does not throw an exception, but I'm getting 4294967295 on a 32 processor system if using Long, which I don't think is the intended value.

Actually, it seems it is.

In FSI:

Math.Log(4294967295., 2.);;
val it : float = 32.0

According to this, using int or long will result in incompatibility for the opposing platform (64 or 32 bit). A precompiler definition may be the best answer, here.

jeroldhaas commented 8 years ago

Could you test this to see if it's working as expected?

Proposing:

              if (threads == 0)
              {
#ifdef _WIN64
                var num = (int) (Math.Log(((long) Process.GetCurrentProcess().ProcessorAffinity + 1), 2));
#else
                var num = (int) (Math.Log(((int) Process.GetCurrentProcess().ProcessorAffinity + 1), 2));
#endif

                  // if there is more than one processor, use processor count +1
                  if (num != 1)

Log2 with long looks like it can result in a value of 31, so this may be a bad solution. Use of float does resolve to 32, however, and very well may be a good adaptation.

Something akin to:

var num = (int) (Math.Log(((float) Process.GetCurrentProcess().ProcessorAffinity + 1), 2.0));

If this could be tested as well... I've only 8 cores here.

XeonMan commented 8 years ago

Yeah, I was thinking of:

var num = (int)(Math.Log(((double)Process.GetCurrentProcess().ProcessorAffinity + 1), 2));

Going to try to test all provided solutions on my 32 core system later in the day (no access right now) and I'll post back.

jeroldhaas commented 8 years ago

That extra precision may very well prove to be a good future-forward solution.

I think pending some tests (which you two would have to champion, @XeonMan and @vsalomaki :smile: ), we may have a proper solution.

XeonMan commented 8 years ago

My system (HP DL580G7) can go to 64 cores in the near future (I already have the CPUs... just need 2 extra heatsinks and a couple of extra memory cartridges) to an absolute max of 80 cores (that entails changing all CPUs and memory cartidges; i.e.: expensive!), so I guess we do have to future-proof the solution...

jeroldhaas commented 8 years ago

Beyond 64 cores, CPUs are segmented into groups, so this code block will have to be handled differently (and likely differently per-platform) when core count reaches that scale.

For the time being, I think working with the premise of a 64-core ceiling should suffice for the majority percentile of end-users.

XeonMan commented 8 years ago

Agreed.

I'll get to work on this during the evening and post back.

BTW, I actually mean threads (or virtual processors?) in my case... So that's actually 32 physical cores + 32 HT, or a max of 40 physical + 40 HTs...

XeonMan commented 8 years ago

Just a thought... if we're limiting to a premise of a 64 core ceiling, wouldn't Enviroment.ProcessorCount suffice (because of the 64 core grouping, etc., etc.)?

jeroldhaas commented 8 years ago

Using ProcessorCount would ignore the processor affinities set to the process by the user.

XeonMan commented 8 years ago

Gotcha. Thanks.

vsalomaki commented 8 years ago

I would go with the ProcessorCount-solution as the primary solution for easier code maintenance and readability,32bit and 64bit. (Mind you, there are no constraints about using 64bit numbers in a 32bit system; just a matter of performance)

As the secondary (more optimal?) solution, one should use the ProcessorAffinity as it was (probably) supposed to be user: that is, instead of using Math.Log(), one should use bit shifting.

XeonMan commented 8 years ago

Well, I do run Encog full blast on my machine, so I am in fact using ProcessorCount (as can be seen in my -now- commented code), as I do not set Affinity manually...

However, for the sake of completeness, here is the test proof, as promised:

http://www.7fgr.com/downloads/encog_test.png

And yes, it woiks.

jeroldhaas commented 8 years ago

If you adjust affinity for Encog and another program (a single process app) to run on one of your cores while Encog consumes the rest, does Encog honor those settings using ProcessorCount in the concerned code?

jeroldhaas commented 8 years ago

@XeonMan @vsalomaki The original Encog Java source uses AvailableProcessors, which appears to be equivalent to ProcessorCount. With that in mind, we can assume that the affinity-based code was written to take advantage of the .NET framework's capabilities to follow affinity settings, as the Java core cannot handle threads based on affinity without native calls to the operating system.

I'd really like to avoid removing this feature, and I'm going to assume that @jeffheaton feels similarly, as well. @vsalomaki I agree bitwise operations are indeed a better solution to this, however I haven't yet spent enough time experimenting with the IntPtr / NativeInt returned to obtain the affinity sum. Using Log(2) only somewhat works: it's a bit of a hack.

There are very likely other reasons to keep the use of affinity that I'm unaware of. I'm hopeful that @jeffheaton can shed some light on this matter.

XeonMan commented 8 years ago

@jeroldhaas I don't think ProcessorCount would honor that, as I believe ProcessorCount will return the number of processors as known to the OS, regardless of affinity.

I do believe @jeffheaton really meant having affinity functionality in the code; I made the modification as a quick way to get Encog working and with my specific needs in mind; however, by using ProcessorAffinity, I see no neg impact on my specific needs, as by not setting affinity manually, I get the same functionality. Having said that, and putting aside my particular scenario, I do fully agree with using ProcessorAffinity as was originally intended, albeit the "small" (double) fix.

BTW, I just roasted my home mains powerline today at 6:18a.m., as I was running my DL580G7 and ML370G5 full blast (> 85% CPU) and a PE2950III as a DB server, servicing the former two, plus AC. A 2,000+ watt around-the-clock load plus the rest of the house (say another kWh) which finally toasted my mains powerline before the fuses blew. The fusebox literally went up in flames. I got my house running again, along with the PE2950III by 9:20a.m. I'm getting a new powerline installed during the week (I hope) which should get my full gear up to speed by the end of the week, should we need further testing on a 32 thread system, etc...

jeffheaton commented 8 years ago

Thank you very much for the pull request and investigation. Sorry for the slow response, and I will review/reply soon.