Hi, investigating this kind of weakly identified issue in a very large codebase requires a large amount of time. Sorry, I don't have the time available to do it.
On Sat, Sep 23, 2017, 19:37 Marc Riera Riambau notifications@github.com wrote:
Hey @zonca,
This repo compiles fine with clang, g++ and intel without optimizations. However, I run into an error when I compile with optimization enabled (-O1, -O2 or -O3).
icpc: error #10106: Fatal error in /data/software/repo/intel/2017.0.098/compilers_and_libraries_2017.0.098/linux/bin/intel64/mcpcom, terminated by kill signal
A bit of googling suggests it is a memory error, but adding the flag -mcmodel=large doesn't help either. I also tried multi-file optimization (-ipo) and it still fails. Any idea? @agoetz, @darcykimball, maybe you know how to fix this?
Thanks!
Is it a segfault (SIGSEGV)? Can you get a stacktrace?
It happens during the compilation process. The error I showed in the previous message is everything that appears. Is there any option in icpc to produce a full compilation log? I cannot find anything else. How can I get a stacktrace when using make?
From googling, it seems that you can at least make it emit a report of optimizations made; check the man page or somewhere on https://software.intel.com/en-us/node/522791. If something went wrong during optimization the report should give some indication; if the compiler simply ran out of memory, that should be noticeable too (hopefully).
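If it helps, the flags I have in mind are something like the following (spelling from the Intel 17 docs as far as I remember, so double-check against your icpc man page):

icpc -O2 -qopt-report=5 -qopt-report-phase=loop,vec -c poly-2b-v6x.cpp

which should drop a poly-2b-v6x.optrpt file next to the object file describing what the optimizer did (or tried to do) for each loop.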
If the compiler didn't spit out a stacktrace, then you maybe just can't get one; this is proprietary software after all, I guess. Clang, for example, spits out a full stacktrace + scripts for submitting a bug report, etc. when a segfault happens. Have you tried checking with Intel customer support? This is the sort of thing where if it's a compiler bug, you'll have to ask about it. For example, if the optimizer is taking way too much memory, it might just be crappy; there's no way for you to fix that.
It could be gnarly, in any case, so customer support forums would be the sanest option to take first.
Thanks @darcykimball for the suggestion of splitting the function. Just a note: I just pushed the polynomials for the 2b calculation, as they will also be used for the three-body case. If you check poly-2b-v6x.cpp in src/potential/2b/, you will see that the polynomials now accept a set of dimers instead of a single dimer. The idea is that, since the operations are the same for all dimers, we can let icpc optimize and vectorize this part of the code, which is the most time-consuming part of the 2b evaluation. If we have the function, as you suggested:
foo(double a[1000]) {
    t1 = ...
    t2 = ...
    t3 = ...
    t4 = ...
    ...
    t10000 = ...
    return ...
}
becomes...
foo1(double a[1000], double t[10000]) {
    t[0] = ...
    ...
    t[4999] = ...
}

foo2(double a[1000], double t[10000]) {
    t[5000] = ...
    ...
    t[9999] = ...
}

real_foo(double a[1000]) {
    double t[10000];
    foo1(a, t);
    foo2(a, t);
    return ...hopeless expression with a bunch of t's...;
}
Will the compiler still be able to optimize them in a loop like:
foo(double a[1000], size_t n) {
    for (size_t i = 0; i < n; i++) {
        t1 = ...
        t2 = ...
        t3 = ...
        t4 = ...
        ...
        t10000 = ...
    }
    return ...
}
where this becomes
foo1(double a[1000], double t[10000]) {
    t[0] = ...
    ...
    t[4999] = ...
}

foo2(double a[1000], double t[10000]) {
    t[5000] = ...
    ...
    t[9999] = ...
}

real_foo(double a[1000], size_t n) {
    double t[10000];
    for (size_t i = 0; i < n; i++) {
        foo1(a, t);
        foo2(a, t);
    }
    return ...hopeless expression with a bunch of t's...;
}
So, will that loop in foo be enough, or do we have to put the loop inside foo1 and foo2?
You can check https://github.com/chemphys/clusters_ultimate/blob/master/src/potential/2b/poly-2b-v6x.cpp to see what I mean by the loops.
Thanks!!
Couple things:
The way I split it up was arbitrary, since I just wanted to see if splitting it would make it compilable; I chose to pass around intermediate values (all those t# variables) as some array. This should change how things are optimized, e.g. unless you put restrict annotations on array arguments, the compiler won't be able to convince itself that each temporary doesn't change once initialized. There are crude (global vars) and not-as-crude (static one-time alloc, or class member) alternate ways to do this, I think.
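To be concrete about what I mean by restrict annotations, the helper signatures would look something like this (just a sketch; __restrict is an icpc/gcc/clang extension rather than standard C++, and icpc should also accept plain restrict if you compile with -restrict, so double-check the spelling your compiler wants):

void foo1(const double* __restrict a, double* __restrict t);
void foo2(const double* __restrict a, double* __restrict t);

The qualifiers promise the compiler that a and t never point at the same memory, so inside the helpers it can treat every t[i] as written exactly once.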
If you want it to be optimized more or less the same as before, I'd reckon that you'd just to make sure that the compiler has the same amount of information at the point of optimization as if the function were in a single translation unit, e.g. that all temporaries are really one-time temporaries, pointers don't alias, etc. I'm not familiar with any icpc extensions and such but the info on how to control these things should be out there.
Finally, if I understand what you're asking about where to put loops: I think any decent compiler should be able to optimize what was written above (as is) by inlining foo1 and foo2, and vectorizing as it will. That is, it shouldn't matter which way you write it; statically, the compiler should be able to tell that all this stuff is being done over the same range of indices. First though, just to ask: have you guys checked how much (if at all) the compiler vectorizes operations for these polynomial functions? It'd possibly be counterproductive to write loops in such a way if the compiler doesn't end up leveraging it.
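Just so we are talking about the same structure, here is a rough sketch of the "loop outside, helpers in the same file" version (the names, the stride argument, and the placeholder arithmetic are illustrative only, not the real generated polynomial):

#include <cstddef>

static void foo1(const double* __restrict a, double* __restrict t) {
    t[0] = a[0] * a[1];      // placeholder for the first half of the temporaries
    // ...
    t[4999] = a[0] + a[1];
}

static void foo2(const double* __restrict a, double* __restrict t) {
    t[5000] = t[0] * a[2];   // placeholder for the second half
    // ...
    t[9999] = t[4999] + a[2];
}

double real_foo(const double* a, std::size_t n, std::size_t stride) {
    // stride = number of variables per dimer (a made-up parameter for this sketch)
    double energy = 0.0;
    double t[10000];
    for (std::size_t i = 0; i < n; ++i) {
        const double* ai = a + i * stride;  // variables of the i-th dimer
        foo1(ai, t);                        // static and in the same translation unit,
        foo2(ai, t);                        // so the compiler is free to inline both
        energy += t[9999];                  // placeholder for the final expression
    }
    return energy;
}

Since foo1 and foo2 are static and sit in the same translation unit as the loop, the compiler can inline them and then vectorize across the loop if it thinks that pays off; whether it actually does is exactly what the optimization report should tell you.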
I used something like:
:1,$s/t\([0-9]\+\)/t[\1]/g
in vim, which in English is "replace all occurrences of t followed by at least one digit, by t, [, those digits, and then ]". Sed or perl can do the same thing with about as much typing.
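For reference, the sed and perl spellings would be roughly as follows (off the top of my head, so try them on a copy of the file first):

sed -E -i 's/t([0-9]+)/t[\1]/g' poly-2b-v6x.cpp
perl -pi -e 's/t(\d+)/t[$1]/g' poly-2b-v6x.cpp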
On Mon, Sep 25, 2017 at 3:02 PM, Marc Riera Riambau notifications@github.com wrote:
By the way, @darcykimball, how do you replace all t??? by t[???]?
Hey Kevin,
I have not done any checks yet. I am working on it right now, but I wanted to make the code compile with optimizations first. I will check the optimization reports, look at the timings, and then we will see what is faster. I will put it here on GitHub once I have everything in place.
And thanks for the command!