michellab / Cluster

This repository is used for tracking any issues regarding the cluster

Sire 2016 core dumps on node001-003 #12

Closed: jmichel80 closed this 7 years ago

jmichel80 commented 8 years ago

julien@node001:~$ ~/sire.app/bin/analyse_freenrg_mbar
Illegal instruction (core dumped)

Probably because the Opterons are not supported by the compilation procedure. This means we cannot use the serial queue to submit Sire jobs, which is annoying for non-GPU apps like nautilus, mbar, etc.

ppxasjsm commented 8 years ago

I think this is an AVX problem.

ppxasjsm commented 8 years ago

AVX is not supported on these nodes, but it is enabled by default. You can compile with it disabled. I am a bit confused, though, because I remember only having issues on azuma and not on any of the cluster nodes.

ppxasjsm commented 8 years ago

I can compile a non-AVX version, but there are already too many versions and I have lost track of which is which.

jmichel80 commented 8 years ago

I am running a version compiled on my home disk.

The issue is that compile-sire.sh probably disabled AVX because it was not found on the head node where compilation was run.

The real issue is binary portability. We could compile with the most generic kernels by default.

We could also decide that supporting the old Opterons is not important and compile a separate non-AVX version for the legacy nodes.

Input from CJW would be good. For now, do not do anything.


On 7 Oct 2016, at 12:05, ppxasjsm <notifications@github.com> wrote:

> I can compile a non AVX version, but then I feel there are too many versions anyway already and I feel I have lost track of which is which.


ppxasjsm commented 8 years ago

I think I prepared AVX and non-AVX binaries at some point. I can also never remember a good way to check whether AVX is supported, and the errors confuse me every time.
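One possible check, as a minimal sketch assuming Linux nodes that expose /proc/cpuinfo: the kernel lists the instruction-set extensions a CPU supports on its `flags` line, so a node can run AVX code only if `avx` appears there as a whole word. The function name `has_avx` is hypothetical and not part of Sire or the cluster tooling.

```shell
#!/bin/sh
# has_avx (hypothetical helper): exit status 0 if the CPU flags list
# contains "avx", 1 otherwise. Reads /proc/cpuinfo by default (Linux only);
# another file can be passed as the first argument, e.g. for testing.
has_avx() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Keep only the "flags : ..." lines, then look for "avx" as a whole
    # word (-w), so that a token like "avx2" on its own does not count.
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw avx
}
```

Running `has_avx && echo yes || echo no` on a compute node should print `no` on nodes without AVX, which would then need a non-AVX build. `lscpu` shows the same flags list if you prefer to check by eye.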

ppxasjsm commented 7 years ago

I have installed a module sire/16.1.0_no_avx which should be used for the serial queue.