iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Illegal instruction crash during build #108

Closed: iqbal-lab closed this issue 2 years ago

iqbal-lab commented 6 years ago

I'm just running gramtools build on some Plasmodium data.

pwd
/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps

I ran this

ebi-login-002.ebi.ac.uk> bsub -P bigmem -E 'test -e /homes/zi' -R "select[mem>600000] rusage[mem=600000]" -M600000 -o pfbuild.o -e pfbuild.e -J pfbuild singularity exec /nfs/research1/zi/mhunt/Cryptic_production/clockwork_container.v0.5.1.img gramtools build --gram-directory ./gram --vcf prg_construction/data/pf3k_and_DBPMSPS1and2.vcf --reference reference/Pfalciparum.genome.fasta --max-read-length 150

This output went to stderr

/ebi/lsf/ebi-spool/02/1524041567.7290813: line 8: singularity: command not found
2018-04-18 12:23:43,716 gramtools    INFO     Start process: build
2018-04-18 12:25:16,148 gramtools    INFO     stdout:

2018-04-18 12:25:16,243 gramtools    INFO     Process termination message:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

2018-04-18 12:25:16,244 gramtools    INFO     Process termination code: 0
2018-04-18 12:25:16,886 gramtools    INFO     Process termination message:
Illegal instruction

2018-04-18 12:25:16,886 gramtools    INFO     Process termination code: 132
2018-04-18 12:25:16,886 gramtools    ERROR    Error code != 0
2018-04-18 12:25:30,001 gramtools    INFO     End process: build

And this is the content of the build report

{
    "start_time": "1524054223",
    "end_time": "1524054329",
    "total_runtime": 106,
    "kmer_size": 15,
    "max_read_length": 150,
    "prg_build_report": {
        "command": "perl /usr/local/lib/python3.6/dist-packages/gramtools/utils/vcf_to_linear_prg.pl --outfile ./gram/prg --vcf /nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/prg_construction/data/pf3k_and_DBPMSPS1and2.vcf --ref /nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/reference/Pfalciparum.genome.fasta",
        "return_value_is_0": true,
        "stdout": [
            "Finished printing linear PRG. Final number in alphabet is  3216844"
        ]
    },
    "gramtools_cpp_build": {
        "command": "/usr/local/lib/python3.6/dist-packages/gramtools/bin/gram build --gram ./gram --kmer-size 15 --max-read-size 150 --max-threads 1",
        "return_value_is_0": false,
        "stdout": []
    },
    "current_working_directory": "/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps",
    "paths": {
        "project": "./gram",
        "vcf": "/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/prg_construction/data/pf3k_and_DBPMSPS1and2.vcf",
        "reference": "/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/reference/Pfalciparum.genome.fasta",
        "prg": "./gram/prg",
        "encoded_prg": "./gram/encoded_prg",
        "variant_site_mask": "./gram/variant_site_mask",
        "allele_mask": "./gram/allele_mask",
        "fm_index": "./gram/fm_index",
        "perl_generated_vcf": "./gram/perl_generated_vcf",
        "perl_generated_fa": "./gram/perl_generated_fa",
        "build_report": "./gram/build_report.json"
    },
    "path_hashes": {
        "vcf": "22214596d1f5646c29c2620add6645982989df8c32e535a366619c0e7cfd3791",
        "reference": "1357752e6a2153b64b923a7d47b647ebf7f9aeb955a31c2de291e51ad589c04f",
        "prg": "29b3a29ca7cd7778cab9b469423bdbf4f9ffe92ea11ccfe4348cd807c1d864f1",
        "perl_generated_vcf": "c006c4410dc6ffab7114072344e1410a49c83e92b155c33462aaefd3875dca87",
        "perl_generated_fa": "917c5a4812dc5ccc991b8858bd539e1a0be5cf26d40736b2db3be729f06d6568"
    },
    "version_report": {
        "version_number": "0.5.0",
        "last_git_commit_hash": "63c175a20f9477b270f900addd5aa324ac460330",
        "truncated_git_commits": [
            "63c175a - Robyn Ffrancon, 1523897681 : fix: per base coverage recorded when allele length greater than read length",
            "ef120fb - Robyn Ffrancon, 1523868010 : CMake ensures compiler supports C++17",
            "7dff14a - Robyn Ffrancon, 1523549634 : implemented personal reference infer command for haploid",
            "fc70f2a - Robyn Ffrancon, 1523283536 : fix: genotyper tests now pass",
            "50d9545 - Robyn Ffrancon, 1523282912 : linting"
        ]
    }
}
ffranr commented 6 years ago

Yesterday we saw that gramtools ran correctly when using non-interactive LSF, without requesting a large amount of RAM, and with the Singularity container.

Important factors to consider:

iqbal-lab commented 6 years ago

This is no longer reproducible on:

gramtools --version
{
    "version_number": "0.5.0",
    "last_git_commit_hash": "ff3e2d6f1b138c6839d8db7d83b51df97ef1e3a7",
    "truncated_git_commits": [
        "ff3e2d6 - Robyn Ffrancon, 1525871127 : enhancement: if kmer size 8 or less, generate all possible kmers and build kmer index",
        "8db0f15 - Robyn Ffrancon, 1525860767 : enhancement: paths no longer memorised during building kmer index; cleanup",
        "c806726 - Robyn Ffrancon, 1524587729 : added docker config file",
        "63c175a - Robyn Ffrancon, 1523897681 : fix: per base coverage recorded when allele length greater than read length",
        "ef120fb - Robyn Ffrancon, 1523868010 : CMake ensures compiler supports C++17"
    ]
}

bricoletc commented 5 years ago

This issue is coming back to haunt us.

On ebi-cli, when running the following command:

bsub.py 20 "Build_Trio_Singularity" "singularity exec --home /nfs/research1/zi/bletcher/Pf_benchmark/Trio --bind /nfs/research1/zi/software/Python-3.6.4/lib/python3.6/site-packages:/mnt/snakemake /nfs/research1/zi/bletcher/Singularity_Images/8b46a86_gramtools.img bash -c 'set -euo pipefail; gramtools build --gram-dir /nfs/research1/zi/bletcher/Pf_benchmark/Trio/gramtools_runs/gram_k11 --reference /nfs/research1/zi/bletcher/Pf_benchmark/ref_data/Pfalciparum.genome.fasta --vcf /nfs/research1/zi/bletcher/Pf_benchmark/ref_data/pf3k_and_DBPMSPS1and2.vcf --kmer-size 11 --all-kmers'"

we get the following output in stderr:

2019-04-11 13:06:14,986 gramtools INFO Start process: build
2019-04-11 13:07:03,880 gramtools INFO stdout:
2019-04-11 13:07:04,395 gramtools INFO stdout:
2019-04-11 13:07:04,395 gramtools INFO Process termination message: Illegal instruction
2019-04-11 13:07:04,396 gramtools INFO Process termination code: 132
2019-04-11 13:07:04,396 gramtools ERROR Error code != 0
2019-04-11 13:07:08,262 gramtools INFO End process: build

On the stdout side:

Finished printing linear PRG. Final number in alphabet is 3216844
maximum thread count: 1
Executing build command
Generating integer encoded PRG

The call causing the break would seem to be here: https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/src/build/build.cpp#L24. So the Python front end and some of the C++ work. The crash is at the first call into the external library sdsl; before that, command-line parameter parsing using another external library (Boost) works.

Importantly:

leoisl commented 5 years ago

Hello,

I got interested in this issue, since I've never seen this error before!

Does gramtools always fail on this machine? I think you are submitting the job to a cluster; do you know the specific machine that ran it?

Could you try running grep avx2 /proc/cpuinfo on yoda and on the machine where you have the bug?

Probably this won't lead anywhere, but it is cool to solve such cryptic bugs...

leoisl commented 5 years ago

Oops, sorry, the grep is just grep avx /proc/cpuinfo.

My bet is that the code is compiled with SSE4.2 and AVX2 instructions, and it crashes because these instructions are not in the instruction set of the ebi-cli processor (hence the Illegal instruction error message).

I'd recommend building sdsl-lite as portable. Change https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/lib/sdsl.cmake#L7 to:

        BUILD_COMMAND     bash -c "cd ${CMAKE_CURRENT_BINARY_DIR}/src/sdsl-lite-2.1.1 && BUILD_PORTABLE=1 ./install.sh ${CMAKE_CURRENT_BINARY_DIR}"

I also recommend removing -march=native and -msse4.2 from https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/CMakeLists.txt#L15 .
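For illustration, and assuming that line sets the compiler flags more or less as sketched below (I haven't checked its exact contents, so treat the "before" line as an assumption), the change would be along these lines:

        # Hypothetical sketch; the existing flags on that line are an assumption on my part.
        # before: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -march=native -msse4.2")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")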

Code will be slower, but maybe it will run.

Could you check if these changes make gramtools runnable?

Sorry that I can't test this for you right now, as I don't have access to the machines...

bricoletc commented 5 years ago

Thanks for the help.

grep avx /proc/cpuinfo | uniq

On yoda:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

On ebi-cli:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt

I can see 'sse4_2' in both, but 'avx2' only on yoda.

I will follow your recommendations and update when I've tested!

leoisl commented 5 years ago

Hello!

Is ebi-cli also the worker node, or only the submission node? I ran a test yesterday and code using avx2 ran on a machine that had only avx... so maybe this won't work for you. I did not really understand why; maybe I did something wrong. Well, looking forward to seeing if it solves your issue, or we continue the bug hunting!

bricoletc commented 5 years ago

Sorry, those were the submission nodes. Here is the information on the worker nodes.

Yoda node:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

Ebi-cli that crashed:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt

leoisl commented 5 years ago

Heh, no changes from the sse and avx perspectives... Let's hope it was avx2 causing the illegal instruction. It kind of makes sense, since the problem is in sdsl and it is compiled with avx2: https://github.com/simongog/sdsl-lite#installation . But only the real test will tell.

bricoletc commented 5 years ago

It seems the problem is a little more subtle. I made a Python virtual env at the same commit that produced the crash, and ran gramtools build on the same worker node which crashed, and it runs. Should I still make a container with BUILD_PORTABLE=1? It seems like sdsl is not causing the crash, but perhaps the Singularity container build is.

bricoletc commented 5 years ago

Just realised that it probably works because gramtools gets compiled on the ebi-cli nodes, and thus sdsl is built without the problematic instructions. Building the Singularity container with BUILD_PORTABLE=1 now.

leoisl commented 5 years ago

Ah, that makes sense!

I think you've seen this, but sdsl also mentions this:

These instructions are enabled by default if the processor of the build system supports them.

from https://github.com/simongog/sdsl-lite#installation

So I think what you just said is indeed what is happening...

Looking forward to seeing if BUILD_PORTABLE=1 indeed solves the issue!

bricoletc commented 5 years ago

Indeed!

But....I built an image from the following commit: https://github.com/bricoletc/gramtools/commit/6c72066a15ee8358adf970ce52d82b0293e02d2f

Confirming the libsdsl library is different (md5sum libsdsl.a):

This image still crashes with the same error.

I guess there is something else in the venv/node build that the singularity/local build is not getting right.

Should I do as you said here?:

I also recommend removing -march=native and -msse4.2 from https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/CMakeLists.txt#L15 .

leoisl commented 5 years ago

Oh, I think we should try removing both of these. I think removing -msse4.2 is not really required, since it seems to be present on the worker, but let's remove it anyway and try it out. In particular, -march=native could really be a problem if the hardware you compile on and the hardware you run on are different: https://stackoverflow.com/questions/52653025/why-is-march-native-used-so-rarely

If this does not work, can you be sure that the error occurs when you launch gramtools build (the C++ executable)? If yes, it would be worth debugging it with gdb, as it would give us a stack trace and we would know better where the illegal instruction comes from... But my bet now is on removing -march=native and -msse4.2!

bricoletc commented 5 years ago

Good news! I removed -march=native and -msse4.2 (on top of BUILD_PORTABLE=1 passed to sdsl cmake): https://github.com/iqbal-lab-org/gramtools/commit/4ba1431fda352efb3e3b27473d3d8d76df63347f

and tested a locally built Singularity image on several nodes, including the one that the crash was reported on. It now works!

The downside is run time, for the build command with k=11:

gramtools build --gram-dir /nfs/research1/zi/bletcher/Pf_benchmark/Trio/gramtools_runs/gram_k11 --reference /nfs/research1/zi/bletcher/Pf_benchmark/ref_data/Pfalciparum.genome.fasta --vcf /nfs/research1/zi/bletcher/Pf_benchmark/ref_data/pf3k_and_DBPMSPS1and2.vcf --kmer-size 11 --all-kmers

I suppose the slowdown was to be expected? Could it be due to differences between the cluster nodes, rather than the flags we changed?

I will leave this as a 'gramtools_portable' branch for now.

iqbal-lab commented 5 years ago

What if you try yoda with these changes?

leoisl commented 5 years ago

Nice!

Yeah, the slowdown is to be expected and could be a problem :(

But I don't think a binary can satisfy both worlds: be portable and at the same time very optimized to a specific hardware.

It seems that using only SSE2 could be an option: no significant slowdown and good portability (see https://twitter.com/lh3lh3/status/1075825180654604289)

I think -march=native is too strict and not portable at all, but as you've noted, it can have a significant speed-up (also confirmed here: https://twitter.com/sjackman/status/1075819811395985409 )

So, either we make a single choice based on benchmarking and portability, or we do as sdsl does and provide different ways to compile: one highly optimised but not portable, with -march=native (the default?); one less optimised and more portable, with only -msse2 (maybe for clusters?); and a last one not optimised but very portable (maybe for a quick test, or for users having compilation issues?). For this last one, I think we should also address possible glibc problems; the most portable way I've found so far to distribute a Linux executable is in https://github.com/iqbal-lab-org/gramtools/issues/101#issuecomment-464043475 . The user would not even need to compile the C++ code, just download and run, but the downside will surely be performance. Which options would you prefer, or do you have other suggestions?
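To make that concrete, here is a rough CMake sketch of the three modes (the variable name and defaults are just illustrative, not gramtools' actual build setup):

        # Illustrative sketch only: one cache variable selecting the speed/portability trade-off.
        set(GRAM_SIMD_MODE "native" CACHE STRING "One of: native, sse2, portable")
        if(GRAM_SIMD_MODE STREQUAL "native")
            add_compile_options(-march=native)   # fastest, but tied to the build host's CPU
        elseif(GRAM_SIMD_MODE STREQUAL "sse2")
            add_compile_options(-msse2)          # still fast, runs on any x86-64 CPU
        endif()
        # "portable" adds no instruction-set flags at all

A cluster build could then be configured with something like cmake -DGRAM_SIMD_MODE=sse2 (again, a hypothetical flag name, just to show the idea).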

PS: this whole thread is really nice: https://twitter.com/sjackman/status/1075803280603664384

ekg commented 5 years ago

Just chiming in that I've had this same problem before with vg. Various parts may use different SIMD instructions depending on the host they were compiled on. Recently we have done OK with precompiled binaries, but I forget exactly what step worked. @amnovak may have more details about how things were improved.


bricoletc commented 5 years ago

@iqbal-lab

What if you try yoda with these changes?

Oh yes... Here are the results:

Note: the 'before changes' time on yoda is lower than reported above (6471 sec) because the 'after changes' build includes a change to reduce memory consumption, so I re-ran 'before changes' with that change included. As an aside, this change seems to reduce both memory usage and run time (I will log the numbers somewhere else).

Strange that the run on ebi-cli seems so much slower overall. I ran on a bigmem node there; perhaps that is linked.

@leoisl it seems the slowdown is maybe not so bad (on yoda). I'm sure it is a good idea to have builds with different runtime/portability trade-offs, and this is linked to #101. As we currently don't distribute pre-compiled binaries, we could do as you say and, like sdsl, build conditionally depending on the host configuration. sdsl-lite does this: https://github.com/simongog/sdsl-lite/blob/ddb0fbbc33bb183baa616f17eb48e261ac2a3672/CMakeLists.txt#L85 which is set in https://github.com/simongog/sdsl-lite/blob/master/CMakeModules/CheckSSE4_2.cmake ; it only enables SSE4.2 if the host CPU has it.
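For reference, a much-simplified sketch of that idea in CMake (this is not sdsl's actual CheckSSE4_2.cmake module, and the check below is Linux-only since it reads /proc/cpuinfo):

        # Simplified, hypothetical sketch: only add -msse4.2 when the build host reports sse4_2.
        if(EXISTS "/proc/cpuinfo")
            file(READ "/proc/cpuinfo" CPUINFO)
            if(CPUINFO MATCHES "sse4_2")
                message(STATUS "Build host supports SSE4.2; enabling -msse4.2")
                add_compile_options(-msse4.2)
            endif()
        endif()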

@ekg thanks we'd be happy to know more!