qmarcou / IGoR

IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data. Find full documentation at:
https://qmarcou.github.io/IGoR/
GNU General Public License v3.0
48 stars 24 forks source link

Segmentation fault during inference #35

Open Thopic opened 5 years ago

Thopic commented 5 years ago

Hi Quentin,

I have a recurrent "Segmentation fault", crashing IGoR, with no error message during inference. It happens both on my laptop and on the cluster, usually at the start of an iteration (rarely the first one). The iteration at which the crash happens seems to be random, even with the same sequences, sometimes it can even end without a crash (especially if I look at small number of sequences, say < 10000) . The alignment part always goes smoothly. I tested that it is not tied to a particular sequence. Additionally, when there is no crash final_marginals.txt is produced and looks reasonable.

I'm providing IGoR my own genomics files and model_parms.txt and running it with the following commands:

cmd="igor -set_wd $wdpath -batch foo"
$cmd -read_seqs $sequences
cmd="$cmd -set_genomic --V $mygenomicVs --J $mygenomicsJs --D $mygenomicsDs -set_CDR3_anchors --V $myanchorsV --J $myanchorsJ"  
$cmd -align --V
$cmd -align --D --thresh 30
$cmd -align --J
$cmd -infer -set_custom_model $mymodel_params

I'm using the current version of IGoR, compiled with

./configure --prefix=$HOME/.local/ && make && make install

I've tried to run it through valgrind to pinpoint where the error happens, so I've compiled IGoR with debug flags:

./configure --prefix=$HOME/.local/ CPPFLAGS=-DDEBUG CXXFLAGS="-g -O0" && make && make install

And ran the same command as before (without the alignment part)

cmd="valgrind --leak-check=yes $cmd"
$cmd -infer -set_custom_model $mymodel_params

At the start of the inference process (just after "Initialization of probability bounds over."), I start to get multiple valgrind errors, with the first one being:

==24496== Invalid read of size 4
==24496==    at 0x169635: Dinucl_markov::update_event_internal_probas(std::unique_ptr<long double [], std::default_delete<long double []> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&) (Dinuclmarkov.cpp:479)
==24496==    by 0x186731: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:315)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496==  Address 0x13d50240 is 0 bytes after a block of size 209,760 alloc'd
==24496==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24496==    by 0x1D93B9: Model_marginals::Model_marginals(Model_marginals const&) (Model_marginals.cpp:48)
==24496==    by 0x18543E: GenModel::infer_model(std::vector<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > >, std::allocator<std::tuple<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<Gene_class, std::vector<Alignment_data, std::allocator<Alignment_data> >, std::hash<Gene_class>, std::equal_to<Gene_class>, std::allocator<std::pair<Gene_class const, std::vector<Alignment_data, std::allocator<Alignment_data> > > > > > > > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, bool, double, double) [clone ._omp_fn.0] (GenModel.cpp:193)
==24496==    by 0x599C97D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24496==    by 0x4E436DA: start_thread (pthread_create.c:463)
==24496==    by 0x5EEE88E: clone (clone.S:95)
==24496== 

That's it I think, if you need my genes/CDR3/model_parms files, or the sequences, just ask.

Additional question, is this the "right way" to provide IGoR with a non-supplied specie ?

Thanks,

Thomas

qmarcou commented 5 years ago

Hi Thomas, Thanks for taking the time to produce such a detailed bug report! I highly appreciate it This indeed seems to be a real segfault issue, valgrind points to the following:

this->dinuc_proba_matrix(i,j) += marginal_array[event_index + i*event_realizations.size() + j];
//TODO This is risky in case the code evolves to have more than 4 realizations

At first I thought I had made a mistake upon making changes to handle ambiguous nucleotides within IGoR but it seems commit 4a1ac6264fbb411710a7d42f49cc2e22e8b286e6 did not change anything there. I had checked all the code using valgrind at some point but maybe I missed something. Also it seems upon writing this line of code I was already suspicious of its potential pitfalls.

This is indeed the right way to supply a model for an undefined species, however did you make changes to the model_marginals or model_parms by hand or did you use the command line and or cpp interface?

I'll try and find what is wrong just from the error message but I might ask you all the models and sequences if I cannot find an obvious cause.

Best, Quentin

Thopic commented 5 years ago

Thanks for the fast answer,

I'm not providing the model_marginals (if I'm reading the documentation correctly, this should initialize it to the uniform distribution), but I write my model_params.txt manually, which could cause the issue.

Here is a summary of the file:

@Event_list
#GeneChoice;V_gene;Undefined_side;7;v_choice
%Omyk_Chr12_NC_035088.1_IGHV20_81397765_81398073;GTTTGCTGTGATGAGATGGATCAGTCTCCTTCTCAGGTGAAAAGACCTGGAGACAAGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTACTATATGACCTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGGGTATTGACTCTGGTTCAACAAATTCTCCAGACTACTCAGACTCCCTGAAAGGCCAGTTCAGCCTGACTGAGGACGTCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGACAGG;0
%Omyk_Chr12_NC_035088.1_IGHV37_81466761_81467069;GTTTGCTGTGATGAGATGGATCAGTCACCTTCTCAGGTGAAAAGACCTGGAGACACGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTATTTCATGCACTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGAGAATTAACTCTGGTTCAAGAAATGCTCCAGTCTACTCAGACTCCCTGAAAGGCCAGTTCACCCTTACTGAGGACATCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGAGAGA;1
%Omyk_Chr12_NC_035088.1_IGHV57_81618381_81618689;GTTTGCTGTGATGAGATGGATCAGTCACCTTCTCAGGTGAAAAGACCTGGAGACACGTTTAAATTGTCATGTAAAATCTCTGGGTTTGATATGACAGACTATTTCATGCACTGGATAAGACAGAAACCAGGCAAAGCTCTGGAGTGGATTGGGGGTATTTACTCTGGTTCAACAGATGCTCCAGTCTACTCAGACTCCCTGAAAGGCCAGTTCATCCTTACTGAGGACATCTCTACAAGCACACAGTTCTTAGAGGCCAAGAGTCTGAGTTCAGAGGACTCTGCTGTTTATTATTGTGCTCGAGAGA;2
#GeneChoice;D_gene;Undefined_side;6;d_gene
%Omyk_Chr12_NC_035088.1_RSS_IGHD1_RSS_81394296_81394994;ACTATATGGGGGC;0
...
%Omyk_Chr13_NC_035089.1_RSS_IGHD11_RSS_48342957_48343024;TATGGGGGCAGC;23
#GeneChoice;J_gene;Undefined_side;7;j_choice
%Omyk_Chr12_NC_035088.1_RSS_JH1_SS_81678720_81678808;ATGCTTTTGACTACTGGGGTAAAGGGACACAAGTCACCGTCTCAACAG;0
...
%Omyk_Chr13_NC_035089.1_RSS_IGHJ10_SS_48368507_48368599;ACAATGCTTTTGACTACTGGGGGAAAGGCACAATGGTTACCGTTTCATCAG;18
#Deletion;V_gene;Three_prime;5;v_3_del
%-4;0
...
%16;20
#Deletion;D_gene;Three_prime;5;d_3_del
%-4;0
...
%16;20
#Deletion;D_gene;Five_prime;5;d_5_del
%-4;0
...
%16;20
#Deletion;J_gene;Five_prime;5;j_5_del
%-4;0
...
%18;22
#Insertion;VD_genes;Undefined_side;4;vd_ins
%0;0
...
%30;30
#Insertion;DJ_gene;Undefined_side;2;dj_ins
%0;0
...
%30;30
#DinucMarkov;VD_genes;Undefined_side;3;vd_dinucl
%T;3
%C;1
%G;2
%A;0
#DinucMarkov;DJ_gene;Undefined_side;1;dj_dinucl
%T;3
%C;1
%G;2
%A;0
@Edges
%GeneChoice_V_gene_Undefined_side_prio7_size3;Deletion_V_gene_Three_prime_prio5_size21
%GeneChoice_V_gene_Undefined_side_prio7_size3;GeneChoice_D_gene_Undefined_side_prio6_size24
%GeneChoice_V_gene_Undefined_side_prio7_size3;GeneChoice_J_gene_Undefined_side_prio7_size19
%GeneChoice_D_gene_Undefined_side_prio6_size24;Deletion_D_gene_Three_prime_prio5_size21
%GeneChoice_D_gene_Undefined_side_prio6_size24;Deletion_D_gene_Five_prime_prio5_size21
%GeneChoice_J_gene_Undefined_side_prio7_size19;Deletion_J_gene_Five_prime_prio5_size23
%GeneChoice_J_gene_Undefined_side_prio7_size19;GeneChoice_D_gene_Undefined_side_prio6_size24
%Deletion_D_gene_Five_prime_prio5_size21;Deletion_D_gene_Three_prime_prio5_size21
@ErrorRate
#SingleErrorRate
0.005

(Only three V-genes, I'm focusing on one primer)

Best, Thomas

Thopic commented 5 years ago

Hi,

I think I solved the first Valgrind error, you were right it was a "typo" leftover from your move to ambiguous nucleotides. I sent you a pull request (which is completely unnecessary for this one-line fix, but I never sent one, and I wanted to try it out)

It didn't solve my issue though, and a few other memory errors are popping out, I'm gonna try to look into these in the next few days (weeks more probably). I'm joining my valgrind error log, if you think your workload is too light,

Best,

Thomas

valgrind_log.txt

qmarcou commented 5 years ago

Hi Thomas, Sorry about this I should have spotted it looking more than 2 seconds at the code! Thanks for the PR i'll accept it now! Keep me updated of your Valgrind quest, I'm afraid because Valgrind have very long runtimes you might not get past the model initialization (maybe I'm wrong, but it will clearly take you some time). Looking at TRA like models could make the computation lighter (although it might hide the issues you are looking for) Quentin

Thopic commented 5 years ago

Hi,

I think I figured out how to avoid the segmentation fault, in my model the maximum number of allowed deletions on the D gene was too low , and when I increased it, IGoR ran fine.

So if someone happens to get the same error, increasing the range of parameters of the model seems to work.

While I'm here and because this probably doesn't need a new issue, I have a small question, do you know what happens in IGoR when delD3 + delD5 > len(D) ? (or even delD3 > len(D))

Best,

Thomas

qmarcou commented 4 years ago

Hi Thomas, Not sure my answer is still relevant but I find this weird that increasing D deletion range would change anything in this segfault thing... I or @alfaceor still need to get a look at this!

As for your last question: the code design assuming there is no deletion simply prevents this from happening. You can only delete up to current D length (meaning if you have added p nucleotides on 1 side, it is still allowed to delete them upon deleting on the other side).

Quentin