tderrien / FEELnc

FEELnc : FlExible Extraction of LncRNA
GNU General Public License v3.0

segmentation fault #45

Closed jdmontenegro closed 3 years ago

jdmontenegro commented 3 years ago

Hello, I just ran into this problem while trying to run FEELnc_codpot.pl:

$ FEELnc_codpot.pl -i merge_evidence.gtf -a busco.genes.gtf -g nv_dovetail_4_gapped_chroms.final.fasta -m shuffle --outdir feelnc_shuffle 
You do not have specified a maximum number mRNAs transcripts for the training. Use all the annotation, can be long...
You do not have specified a maximum number lncRNA transcripts for the training. Use all the annotation, can be long...
> Extract ORFs/cDNAs for mRNAs from a GTF file
Parsing file 'busco.genes.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
    Your input GTF file 'busco.genes.gtf' contains *1400* transcripts
    Extracting ORFs/cDNAs 1400/1400...
    Extracted '1400' ORF/cDNAs sequences on '1400'.
> The lncRNA training file is not set. Get ORFs/cDNAs for lncRNAs by shuffling mRNA sequences
    Extracting ORFs/cDNAs 4200/4200...
    Extracted '4200' ORF/cDNAs sequences on '4200'.
> Extract ORFs/cDNAs for candidates RNAs from a GTF file
Parsing file 'merge_evidence.gtf'...
Parse input file:             [----------------------------------------------------------------------------------------------------]
    Your input GTF file 'merge_evidence.gtf' contains *35900* transcripts
    Extracting ORFs/cDNAs 35893/35900...
    Extracted '35893' ORF/cDNAs sequences on '35900'.
> Run random Forest on '/tmp//93265_merge_evidence.gtf.test_rna.fa'
    1. Compute the size of each sequence and ORF
    2. Compute the kmer ratio for each kmer and put the output file name in a list
    3. Compute the kmer score for each kmer size on learning and test ORF
sh: line 1: 93359 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size1.tmp -nb-cores 1 -kmer-size 1 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_1_ScoreValues.tmp 2> /dev/null
sh: line 1: 93371 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size1.tmp -nb-cores 1 -kmer-size 1 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_1_ScoreValues.tmp 2> /dev/null
sh: line 1: 93387 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.test_orf.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size1.tmp -nb-cores 1 -kmer-size 1 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.test_sequencesKmer_1_ScoreValues.tmp 2> /dev/null
sh: line 1: 93400 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size2.tmp -nb-cores 1 -kmer-size 2 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_2_ScoreValues.tmp 2> /dev/null
sh: line 1: 93411 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size2.tmp -nb-cores 1 -kmer-size 2 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_2_ScoreValues.tmp 2> /dev/null
sh: line 1: 93423 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.test_orf.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size2.tmp -nb-cores 1 -kmer-size 2 -dont-reverse -step 1 >> /tmp//93265_merge_evidence.gtf.test_sequencesKmer_2_ScoreValues.tmp 2> /dev/null
sh: line 1: 93434 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size3.tmp -nb-cores 1 -kmer-size 3 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_3_ScoreValues.tmp 2> /dev/null
sh: line 1: 93446 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size3.tmp -nb-cores 1 -kmer-size 3 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_3_ScoreValues.tmp 2> /dev/null
sh: line 1: 93459 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.test_orf.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size3.tmp -nb-cores 1 -kmer-size 3 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.test_sequencesKmer_3_ScoreValues.tmp 2> /dev/null
sh: line 1: 93471 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size6.tmp -nb-cores 1 -kmer-size 6 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_6_ScoreValues.tmp 2> /dev/null
sh: line 1: 93485 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size6.tmp -nb-cores 1 -kmer-size 6 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_6_ScoreValues.tmp 2> /dev/null
sh: line 1: 93495 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.test_orf.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size6.tmp -nb-cores 1 -kmer-size 6 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.test_sequencesKmer_6_ScoreValues.tmp 2> /dev/null
sh: line 1: 93505 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size9.tmp -nb-cores 1 -kmer-size 9 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_9_ScoreValues.tmp 2> /dev/null
sh: line 1: 93516 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size9.tmp -nb-cores 1 -kmer-size 9 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_9_ScoreValues.tmp 2> /dev/null
sh: line 1: 93527 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.test_orf.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size9.tmp -nb-cores 1 -kmer-size 9 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.test_sequencesKmer_9_ScoreValues.tmp 2> /dev/null
sh: line 1: 93541 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.coding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size12.tmp -nb-cores 1 -kmer-size 12 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.coding_sequencesKmer_12_ScoreValues.tmp 2> /dev/null
sh: line 1: 93547 Segmentation fault      (core dumped) /home/jdmontenegroc/Documents/bin/KmerInShort -file /tmp//93265_merge_evidence.gtf.noncoding_orf.fa.forRandomForest.fa -kval /tmp//93265_merge_evidence.gtf.kmerScoreValues_size12.tmp -nb-cores 1 -kmer-size 12 -dont-reverse -step 3 >> /tmp//93265_merge_evidence.gtf.noncoding_sequencesKmer_12_ScoreValues.tmp 2> /dev/null

I am getting a ton of segfaults, but I am not sure why. I am running on Manjaro Linux:

NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
BUILD_ID=rolling
PRETTY_NAME="Manjaro Linux"
ANSI_COLOR="32;1;24;144;200"
HOME_URL="https://manjaro.org/"
DOCUMENTATION_URL="https://wiki.manjaro.org/"
SUPPORT_URL="https://manjaro.org/"
BUG_REPORT_URL="https://bugs.manjaro.org/"
LOGO=manjarolinux

with Perl 5.32 and R 4.0.3.

Any help would be more than welcome.

Kind regards,

Juan D. Montenegro

tderrien commented 3 years ago

Hi Juan,

Thank you for the detailed report of the issue.

Indeed, the problem seems to be related to the KmerInShort executable, which FEELnc needs for the fast extraction of k-mer profiles and which, a priori, does not work on your particular OS.

I'd suggest building KmerInShort directly from the dedicated git repo and then adding it to your PATH as indicated in the Install section.

Best wishes,

Thomas

jdmontenegro commented 3 years ago

Thank you, Thomas. The KmerInShort binary being used was compiled from the git repo; I didn't use any existing binaries.

vwucher commented 3 years ago

Hi Juan,

Thanks also for the detailed issue. Did you try running the KmerInShort tool on its own? That would tell us whether the problem comes from FEELnc + KmerInShort or from KmerInShort alone.

Best wishes, Valentin

jdmontenegro commented 3 years ago

Thank you @vwucher

I just tested KmerInShort alone and it seems to be working correctly.

$ KmerInShort --help
ERROR: Unknown parameter '--help'
ERROR: Option '-file' is mandatory
ERROR: Option '-kmer-size' is mandatory

[kis options]
       -nb-cores     (1 arg) :    number of cores  [default '0']
       -verbose      (1 arg) :    verbosity level  [default '1']
       -version      (0 arg) :    version
       -help         (0 arg) :    help
       -file         (1 arg) :    input file 
       -kmer-size    (1 arg) :    ksize
       -out          (1 arg) :    output file  [default '']
       -offset       (1 arg) :    starting offset  [default '0']
       -step         (1 arg) :    step  [default '1']
       -kval         (1 arg) :    file with kmer values   [default '']
       -dont-reverse (0 arg) :    do not reverse kmers, count forward and reverse complement separately
       -freq         (0 arg) :    output frequency
       -perSeq       (0 arg) :    one output file and count per fasta sequence
       -NSE          (0 arg) :    compute normalized Shannon entropy
       -sum          (0 arg) :    compute sum over all files

So there does not seem to be an error here. Is there a specific command that I should try with KmerInShort?

jdmontenegro commented 3 years ago

Nope, wait, I just ran a quick test and it segfaulted, although without much information.

$ KmerInShort -file transcripts.busco.fa -kmer-size 21 > tmp
Counting in canonical mode (kmer and their reverse complement counted together) 
Launching kmer counting with offset 0 step 1 
[counting kmers                          ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  secSegmentation fault (core dumped)

I will recompile and test again.

thank you.

vwucher commented 3 years ago

Ok. Thanks for keeping us updated! If it still does not work after recompiling, you can also try the binary we provide. Maybe it will work fine.

Bye, Valentin

jdmontenegro commented 3 years ago

The problem is still there: the test file for KmerInShort works properly, but my dataset keeps segfaulting. It seems like a bug in KmerInShort. Nobody has yet reported a similar error.

vwucher commented 3 years ago

Ok. Did you try using the binary from the GitHub folder, or only the one you compiled yourself? Another thing you can try is to split your data in two, run KmerInShort on both halves, see which one fails, then split that one in two, and so on, to check whether the crash comes from a specific sequence.
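
To make the splitting suggestion concrete, here is a minimal sketch of one bisection step. The function name `split_fasta_halves` and the file names are hypothetical, chosen only for illustration; it assumes a plain FASTA file in which every record starts with a `>` header line.

```shell
# split_fasta_halves SRC OUT_A OUT_B
# Write the first half of the FASTA records in SRC to OUT_A and the
# remaining records to OUT_B (hypothetical helper, not part of FEELnc).
split_fasta_halves() {
    src=$1; out_a=$2; out_b=$3
    total=$(grep -c '^>' "$src")          # number of records in SRC
    half=$(( (total + 1) / 2 ))           # first half gets the extra record
    : > "$out_a"; : > "$out_b"            # make sure both outputs exist
    awk -v half="$half" -v a="$out_a" -v b="$out_b" '
        /^>/ { n++ }                      # a header line starts a new record
        { print > (n <= half ? a : b) }   # route the line by record index
    ' "$src"
}

# Example of one bisection step (file names are placeholders):
#   split_fasta_halves transcripts.fa half_a.fa half_b.fa
#   KmerInShort -file half_a.fa -kmer-size 3 > /dev/null
#   KmerInShort -file half_b.fa -kmer-size 3 > /dev/null
```

Repeating the step on whichever half still crashes narrows the input down to a single record after roughly log2(N) rounds. If both halves run cleanly, the crash probably does not depend on one specific sequence.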

jdmontenegro commented 3 years ago

Well, my mistake: KmerInShort only supports k-mer sizes up to 15 and I used 21, hence the segfault. When I run the same command with k=15:

$ KmerInShort -file transcripts.fa -kmer-size 15 -nb-cores 8 > tmp
Counting in canonical mode (kmer and their reverse complement counted together) 
Launching kmer counting with offset 0 step 1 
[counting kmers                          ]  100  %   elapsed:   0 min 2  sec   remaining:   0 min 0  sec

There is no segfault. So KmerInShort is working correctly on its own. Any more suggestions would be very helpful.
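
A small guard can catch this kind of mistake before the binary is launched. This is only a sketch: the limit of 15 is the value reported in this thread, not a constant taken from the KmerInShort documentation, and the names `MAX_KMER_SIZE` and `kmer_size_ok` are hypothetical.

```shell
# Upper bound as reported in this thread; an assumption, not a documented constant.
MAX_KMER_SIZE=15

# kmer_size_ok K
# Succeed only if K is a positive integer no larger than MAX_KMER_SIZE.
kmer_size_ok() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;          # empty or not a plain integer
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$MAX_KMER_SIZE" ]
}

# Example (file name is a placeholder):
#   k=21
#   if kmer_size_ok "$k"; then
#       KmerInShort -file transcripts.fa -kmer-size "$k" > tmp
#   else
#       echo "k-mer size $k is outside 1..$MAX_KMER_SIZE" >&2
#   fi
```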

vwucher commented 3 years ago

Ok. So when you use KmerInShort alone, it works fine, but when you use FEELnc_codpot.pl it crashes, right? Did you try with your new compilation, and with the native binary? Also, did you try keeping the temporary files with --keeptmp and then checking the ones on which KmerInShort crashes, to see whether the problem comes from them? Splitting the input data can also help to find problems with specific sequences, if there are any.
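
As a sketch of how the kept temporary files could be checked: the log above shows FEELnc redirecting KmerInShort's stderr to /dev/null, so re-running the same commands by hand keeps any error message visible. The helper name `classify_run` and the glob in the example are hypothetical. The only general fact used is that a shell reports a process killed by SIGSEGV (signal 11) with exit status 128 + 11 = 139.

```shell
# classify_run LABEL CMD [ARGS...]
# Run CMD, discard its stdout, leave stderr visible, and print how it ended.
classify_run() {
    label=$1; shift
    "$@" > /dev/null
    status=$?
    case $status in
        0)   echo "$label: ok" ;;
        139) echo "$label: segfault (128 + SIGSEGV 11)" ;;
        *)   echo "$label: exit status $status" ;;
    esac
}

# Example over the ORF files kept by --keeptmp (the glob is a placeholder;
# the KmerInShort flags are the ones visible in the log above):
#   for f in /tmp/*_orf.fa*; do
#       classify_run "$f" KmerInShort -file "$f" -kmer-size 3 \
#           -nb-cores 1 -dont-reverse -step 3
#   done
```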

jdmontenegro commented 3 years ago

Thank you for your suggestions. I switched to the KmerInShort version provided in the bin directory of FEELnc and the program ran correctly, so the cause seems to be either something in the most recent version of KmerInShort or something broken in my local install.

Everything seems to be working properly now.

Best regards,

Juan D. Montenegro

vwucher commented 3 years ago

Ok great.

I will close the issue then.

Regards, Valentin