isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License
271 stars 49 forks source link

error: overlap is not transmuted! #77

Open jdmontenegro opened 6 years ago

jdmontenegro commented 6 years ago

Hi, I am trying to polish a PacBio assembly with illumina reads. After one round of polishing using the pacbio reads, I mapped the illumina reads to the the polished assembly with minimap2 and used the sam output as overlap information for polishing using the following commands:

minimap2 -t 64 -ax sr consensus1.fasta ${reads} > illumina.sam
racon -t 64 ${reads} illumina.sam consensus1.fasta > consensus2.fasta

The mapping works quite well, but after a few hours running I hit the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I am not sure what does it mean or how to fix this. Any suggestions are more than welcome! Kind regards,

rvaser commented 6 years ago

Hello, which version of racon are you using?

Best regards, Robert

jdmontenegro commented 6 years ago

I'm using the latest 1.3.1 installed with conda.

rvaser commented 6 years ago

It is really strange that this particular error occurs. Would you mind sharing your data so I can investigate further?

Best regards, Robert

jdmontenegro commented 6 years ago

It is quite large, over 500gb of data.

rvaser commented 6 years ago

How large is the created sam file and how much RAM does your machine have?

jdmontenegro commented 6 years ago

The Sam file is nearly 500gb and I m using nearly 800gb of ram. I can use up to 1.5tb of ram

rvaser commented 6 years ago

Could you please add the following lines of code at https://github.com/isovic/racon/blob/master/src/polisher.cpp#L330:

for (uint64_t i = 0; i < overlaps.size(); ++i) {
    if (overlaps[i] == nullptr) {
         fprintf(stderr, "Null at %lu/%zu\n", i, overlaps.size());
         continue;
    }
    if (!overlaps[i]->is_transmuted()) {
        fprintf(stderr, "Not transmuted at %lu/%zu\n", i, overlaps.size());
    }
}

and the following lines at https://github.com/isovic/racon/blob/master/src/overlap.hpp#L63:

    bool is_transmuted() const {
        return is_transmuted_;
    }

Run make and try running the same racon command as above. Please paste the output here.

jdmontenegro commented 6 years ago

Thank you Robert,

Unfortunately, I installed it using conda, because building from source kept producing a failed binary that was unable to run. I can send the error I got shortly.

Cheers,

rvaser commented 6 years ago

Sure, send me the error you are getting when building from source.

jdmontenegro commented 6 years ago

Hi Robert,

The error I got after compiling racon under linux 3.10.0-693.2.2e17.x86-64 with gcc 5.2.0 and cmake 3.9.0 is the same as in this thread:

https://github.com/isovic/racon/issues/73

but the compilation and the execution were done in the same node using the same environment, so the solution in that thread did not work for us.

Cheers,

Juan Montenegro

jdmontenegro commented 6 years ago

Hi Robert, I recompiled the source code with your suggested changes and the program is running, but the STDERR is very large. Look at the first 25 lines of the log:

racon -t 64 ${DIR}/merged.shuffled.fastq ${DIR}/shrimp.illumina.sam ${DIR}/shrimp_consensus.fasta > ${DIR}/shrimp_consensus2.fasta
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/721313620
Not transmuted at 1/721313620
Not transmuted at 2/721313620
Not transmuted at 3/721313620
Not transmuted at 4/721313620
Not transmuted at 5/721313620
Not transmuted at 6/721313620
Not transmuted at 7/721313620
Not transmuted at 8/721313620
Not transmuted at 9/721313620
Not transmuted at 10/721313620
Not transmuted at 11/721313620
Not transmuted at 12/721313620
Not transmuted at 13/721313620
Not transmuted at 14/721313620
Not transmuted at 15/721313620
Not transmuted at 16/721313620

and it goes on and on until the end:

Not transmuted at 721313615/721313620
Not transmuted at 721313616/721313620
Not transmuted at 721313617/721313620
Not transmuted at 721313618/721313620
Not transmuted at 721313619/721313620

I am not entirely sure how to interpret this. What does "Not transmuted" mean for racon? Does this mean it cannot calculate the consensus for some reason? Were the reads incorrectly mapped or are there way too many indels/mutations to estimate a consensus?

The program worked quite well during the first polishing stage using long reads. The only change has been the use of short reads to the consensus_1. Any ideas are more than welcome.

Kind regards,

rvaser commented 6 years ago

The error means that the read identifiers weren't replaced with read IDs in overlaps, see here https://github.com/isovic/racon/blob/master/src/overlap.cpp#L129-L177. I don't know how this output is even possible, if a overlap isn't transmuted it is deleted and your output indicates that not a single one is transmuted (and none of them are deleted). You are running racon on a single node right?

rvaser commented 6 years ago

Can you please paste here a couple of non-header lines of your SAM file, and a few sequence headers from Illumina reads and polished contigs?

jdmontenegro commented 6 years ago

Hi Robert,

thanks for your quick reply. I am using 64 threads, see command below:

racon -t 64 ${reads} illumina.sam consensus.fasta > consensus2.fasta

please see below the tail of my sam file:

AGRF-33:351:HMHKNBCXX:2:2215:21275:100712       161     contig_72160    13809   1       22M1I12M1I12M1I16M1I13M4D27M5D35M6D34M3D27M12S  contig_20689    2573    0       TATATATATAATATATATATATAATATATATATATAATATATATATATAATATATATATATATATAATATATATATATATATAATATATATATATACATATATATAATATATATATATATATATATATATATATATATATAAATATATTTATATAATATATATATATATATATATTAAATATATAATATATATATATATATAATATATATATAA      DDDCDIIIIIIIIIIIIIIIHIIHHIEHHE1DCGHHHHIIIIIIIIIIHIIIIIIHHIIHHIIIIFEGHHHIEHEHI?FHHH@FHHH?FIIGIHGF1DFHIFHIC1G1HHHI?HHIIIIIIEHEEHHHFFF1<<F1<1<CD1<<1<D<1<DC11111<1DFG1CFFEHF@1<1111111<CC1<<1<<<<C@CG0<<<@C0<00<0<0<09<00      NM:i:27 ms:i:206        AS:i:206    nn:i:0  tp:A:P  cm:i:1  s1:i:43 s2:i:43
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730       89      contig_15432    3165    1       23S30M2I12M10D42M3I31M57S       =       3165    -125    TNAAGAGATAAATAGAGAGAGAGAGAGAGAGAGAGAGATAGAGAGAGAGATAGATAGAGAGATAGATAGAGAGATAGAGAGATAGAAAGAGAGAGAGAGAGAGAGAGAGTTAGAGAGAGAGAGAGAAAGAGAGAGAGAGAGAGTAAGAAAGAGAAAGAAAGAGATAGAGATAGAGAAAAATACAGAGAGAGAGAGAAANA    1#1111<<1<<<1<1EEFC<1E@C1<1F<<1<1<1DC<1FD11IHHHG@C1HHH?F1<1<1<1CCD1FCD1GCD1F<D1F1<1<1<1HGFHFHHHGGCCHHFCD1@1D<<1FHHHEEFEIHF@GFHEHHIIIHE@HHHEHEHEHHHEIHHIHIHHIIIHIHF<HEHIIHHEIIIHHFHHF@HHHCIIIIHEGHFHD<<#D    NM:i:19 ms:i:124        AS:i:124        nn:i:0  tp:A:P  cm:i:2  s1:i:37 s2:i:43
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730       165     contig_15432    3165    0       *       =       3165    125     AGAGATNGANAAAGNGAGAAAGAGAGAGNNAGAGAGAGNGAGAGAAAGAGNGAGAGNGANAGNTNGNGANAGAGNGAGATNNNNAGATAANTAGAGAGANAGAGANAGAGNGAGAGAGAGATAGAGAGNGAGANAGAGAGNGAGNNNGA       <D00<<#<1#<<<D#<<<<CHH1DCD<G##1<<D?CCG#<<DDE11DGEH#<<D<E#<<#1<#<#<#<1#<<1D#1<<D1####<<11<<#1<<<D<E<#1<<DD#<<1<#<<11DE01EC1<FHFC@#0<<<#<000<E#//<###</
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736       73      contig_72999    2205    1       23M1I4M23I39M104S       =       2205    0       GAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGCGGAGAGGGGGAGAGACAATGAGAGAGAGAGAATGAGAGAGAGAGAGAGAGAGAGAGAGCGTAGGAGAGAGAGTGCGCAAGAAAGAGAGAGAAAAAGACAGCGAGAGAAAGAAGGAGAGGGAGAAAGAGAGGGAGAGGGCGAGGAGGGCGCGGCATGCTGATA  DBD@@<0<ECEHHIH?G?=<0<<E=<C<0000<////<<<<//<0<<<111<11<C1<1D10<111<11<<C10<01D1<<D@H@@10<01/<0/011101<1<<11/</</1111111010=111<0<1=C=/<<//01100010000000<0000000=0/000///:/://///.::--//---/;://8/  NM:i:24 ms:i:78 AS:i:78 nn:i:0  tp:A:P  cm:i:3  s1:i:29 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736       133     contig_72999    2205    0       *       =       2205    0       CTCTCTCTTTCTCTCTCTCTCTCTCTTTTTATCTCTATCTCTNTCTTTCTCCCTCTTTGTCTTCTTCTTCGTCTNTTTCTTTCTCTTTTTTTCTCTCTTTCTCTCTTCTTCTCGTTTTCTGTCTTTCTCTTATTCTTATTCATCTCATTACTTTCATCATTCTTATTATCATAATCCTCCCCTT    <<@0<11<111<<<D1<1<<11111<11111<111<1<<11<#<<11<<<1<1111111<<C111<<<1<110<#1111<111<<<1<1</<1<11<11<D1<1<<1<1<1<10<<1<1C1<11<1<1<<11<<111<1<1111<11<<1<1<1<1<111<1<<11<C<<<<<<<<1<<11<00

I do not see anything unusual with the sam file. It was produced using minimap2 like this:

minimap2 -t 64 -ax sr consensus.fasta ${reads} > illumina.sam

rvaser commented 6 years ago

Are there maybe sequences with identical names (paired ends might have equal names up to the first white space)?

jdmontenegro commented 6 years ago

That's right, was I suppose to change the name of paired end reads. I will modify the names and resubmit

rvaser commented 6 years ago

There should be an error regarding mismatched read lengths if there are two or more sequences with the same name (meaning each sequence name in a file should be unique up to the first white space). Maybe the error you are getting is due to multithreading and the exit function. Try fixing the Illumina read names and lets see if it resolves.

jdmontenegro commented 6 years ago

Hi Robert, I just remembered that I changed the names of the illumina reads and added a /1 and /2 before shuffling the reads. But the /1 and /2 were removed from the read names after mapping. I'll fix the same file manually and try again. I'll keep you updated.

Cheers,

jdmontenegro commented 6 years ago

So just a few updates: 1) Minimap2.1 removes the trailing /1 and /2 from read names, but still requires them to map them as paired end 2) Minimap2.1 still reports secondary alignments even if -N 0 is set. 3) Due to these problems I have had to modify the sam file manually to: a) remove secondary alignments b) add a _1 or _2 to the read name to give them different names. Below is a short perl script to do this on the sam file:

use strict;
use warnings;
use Data::Dumper;

my $in = shift or die "Please give the sam file as argument\n";

open (SAM, "<", $in);

while (my $r = <SAM>){
  if ($r =~ /^@/){
    print $r;
    next;
  };
  chomp $r;
  my @r = split(/\t/, $r);
  my $bin = dec2bin($r[1]);
  if (defined($bin->[-12])){print STDERR Dumper(\@r); next};
  my $pair = "_1";
  if (defined($bin->[-8])){$pair = "_2"};
  $r =~ s/\t/$pair\t/;
  print $r . "\n";
};
close SAM;

sub dec2bin {
    my ($flags) = @_;
    my $str = unpack("B32", pack("N", $flags));
    $str =~ s/^0+(?=\d)//;   # otherwise you'll get leading zeros
    my $bin = [split (//, $str)];
    return $bin;
}

Hope this helps anyone. I am currently modifying the sam file and then will run racon again to see if the previous error was fixed.

rvaser commented 6 years ago

I'll see if I can add a easier way to process paired ends.

jdmontenegro commented 6 years ago

Hi Robert, After fixing the sam file it looks like this:

AGRF-33:351:HMHKNBCXX:2:2215:21275:100712_1     81      contig_27617    23272   1       9M1D61M1D14M1D8M1D14M1D13M1D11M1D10M13D12M1I15M1D22M1D25M1D13M1D9M1D13M        contig_100327   16626   0       AATATATATATATATATATAATATATATATATAAAATATATATATATAAAACATATATAAAATATATATAATATATATATAAATTATATATAATATATATATATATTATATATATATATTATATATATATTATATACATAAAATATATATATAATATATATATATATAATATATATAAATATATATATATATATATATATATATATATATATATAATATATATATATAATATATATAATATATATATATA    <00G?HFHCC<0HF?HGHGD0@IHIHGHFGGD<01FEHED1D1HGG<<111<C<F<<1D<1<1HCHFC<C<1HGIHIIHF<<11<1HHGGC<11HIHIIHHGD<<1C?HHH@HIHHED<1@GHF@HIHGC1CGHF<1D1D11IIIHHGDHHFD1?IIIIIHIHHIHGD1@IIIIIHD1G@IIIIIHIIIHIIIHIHIIIIIIIIHIIHIIIIHHIHIIIIHIIIIIHIIIIIIHHIIIHIIHHHHDDDBD     NM:i:33ms:i:216        AS:i:217        nn:i:0  tp:A:P  cm:i:3  s1:i:51 s2:i:45
AGRF-33:351:HMHKNBCXX:2:2215:21275:100712_2     161     contig_100327   16626   2       79S27M108S      contig_27617    23272   0       TATATATATAATATATATATATAATATATATATATAATATATATATATAATATATATATATATATAATATATATATATATATAATATATATATATACATATATATAATATATATATATATATATATATATATATATATATAAATATATTTATATAATATATATATATATATATATTAAATATATAATATATATATATATATAATATATATATAA        DDDCDIIIIIIIIIIIIIIIHIIHHIEHHE1DCGHHHHIIIIIIIIIIHIIIIIIHHIIHHIIIIFEGHHHIEHEHI?FHHH@FHHH?FIIGIHGF1DFHIFHIC1G1HHHI?HHIIIIIIEHEEHHHFFF1<<F1<1<CD1<<1<D<1<DC11111<1DFG1CFFEHF@1<1111111<CC1<<1<<<<C@CG0<<<@C0<00<0<0<09<00NM:i:0  ms:i:54 AS:i:54 nn:i:0  tp:A:P  cm:i:1  s1:i:42 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730_1     89      contig_22953    9135    10      33S47M120S      =       9135    -47     TNAAGAGATAAATAGAGAGAGAGAGAGAGAGAGAGAGATAGAGAGAGAGATAGATAGAGAGATAGATAGAGAGATAGAGAGATAGAAAGAGAGAGAGAGAGAGAGAGAGTTAGAGAGAGAGAGAGAAAGAGAGAGAGAGAGAGTAAGAAAGAGAAAGAAAGAGATAGAGATAGAGAAAAATACAGAGAGAGAGAGAAANA      1#1111<<1<<<1<1EEFC<1E@C1<1F<<1<1<1DC<1FD11IHHHG@C1HHH?F1<1<1<1CCD1FCD1GCD1F<D1F1<1<1<1HGFHFHHHGGCCHHFCD1@1D<<1FHHHEEFEIHF@GFHEHHIIIHE@HHHEHEHEHHHEIHHIHIHHIIIHIHF<HEHIIHHEIIIHHFHHF@HHHCIIIIHEGHFHD<<#D       NM:i:0  ms:i:94 AS:i:94 nn:i:0tp:A:P   cm:i:5  s1:i:43 s2:i:42 SA:Z:contig_21862,13169,+,58S106M1I35S,10,16;
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730_2     165     contig_22953    9135    0       *       =       9135    47      AGAGATNGANAAAGNGAGAAAGAGAGAGNNAGAGAGAGNGAGAGAAAGAGNGAGAGNGANAGNTNGNGANAGAGNGAGATNNNNAGATAANTAGAGAGANAGAGANAGAGNGAGAGAGAGATAGAGAGNGAGANAGAGAGNGAGNNNGA  <D00<<#<1#<<<D#<<<<CHH1DCD<G##1<<D?CCG#<<DDE11DGEH#<<D<E#<<#1<#<#<#<1#<<1D#1<<D1####<<11<<#1<<<D<E<#1<<DD#<<1<#<<11DE01EC1<FHFC@#0<<<#<000<E#//<###</
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736_1     73      contig_72999    2205    1       23M1I4M23I39M104S       =       2205    0       GAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGCGGAGAGGGGGAGAGACAATGAGAGAGAGAGAATGAGAGAGAGAGAGAGAGAGAGAGAGCGTAGGAGAGAGAGTGCGCAAGAAAGAGAGAGAAAAAGACAGCGAGAGAAAGAAGGAGAGGGAGAAAGAGAGGGAGAGGGCGAGGAGGGCGCGGCATGCTGATA    DBD@@<0<ECEHHIH?G?=<0<<E=<C<0000<////<<<<//<0<<<111<11<C1<1D10<111<11<<C10<01D1<<D@H@@10<01/<0/011101<1<<11/</</1111111010=111<0<1=C=/<<//01100010000000<0000000=0/000///:/://///.::--//---/;://8/     NM:i:24 ms:i:78 AS:i:78 nn:i:0  tp:A:Pcm:i:3   s1:i:29 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736_2     133     contig_72999    2205    0       *       =       2205    0       CTCTCTCTTTCTCTCTCTCTCTCTCTTTTTATCTCTATCTCTNTCTTTCTCCCTCTTTGTCTTCTTCTTCGTCTNTTTCTTTCTCTTTTTTTCTCTCTTTCTCTCTTCTTCTCGTTTTCTGTCTTTCTCTTATTCTTATTCATCTCATTACTTTCATCATTCTTATTATCATAATCCTCCCCTT      <<@0<11<111<<<D1<1<<11111<11111<111<1<<11<#<<11<<<1<1111111<<C111<<<1<110<#1111<111<<<1<1</<1<11<11<D1<1<<1<1<1<10<<1<1C1<11<1<1<<11<<111<1<1111<11<<1<1<1<1<111<1<<11<C<<<<<<<<1<<11<00

As you can see the read names for pairs are different (_1 and _2) and only the main alignment is kept (no secondary alignments). Unfortunately, the error persists:

racon -t 64 merged.shuffled.fastq illumina.corrected.sam consensus.fasta > consensus2.fasta
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/627471727
Not transmuted at 1/627471727
(...)
Not transmuted at 627471722/627471727
Not transmuted at 627471723/627471727
Not transmuted at 627471724/627471727
Not transmuted at 627471725/627471727
Not transmuted at 627471726/627471727
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

and no consensus is written. Should I retry using only one thread instead of 64 as I am currently doing? Will this make the execution much slower? Any suggestion is more than welcome. Regards, Juan Montenegro

rvaser commented 6 years ago

Hi Juan, do the reads in the FASTQ file have the updated names as in the SAM file?

Best regards, Robert

jdmontenegro commented 6 years ago

Hi Robert, Silly me, I had not modified the fastq file. I did it and I finally got the consensus. I am now mapping some illumina reads again, I should see fewer indels (if any) in the alignments, is that right? Should I give another round of polishing or do you reckon 1 pacbio and 1 illumina polishing should be enough? Kind regards

rvaser commented 6 years ago

I would advise at least 2 times PacBio and at least 1 time Illumina polishing.

Best regards, Robert

AntoineHo commented 5 years ago

Hello, I have the same issue with ONT long reads... I overlap with minimap2: minimap2 -x map-ont -t 44 genome.fa reads.fq > genVont.paf Then I try Racon 1.3.1 (downloaded with conda): racon -t 44 -w 750 genome.fa genVont.paf reads.fq

I looked at reads name in the .fq file and I find no empty spaces, same for the genome.fa file. Then the output looks like:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
rvaser commented 5 years ago

Hi Antoine, that error is quite intriguing. Are you by any chance running racon on a server with Slurm or something similar? Did you check that all sequences have unique identifiers? You can paste here a couple of headers.

Best regards, Robert

AntoineHo commented 5 years ago

Hello, I run locally on a desktop machine without slurm.

I could not find duplicated identifiers in reads nor in contigs, here are example IDs: Ctgs:

>scf7180000001958
>scf7180000001959
>scf7180000001960
>scf7180000001961
>scf7180000001962
>scf7180000001963
>scf7180000001964
>scf7180000001965
>scf7180000001966
>scf7180000001967

Reads:

@ch111_read10_template_pass_FAH43284
@ch108_read10_template_pass_FAH43284
@ch116_read10_template_pass_FAH43284
@ch110_read10_template_pass_FAH43284
@ch103_read10_template_pass_FAH43284
@ch123_read10_template_pass_FAH43284
@ch102_read10_template_pass_FAH43284
@ch1_read10_template_pass_FAH43284
@ch130_read10_template_pass_FAH43284
@ch135_read10_template_fail_FAH43284
rvaser commented 5 years ago

Do you mind sending me your data so I can investigate locally (contigs and reads)? I have no idea how this error could occur, probably some undefined behaviour.

AntoineHo commented 5 years ago

I will have to ask permission, this may take time. If I get it, I will comment on this issue, for the time being, I will try to reformat reads and contigs. Thanks for your help I will comment if I make it work.

rvaser commented 5 years ago

If you are willing you can help me debug it without sending me the data (it might even be faster). If so, please follow the following instructions: https://github.com/isovic/racon/issues/77#issuecomment-400757031.

AntoineHo commented 5 years ago

Ok, I got this result now, it seems all elements of the list return "not transmuted"

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/241317
Not transmuted at 1/241317
Not transmuted at 2/241317
Not transmuted at 3/241317
Not transmuted at 4/241317
Not transmuted at 5/241317
Not transmuted at 6/241317
--- a lot of not transmuted lines ---
Not transmuted at 241312/241317
Not transmuted at 241313/241317
Not transmuted at 241314/241317
Not transmuted at 241315/241317
Not transmuted at 241316/241317
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
rvaser commented 5 years ago

I don't understand how that is possible. I'll check the code again and report here soon. Thanks for testing!

AntoineHo commented 5 years ago

By the way, I tried reformating my reads file: Reads headers now look like:

>ch1537_read2552_template_pass_PAC16434

Contigs headers look like:

>scf7180000002512

Command to obtain the sam file with minimap2: minimap2 -t 44 -c -a -x map-ont genome.fasta ont_25x.fa > scfVont.sam Command to start racon: nohup ./racon/build/bin/racon -t 44 -w 750 genome.fasta scfVont.sam ont_25x.fa &

rvaser commented 5 years ago

Are there more of [racon::Overlap::find_breaking_points] error: overlap is not transmuted! lines or just 4-5?

AntoineHo commented 5 years ago

Just these 5 lines at the end.

Not transmuted at 241316/241317
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
rvaser commented 5 years ago

You are using the latest commit right? Nevermind, I see that you are using conda v1.3, I'll check there.

AntoineHo commented 5 years ago

I just cloned the repository to access files to modify and build on my own. I get the same kind of results with conda. racon --version outputs v1.3.1

I use minimap2 v2.14-r883 When I used conda racon version also outputs v1.3.1

rvaser commented 5 years ago

How many lines are in the overlap file (paf/sam)? Does it match with 241317 as reported in the log?

rvaser commented 5 years ago

Also please add

fprintf(stderr, "Missing query name %s\n, q_name_.c_str());

before https://github.com/isovic/racon/blob/master/src/overlap.cpp#L144, and

fprintf(stderr, "Missing target name %s\n, t_name_.c_str());

before https://github.com/isovic/racon/blob/master/src/overlap.cpp#L162. Hit make and rerun.

AntoineHo commented 5 years ago

I have 294072 lines in the sam file. (241317 in not transmuted output) The output looks exactly the same, I find no Missing target or query line.

rvaser commented 5 years ago

What is the size of the paf file?

rvaser commented 5 years ago

Next please try adding

fprintf(stderr, "Processing overlaps from %u to %zu\n", l, overlaps.size());

at https://github.com/isovic/racon/blob/master/src/polisher.cpp#L278, and

fprintf(stderr, "Overlap %u/%zu is not valid, deleting\n", i, overlaps.size());

before https://github.com/isovic/racon/blob/master/src/polisher.cpp#L284, and

    if (!overlaps[i]->is_valid()) {
        fprintf(stderr, "Not valid at %lu/%zu\n", i, overlaps.size());
    }

in the same for loop you added at line 330 before.

AntoineHo commented 5 years ago

The .paf file contains 291748 lines. I will add it then rerun

AntoineHo commented 5 years ago

I get:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
Processing overlaps from 1 to 269501
Overlap 0/269501 is not valid, deleting
Overlap 1/269501 is not valid, deleting
Overlap 2/269501 is not valid, deleting
Overlap 3/269501 is not valid, deleting
--
Overlap 269497/269501 is not valid, deleting
Overlap 269498/269501 is not valid, deleting
Overlap 269499/269501 is not valid, deleting
Overlap 269500/269501 is not valid, deleting
Processing overlaps from 1 to 22247
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/22247
Not transmuted at 1/22247
Not transmuted at 2/22247
Not transmuted at 3/22247
Not transmuted at 4/22247
--
Not transmuted at 22243/22247
Not transmuted at 22244/22247
Not transmuted at 22245/22247
Not transmuted at 22246/22247
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
AntoineHo commented 5 years ago

I notice that number of overlaps is different when I use .paf and .sam output. Is it normal?

rvaser commented 5 years ago

I guess it should be equal. Or maybe some overlaps have bad alignments and are not printed, not sure though.

rvaser commented 5 years ago

Btw, the first print function in https://github.com/isovic/racon/issues/77#issuecomment-445385800 has lowecase L and not 1. Could you please rerun it again? :)

AntoineHo commented 5 years ago

Now I get this:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
Processing overlaps from 0 to 269501
Overlap 0/269501 is not valid, deleting
Overlap 1/269501 is not valid, deleting
Overlap 2/269501 is not valid, deleting
Overlap 3/269501 is not valid, deleting
--
Overlap 269497/269501 is not valid, deleting
Overlap 269498/269501 is not valid, deleting
Overlap 269499/269501 is not valid, deleting
Overlap 269500/269501 is not valid, deleting
Processing overlaps from 4294697795 to 22247
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/22247
Not transmuted at 1/22247
Not transmuted at 2/22247
Not transmuted at 3/22247
Not transmuted at 4/22247
--
Not transmuted at 22241/22247
Not transmuted at 22242/22247
Not transmuted at 22243/22247
Not transmuted at 22244/22247
Not transmuted at 22245/22247
Not transmuted at 22246/22247
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
rvaser commented 5 years ago

A quick fix is to replace line https://github.com/isovic/racon/blob/master/src/polisher.cpp#L314 with

l = c < n ? 0 : c - n;

This will lead us to the real error which is ... empty overlap set! Can you please verify that?

AntoineHo commented 5 years ago

I got: [racon::Polisher::initialize] error: empty overlap set! Seems that is it!