isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

error: overlap is not transmuted! #77

Open jdmontenegro opened 6 years ago

jdmontenegro commented 6 years ago

Hi, I am trying to polish a PacBio assembly with Illumina reads. After one round of polishing with the PacBio reads, I mapped the Illumina reads to the polished assembly with minimap2 and used the SAM output as overlap information for polishing, using the following commands:

minimap2 -t 64 -ax sr consensus1.fasta ${reads} > illumina.sam
racon -t 64 ${reads} illumina.sam consensus1.fasta > consensus2.fasta

The mapping works quite well, but after a few hours running I hit the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I am not sure what it means or how to fix it. Any suggestions are more than welcome! Kind regards,

rvaser commented 5 years ago

Great! Can you please paste a few lines from the .paf file here?

AntoineHo commented 5 years ago

Here are the first 20 lines: ont.head20.paf.txt

If you need more let me know :)

rvaser commented 5 years ago

That looks fine. Please add

fprintf(stderr, "Missing query name %s\n", q_name_.c_str());

at https://github.com/isovic/racon/blob/master/src/overlap.cpp#L139 and

fprintf(stderr, "Missing target name %s\n", t_name_.c_str());

at https://github.com/isovic/racon/blob/master/src/overlap.cpp#L157 (I pasted the wrong lines before).

P.S. Racon won't use the CIGAR string stored in PAF format, only in SAM.
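A similar check can be sketched without recompiling racon. The hypothetical shell function below lists the distinct names in one PAF column (1 = query, 6 = target) that have no matching header in a FASTA/FASTQ file. It assumes uncompressed input, FASTQ records of exactly four lines, and that a sequence name is the first whitespace-delimited token of its header (which is what minimap2 writes into the PAF). It is a diagnostic sketch, not part of racon.

```bash
#!/usr/bin/env bash
# paf_missing_names PAF COLUMN SEQFILE
# Hypothetical helper: print each distinct name from column COLUMN of PAF
# (1 = query, 6 = target) that has no matching header in SEQFILE.
# Assumes SEQFILE is uncompressed FASTA, or FASTQ with 4 lines per record.
paf_missing_names() {
  local paf=$1 col=$2 seqs=$3
  awk -v col="$col" '
    FILENAME == ARGV[1] {
      # Detect the format from the first character of the sequence file.
      if (FNR == 1) fmt = substr($0, 1, 1)
      # FASTA: every ">" line is a header. FASTQ: only line 1 of each record,
      # because quality lines may also start with "@".
      if ((fmt == ">" && /^>/) || (fmt == "@" && FNR % 4 == 1))
        names[substr($1, 2)] = 1
      next
    }
    # PAF lines: report a name once if it was never seen as a header.
    !($col in names) && !($col in reported) { print $col; reported[$col] = 1 }
  ' "$seqs" "$paf"
}
```

For example, paf_missing_names ont.paf 1 reads.fasta | wc -l counts the reads mentioned in the PAF that the read file does not contain; a count close to the number of distinct reads in the PAF points to mismatched names or swapped arguments.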

AntoineHo commented 5 years ago

OK, it seems all of them are missing:

Overlap 22233/22247 is not valid, deleting
Missing query name ch2182_read139453_template_fail_PAC16434
Overlap 22234/22247 is not valid, deleting
Missing query name ch2182_read139453_template_fail_PAC16434
Overlap 22235/22247 is not valid, deleting
Missing query name ch1908_read107080_template_fail_PAC16434
Overlap 22236/22247 is not valid, deleting
Missing query name ch1908_read107136_template_fail_PAC16434
Overlap 22237/22247 is not valid, deleting
Missing query name ch776_read260672_template_fail_PAC16434
Overlap 22238/22247 is not valid, deleting
Missing query name ch1775_read58163_template_fail_PAC16434
Overlap 22239/22247 is not valid, deleting
Missing query name ch1775_read58163_template_fail_PAC16434
Overlap 22240/22247 is not valid, deleting
Missing query name ch2182_read139817_template_fail_PAC16434
Overlap 22241/22247 is not valid, deleting
Missing query name ch2182_read139817_template_fail_PAC16434
Overlap 22242/22247 is not valid, deleting
Missing query name ch1666_read101654_template_fail_PAC16434
Overlap 22243/22247 is not valid, deleting
Missing query name ch1768_read85887_template_fail_PAC16434
Overlap 22244/22247 is not valid, deleting
Missing query name ch1908_read107326_template_fail_PAC16434
Overlap 22245/22247 is not valid, deleting
Missing query name ch1659_read114573_template_fail_PAC16434
Overlap 22246/22247 is not valid, deleting
[racon::Polisher::initialize] error: empty overlap set!

When I grep the unique missing read names, I get 97559 missing queries out of a total of 99426 reads (I guess the remaining reads simply do not map anywhere).

rvaser commented 5 years ago

I did not fully understand the last few sentences. Do you mean that grep did not find the printed read names in the overlap file?

AntoineHo commented 5 years ago

I just grepped the read names in racon's stderr and counted how many were reported missing: 97559 of the 99426 reads in my .fa file.
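With the debug fprintf from above in place, the count described here can be reproduced in one step. This is a minimal sketch assuming racon's stderr was saved to a file (for example with 2> racon.err) and that the debug lines have the form "Missing query name <name>"; count_missing is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# count_missing LOG
# Hypothetical helper: count the distinct read names reported by the debug
# line "Missing query name <name>" in a saved racon stderr log.
count_missing() {
  awk '/^Missing query name / && !($4 in seen) { seen[$4] = 1; n++ }
       END { print n + 0 }' "$1"
}
```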

rvaser commented 5 years ago

Oh, now I get it. Try and grep the names in the paf file as well (just a few of them).

AntoineHo commented 5 years ago

The PAF file contains 97559 distinct reads, the same number as the missing queries.

rvaser commented 5 years ago

Do the names match between the paf file and the fasta file? Because the error racon encountered means they do not.

rvaser commented 5 years ago

Okay! You passed the read and genome files the other way around; I double-checked it. The correct way to call racon is racon <reads> <overlaps> <contigs> :)
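Because this swap is easy to make, a quick pre-flight check may help. The hypothetical check_racon_order below looks only at the first PAF record and tests whether its query name (column 1) is a read and its target name (column 6) is a contig, matching the order racon <reads> <overlaps> <contigs>. It is a heuristic sketch assuming uncompressed FASTA/FASTQ (4-line FASTQ records) and PAF overlaps; SAM input is not handled.

```bash
#!/usr/bin/env bash
# check_racon_order READS OVERLAPS.paf CONTIGS
# Hypothetical heuristic: look at the first PAF record only and test whether its
# query (column 1) is a read and its target (column 6) is a contig.
has_name() {  # has_name NAME SEQFILE: succeed if NAME is a header in SEQFILE
  awk -v want="$1" '
    FNR == 1 { fmt = substr($0, 1, 1) }
    ((fmt == ">" && /^>/) || (fmt == "@" && FNR % 4 == 1)) && substr($1, 2) == want { found = 1; exit }
    END { exit (found ? 0 : 1) }' "$2"
}
check_racon_order() {
  local reads=$1 paf=$2 contigs=$3 q t
  read -r q _ _ _ _ t _ < "$paf"   # PAF column 1 = query, column 6 = target
  if has_name "$q" "$reads" && has_name "$t" "$contigs"; then
    echo "ok"
  elif has_name "$q" "$contigs" && has_name "$t" "$reads"; then
    echo "reads and contigs look swapped"
  else
    echo "names not found in either file"
  fi
}
```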

AntoineHo commented 5 years ago

Seriously :O! Sorry for the inconvenience, it was just plain stupid of me!! Well, thank you. I don't understand why I did not think of this before!

rvaser commented 5 years ago

No problem :) Thanks a lot for helping me out with the pesky not transmuted error!

SDA16 commented 5 years ago

Hi all, I got the same error (overlap is not transmuted!) with the racon command. I'm working with Nanopore FASTQ files. I ran Deepbinner for ONT barcode demultiplexing, then assembled with wtdbg2. To polish the genome, I generated a .sam file with bwa (in a conda environment) to pass to racon. I ran racon deepbinner.fastq.gz aligned_WTDBG2.sam deepbinner.fastq.gz > output1.fasta, and it worked. Then I wanted to polish the output, so I ran racon output1.fasta aligned_WTDBG2.sam > output2.fasta, but I got the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I'm completely new to racon and I don't understand how to fix this error. Thank you in advance for your help.

Sara

rvaser commented 5 years ago

Hi Sara, could you please paste all the commands you were using? The error is probably caused by the wrong ordering of racon's input arguments.

Best regards, Robert

SDA16 commented 5 years ago

Dear rvaser, thank you for your reply. Here are the commands I used, starting from the assembly with wtdbg2:

wtdbg2 -x ont -genomesize  -i BC1.fastq.gz -t 6 -fo wg.sample1
wtpoa-cns -t 6 -i wg.sample1.ctg.lay.gz -fo sample1.ctg.fa
bwa index sample1.ctg.fa
bwa mem sample1.ctg.fa BC1.fastq.gz > aln-sample1.sam
racon -f BC1.fastq.gz aln-sample1.sam sample1.ctg.fa > racon1.sample1.fasta
racon -f racon1.sample1.fasta aln-sample1.sam sample1.ctg.fa > racon2.sample1.fasta

Best regards Sara

rvaser commented 5 years ago

Your second racon command is wrong (also missing a mapping step). Try the following:

bwa index sample1.ctg.fa
bwa mem sample1.ctg.fa BC1.fastq.gz > aln-sample1.sam
racon -f -t 6 BC1.fastq.gz aln-sample1.sam sample1.ctg.fa > racon1.sample1.fasta

bwa index racon1.sample1.fasta
bwa mem racon1.sample1.fasta BC1.fastq.gz > aln-sample2.sam
racon -f -t 6 BC1.fastq.gz aln-sample2.sam racon1.sample1.fasta > racon2.sample1.fasta

Best regards, Robert

P.S. Racon uses 1 thread by default (enable more with -t as above). Option -f (which you are using) will use all possible overlaps bwa found and racon will be slower than the default version, which only takes the best overlap per read. That is up to you though.
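The corrected commands above extend to any number of rounds. The hypothetical polish_rounds helper below sketches that loop: each round indexes and maps the reads against the previous round's output, so the SAM file always refers to the sequences racon is polishing. The output names (aln.roundN.sam, raconN.fasta) are made up for illustration, -f is left out, and setting DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# polish_rounds READS CONTIGS ROUNDS [THREADS]
# Hypothetical helper: every round re-indexes and re-maps READS against the
# previous round's output before calling racon <reads> <overlaps> <contigs>.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {      # run CMD... (or print it)
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}
run_to() {   # run_to OUTFILE CMD... : run CMD with stdout sent to OUTFILE
  local out=$1; shift
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$* > $out"; else "$@" > "$out"; fi
}
polish_rounds() {
  local reads=$1 target=$2 rounds=$3 threads=${4:-1} i
  for (( i = 1; i <= rounds; i++ )); do
    run bwa index "$target" || return 1
    run_to "aln.round${i}.sam" bwa mem -t "$threads" "$target" "$reads" || return 1
    run_to "racon${i}.fasta" racon -t "$threads" "$reads" "aln.round${i}.sam" "$target" || return 1
    target="racon${i}.fasta"   # the next round maps to this round's output
  done
}
```

For example, DRY_RUN=1 polish_rounds BC1.fastq.gz sample1.ctg.fa 2 6 prints the two-round bwa/racon sequence with the renamed output files.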

SDA16 commented 5 years ago

Thank you very much for your clear explanation, I will let you know if it works! Kind regards, Sara

SDA16 commented 5 years ago

Dear Robert, Thank you very much for your help! Now all is working :) Best regards, Sara

Colorstorm commented 4 years ago

Hi, I also got the error (error: overlap is not transmuted!) after running racon on Nanopore reads (ONT long reads). I used SLURM scripts for everything. For the overlaps I used minimap2 (the first time producing a .paf file, the second time a .sam file). Racon command:

echo "Input reads: " $1
reads=$1
echo "Input overlaps: " $2
ovlps=$2
echo "Input conitgs: " $3
cntgs=$3
racon -t 48 $reads $ovlps $cntgs > $reads.sam.racon.fastq

Racon output:

Input reads: iddm.fastq
Input overlaps: iddm.fastq.paf
Input conitgs: iddm.ctg.fa
[racon::Polisher::initialize] loaded target sequences 5.063910 s
[racon::Polisher::initialize] loaded sequences 76.101899 s
[racon::Polisher::initialize] loaded overlaps 155.985664 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I tried the same script with the .sam file:

Input reads: iddm.fastq
Input overlaps: iddm.fastq.dual.sam
Input conitgs: iddm.ctg.fa
[racon::Polisher::initialize] loaded target sequences 4.936717 s
[racon::Polisher::initialize] loaded sequences 64.061906 s
[racon::Polisher::initialize] loaded overlaps 2869.280305 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

Here are my minimap2 scripts. For the .paf file:

echo "Input file: " $1
fastq=$1
minimap2 -x ava-ont -t 48 -a $fastq $fastq > $fastq.paf

And the result:

Input file: ../../iddm.fastq
[M::mm_idx_gen::101.945*1.81] collected minimizers
[M::mm_idx_gen::110.293*3.57] sorted minimizers
[M::main::110.294*3.57] loaded/built the index for 414600 target sequence(s)
[M::mm_mapopt_update::113.692*3.49] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 414600
[M::mm_idx_stat::115.692*3.45] distinct minimizers: 201503003 (31.48% are singletons); average occurrences: 6.778; average spacing: 2.930
[M::worker_pipeline::158.780*10.41] mapped 39197 sequences
[M::worker_pipeline::180.284*14.62] mapped 39391 sequences
[M::worker_pipeline::209.284*17.78] mapped 41758 sequences
[M::worker_pipeline::232.556*20.24] mapped 41173 sequences
[M::worker_pipeline::258.283*22.49] mapped 38831 sequences
[M::worker_pipeline::284.091*23.95] mapped 31375 sequences
[M::worker_pipeline::307.519*25.45] mapped 34915 sequences
[M::worker_pipeline::325.400*26.67] mapped 147452 sequences
[M::worker_pipeline::341.811*27.70] mapped 163866 sequences
[M::worker_pipeline::359.160*28.65] mapped 160113 sequences
[M::worker_pipeline::374.873*29.47] mapped 160627 sequences
[M::worker_pipeline::390.776*30.24] mapped 159259 sequences
[M::worker_pipeline::406.774*30.95] mapped 158131 sequences
[M::worker_pipeline::422.391*31.58] mapped 157935 sequences
[M::worker_pipeline::438.441*32.19] mapped 158613 sequences
[M::worker_pipeline::454.548*32.74] mapped 158987 sequences
[M::worker_pipeline::470.156*33.24] mapped 158144 sequences
[M::worker_pipeline::485.722*33.72] mapped 161848 sequences
[M::worker_pipeline::500.952*34.15] mapped 166190 sequences
[M::worker_pipeline::514.709*34.31] mapped 159601 sequences
[M::mm_idx_gen::626.977*28.49] collected minimizers
[M::mm_idx_gen::630.154*28.56] sorted minimizers
[M::main::630.154*28.56] loaded/built the index for 1277557 target sequence(s)
[M::mm_mapopt_update::630.154*28.56] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1277557
[M::mm_idx_stat::632.247*28.47] distinct minimizers: 206534157 (31.14% are singletons); average occurrences: 6.585; average spacing: 2.941
[M::worker_pipeline::684.179*28.27] mapped 39197 sequences
[M::worker_pipeline::707.748*28.83] mapped 39391 sequences
[M::worker_pipeline::736.840*29.25] mapped 41758 sequences
[M::worker_pipeline::764.889*29.64] mapped 41173 sequences
[M::worker_pipeline::791.239*30.07] mapped 38831 sequences
[M::worker_pipeline::819.565*30.40] mapped 31375 sequences
[M::worker_pipeline::842.408*30.79] mapped 34915 sequences
[M::worker_pipeline::865.941*31.25] mapped 147452 sequences
[M::worker_pipeline::886.654*31.64] mapped 163866 sequences
[M::worker_pipeline::908.106*32.02] mapped 160113 sequences
[M::worker_pipeline::928.754*32.38] mapped 160627 sequences
[M::worker_pipeline::949.875*32.73] mapped 159259 sequences
[M::worker_pipeline::969.546*33.05] mapped 158131 sequences
[M::worker_pipeline::989.556*33.35] mapped 157935 sequences
[M::worker_pipeline::1011.773*33.68] mapped 158613 sequences
[M::worker_pipeline::1032.118*33.95] mapped 158987 sequences
[M::worker_pipeline::1052.985*34.23] mapped 158144 sequences
[M::worker_pipeline::1072.663*34.48] mapped 161848 sequences
[M::worker_pipeline::1092.609*34.73] mapped 166190 sequences
[M::worker_pipeline::1109.622*34.83] mapped 159601 sequences
[M::mm_idx_gen::1167.259*33.20] collected minimizers
[M::mm_idx_gen::1168.846*33.21] sorted minimizers
[M::main::1168.846*33.21] loaded/built the index for 645249 target sequence(s)
[M::mm_mapopt_update::1168.846*33.21] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 645249
[M::mm_idx_stat::1170.570*33.16] distinct minimizers: 162491418 (39.95% are singletons); average occurrences: 4.072; average spacing: 2.942
[M::worker_pipeline::1201.724*33.12] mapped 39197 sequences
[M::worker_pipeline::1219.029*33.30] mapped 39391 sequences
[M::worker_pipeline::1242.382*33.32] mapped 41758 sequences
[M::worker_pipeline::1267.736*33.30] mapped 41173 sequences
[M::worker_pipeline::1284.954*33.42] mapped 38831 sequences
[M::worker_pipeline::1307.858*33.48] mapped 31375 sequences
[M::worker_pipeline::1325.489*33.61] mapped 34915 sequences
[M::worker_pipeline::1340.832*33.76] mapped 147452 sequences
[M::worker_pipeline::1354.540*33.90] mapped 163866 sequences
[M::worker_pipeline::1369.116*34.05] mapped 160113 sequences
[M::worker_pipeline::1382.959*34.19] mapped 160627 sequences
[M::worker_pipeline::1396.195*34.33] mapped 159259 sequences
[M::worker_pipeline::1410.708*34.46] mapped 158131 sequences
[M::worker_pipeline::1424.437*34.59] mapped 157935 sequences
[M::worker_pipeline::1438.547*34.71] mapped 158613 sequences
[M::worker_pipeline::1453.877*34.81] mapped 158987 sequences
[M::worker_pipeline::1467.334*34.93] mapped 158144 sequences
[M::worker_pipeline::1481.538*35.06] mapped 161848 sequences
[M::worker_pipeline::1494.527*35.18] mapped 166190 sequences
[M::worker_pipeline::1506.871*35.22] mapped 159601 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x ava-ont -t 48 iddm.fastq iddm.fastq

The minimap2 script for the .sam file:

echo "Input file: " $1
fastq=$1
minimap2 -t 48 -x ava-ont --dual=yes -a $fastq $fastq > $fastq.dual.sam

And the result:

Input file: iddm.fastq
[M::mm_idx_gen::101.913*1.89] collected minimizers
[M::mm_idx_gen::110.741*3.71] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::110.742*3.71] loaded/built the index for 414600 target sequence(s)
[M::mm_mapopt_update::114.441*3.63] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 414600
[M::mm_idx_stat::116.538*3.58] distinct minimizers: 201503003 (31.48% are singletons); average occurrences: 6.778; average spacing: 2.930
[M::worker_pipeline::4249.334*37.26] mapped 39197 sequences
[M::worker_pipeline::8132.258*36.54] mapped 39391 sequences
[M::worker_pipeline::12522.165*34.18] mapped 41758 sequences
[M::worker_pipeline::15639.896*35.54] mapped 41173 sequences
[M::worker_pipeline::19105.065*36.24] mapped 38831 sequences
[M::worker_pipeline::22503.355*36.59] mapped 31375 sequences
[M::worker_pipeline::26454.166*36.32] mapped 34915 sequences
[M::worker_pipeline::28441.047*36.07] mapped 147452 sequences
[M::worker_pipeline::30503.552*35.34] mapped 163866 sequences
[M::worker_pipeline::31750.367*35.54] mapped 160113 sequences
[M::worker_pipeline::33022.420*35.79] mapped 160627 sequences
[M::worker_pipeline::34290.972*35.99] mapped 159259 sequences
[M::worker_pipeline::35530.109*36.12] mapped 158131 sequences
[M::worker_pipeline::36705.069*36.31] mapped 157935 sequences
[M::worker_pipeline::37924.024*36.51] mapped 158613 sequences
[M::worker_pipeline::39391.114*36.46] mapped 158987 sequences
[M::worker_pipeline::40467.808*36.69] mapped 158144 sequences
[M::worker_pipeline::41510.012*36.87] mapped 161848 sequences
[M::worker_pipeline::42643.365*36.96] mapped 166190 sequences
[M::worker_pipeline::43728.497*36.99] mapped 159601 sequences
[M::mm_idx_gen::43843.675*36.89] collected minimizers
[M::mm_idx_gen::43846.830*36.89] sorted minimizers
[M::main::43846.830*36.89] loaded/built the index for 1277557 target sequence(s)
[M::mm_mapopt_update::43846.830*36.89] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1277557
[M::mm_idx_stat::43848.852*36.89] distinct minimizers: 206534157 (31.14% are singletons); average occurrences: 6.585; average spacing: 2.941
[M::worker_pipeline::45445.589*37.01] mapped 39197 sequences
[M::worker_pipeline::46775.468*37.13] mapped 39391 sequences
[M::worker_pipeline::48483.441*36.98] mapped 41758 sequences
[M::worker_pipeline::49759.823*37.15] mapped 41173 sequences
[M::worker_pipeline::51301.821*37.17] mapped 38831 sequences
[M::worker_pipeline::52551.044*37.33] mapped 31375 sequences
[M::worker_pipeline::54093.706*37.28] mapped 34915 sequences
[M::worker_pipeline::54944.790*37.39] mapped 147452 sequences
[M::worker_pipeline::55681.230*37.52] mapped 163866 sequences
[M::worker_pipeline::56508.209*37.58] mapped 160113 sequences
[M::worker_pipeline::57399.813*37.61] mapped 160627 sequences
[M::worker_pipeline::58213.000*37.67] mapped 159259 sequences
[M::worker_pipeline::59014.912*37.72] mapped 158131 sequences
[M::worker_pipeline::59787.852*37.78] mapped 157935 sequences
[M::worker_pipeline::60531.226*37.87] mapped 158613 sequences
[M::worker_pipeline::61327.121*37.92] mapped 158987 sequences
[M::worker_pipeline::62010.620*38.02] mapped 158144 sequences
[M::worker_pipeline::62687.753*38.09] mapped 161848 sequences
[M::worker_pipeline::63383.418*38.15] mapped 166190 sequences
[M::worker_pipeline::63976.318*38.21] mapped 159601 sequences
[M::mm_idx_gen::64036.482*38.18] collected minimizers
[M::mm_idx_gen::64038.104*38.18] sorted minimizers
[M::main::64038.104*38.18] loaded/built the index for 645249 target sequence(s)
[M::mm_mapopt_update::64038.104*38.18] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 645249
[M::mm_idx_stat::64039.837*38.18] distinct minimizers: 162491418 (39.95% are singletons); average occurrences: 4.072; average spacing: 2.942
[M::worker_pipeline::66075.369*38.23] mapped 39197 sequences
[M::worker_pipeline::67810.327*38.30] mapped 39391 sequences
[M::worker_pipeline::69696.915*38.24] mapped 41758 sequences
[M::worker_pipeline::71261.792*38.38] mapped 41173 sequences
[M::worker_pipeline::72985.904*38.45] mapped 38831 sequences
[M::worker_pipeline::74617.166*38.54] mapped 31375 sequences
[M::worker_pipeline::76365.991*38.61] mapped 34915 sequences
[M::worker_pipeline::77291.264*38.69] mapped 147452 sequences
[M::worker_pipeline::78097.938*38.77] mapped 163866 sequences
[M::worker_pipeline::78874.296*38.85] mapped 160113 sequences
[M::worker_pipeline::79737.234*38.90] mapped 160627 sequences
[M::worker_pipeline::80548.225*38.97] mapped 159259 sequences
[M::worker_pipeline::81334.744*39.02] mapped 158131 sequences
[M::worker_pipeline::82109.853*39.07] mapped 157935 sequences
[M::worker_pipeline::82934.343*39.11] mapped 158613 sequences
[M::worker_pipeline::83724.462*39.16] mapped 158987 sequences
[M::worker_pipeline::84487.132*39.22] mapped 158144 sequences
[M::worker_pipeline::85176.920*39.28] mapped 161848 sequences
[M::worker_pipeline::85901.956*39.32] mapped 166190 sequences
[M::worker_pipeline::86616.706*39.33] mapped 159601 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 48 -x ava-ont --dual=yes -a iddm.fastq iddm.fastq
[M::main] Real time: 86618.862 sec; CPU: 3406730.278 sec; Peak RSS: 131.710 GB

I hope you can help me, and thanks a lot in advance. Best regards Fabian

rvaser commented 4 years ago

Hello Fabian, if you want to polish your contigs, you have to run minimap2 and racon with the following commands:

minimap2 -x map-ont -t 48 iddm.ctg.fa iddm.fastq > ovl.paf
racon -t 48 iddm.fastq ovl.paf iddm.ctg.fa > polished.ctg.fasta

If you want to error correct reads, run the following:

minimap2 -t 48 -ax ava-ont --dual=yes iddm.fastq iddm.fastq > dual.sam
racon -t 48 iddm.fastq dual.sam iddm.fastq > polished.reads.fasta

Sorry for the late reply! Best regards, Robert

Colorstorm commented 4 years ago

Thanks a lot! I only need the second one (read correction); I'll try it now and let you know if it works.

Best regards Fabian

Colorstorm commented 4 years ago

Thanks so much, everything worked fine.

emmannaemeka commented 4 years ago

I seem to get the same error with ONT data:

Error:

[racon::Polisher::initialize] loaded target sequences 6.005934 s
[racon::Polisher::initialize] loaded sequences 759.764594 s
[racon::Polisher::initialize] loaded overlaps 780.298724 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

SAM header lines:

@SQ SN:contig_141_pilon LN:1358375
@SQ SN:contig_142_pilon LN:67648
@SQ SN:contig_143_pilon LN:5159672
@SQ SN:contig_144_pilon LN:35530
@SQ SN:contig_147_pilon LN:40799
@SQ SN:contig_148_pilon LN:2807903
@SQ SN:contig_149_pilon LN:47893
@SQ SN:contig_15_pilon LN:7660417
@SQ SN:contig_150_pilon LN:960
@SQ SN:contig_151_pilon LN:3459730
@SQ SN:contig_153_pilon LN:1283258
@SQ SN:contig_154_pilon LN:9233410
@SQ SN:contig_155_pilon LN:466968
@SQ SN:contig_156_pilon LN:146389
@SQ SN:contig_157_pilon LN:32373
@SQ SN:contig_158_pilon LN:87654
@SQ SN:contig_16_pilon LN:3679621
@SQ SN:contig_160_pilon LN:525355
@SQ SN:contig_165_pilon LN:42875

I ran this command line:

racon -m 8 -x -6 -g -8 -w 500 -t 30 ~20190404.fq ~/aln.sam ~/assembly.fasta > ~/racon1x.fa

rvaser commented 4 years ago

Hello, could you please provide the command you used to get the aln.sam file?

Best regards, Robert

emmannaemeka commented 4 years ago

bwa index fasta
bwa mem -t 36 -x path_to_index Path_to_long_read(Nanopore).fq > output.sam

rvaser commented 4 years ago

Looks alright. Did you maybe pass the wrong long-read file or assembly file? The error you encountered indicates that Racon could not find the contigs of the assembly, the reads, or both. Can you please paste the first line of 20190404.fq, the first line of assembly.fasta, and the first line of aln.sam that does not start with @?
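The three lines requested here can be collected in one go. The hypothetical first_names helper below prints the first read name, the first contig name, and the QNAME (column 1) and RNAME (column 3) of the first alignment line in the SAM file; racon can only link the files if these names agree. It assumes uncompressed inputs and takes the first whitespace-delimited token of each header.

```bash
#!/usr/bin/env bash
# first_names READS ASSEMBLY SAM
# Hypothetical helper: print the first read name, the first contig name and
# the QNAME -> RNAME of the first alignment (non-"@") line of SAM.
first_names() {
  local reads=$1 asm=$2 sam=$3
  echo "read:   $(head -n 1 "$reads" | awk '{ print substr($1, 2) }')"
  echo "contig: $(head -n 1 "$asm" | awk '{ print substr($1, 2) }')"
  awk '!/^@/ { print "sam:    " $1 " -> " $3; exit }' "$sam"
}
```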

11Dmitriy11 commented 4 years ago

I tried to polish my ONT miniasm assembly with Illumina reads in 4 rounds, but I get the error "overlap is not transmuted" right after racon loads the input files:

reads1="SRR6880005_1_fixed.fastq"
reads2="SRR6880005_2_fixed.fastq"
target="miniasm_coluzzii.fa"
cat ${reads1} ${reads2} > SRR6880005_12_fixed.fastq
total='SRR6880005_12_fixed.fastq'

for (( i=1; i <= 4; i++ ))
do
    if [ -s bwa_coluzzii_${i}.sam ]
    then
        echo 'bwa is already done'
    else
        bwa index ${target}
        bwa mem ${target} ${reads1} ${reads2} > bwa_coluzzii_${i}.sam
    fi
    align="bwa_coluzzii_${i}.sam"
    racon -f -t 46 ${total} ${align} ${target} > miniasm_coluzzi_polish_${i}.fa

    target="miniasm_coluzzi_polish_${i}.fa"
done

rvaser commented 4 years ago

Hello, please paste the output of head -n 1 SRR6880005_*.fastq, head -n 1 miniasm_coluzzii.fa and grep "^[^@]" bwa_coluzzii_1.sam | head -n 1.

Best regards, Robert

11Dmitriy11 commented 4 years ago

head -n 1 SRR6880005_*.fastq

==> SRR6880005_12_fixed.fastq <==
@SRR6880005.2/1

==> SRR6880005_1_fixed.fastq <==
@SRR6880005.2/1

==> SRR6880005_2_fixed.fastq <==
@SRR6880005.2/2

head -n 1 miniasm_coluzzii.fa

>utg000001l

grep "^[^@]" bwa_coluzzii_1.sam | head -n 1

SRR6880005.2 99 utg000473l 94174 57 23M2D32M2I43M = 94174 100 GGTAAATTGAGTACCATTATCAGACACGAGAACTTCTGGCACTCCGAAAGTTGCGAAAATTTGTTTCAAAATTCTTATTGTTGTTCTCGCAGTTATTGAT AAAAFFJFJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJFAJAFJFJAFJJFAJJJFJJJ<AJ<JJJJJJJJFFFFA<FJJJJJ7JJJJJJJ<FJJJJJA NM:i:6 MD:Z:0A14G7^AC75 MC:Z:23M2D32M2I43M AS:i:76 XS:i:68 XA:Z:utg000444l,-99101,12S44M2D7M1D37M,4;utg000444l,+88866,34M2I26M3D38M,8;utg000162l,+139394,37M1I4M1I12M1D45M,6;

rvaser commented 4 years ago

As it seems, BWA removed /1 and /2 from your sequence headers in the SAM file, which prevents Racon from connecting the sequence and alignment files. Try renaming your sequences as SRR6880005.21 and SRR6880005.22. Afterwards, run BWA again.
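A minimal sketch of that renaming, assuming plain 4-line FASTQ records: the hypothetical strip_mate_slash helper removes the slash from a trailing /1 or /2 in the read name (SRR6880005.2/1 becomes SRR6880005.21, as suggested above). It edits only line 1 of each record, since quality lines can also begin with "@", and keeps any text after the name.

```bash
#!/usr/bin/env bash
# strip_mate_slash IN.fastq > OUT.fastq
# Hypothetical helper: turn a trailing "/1" or "/2" on the read name into "1"/"2"
# (SRR6880005.2/1 -> SRR6880005.21) so the mates keep distinct names after bwa mem.
# Only line 1 of each 4-line record is edited; anything after the name is kept.
strip_mate_slash() {
  awk '
    NR % 4 == 1 && $1 ~ /\/[12]$/ {
      name = $1
      rest = substr($0, length(name) + 1)          # e.g. " length=100"
      $0 = substr(name, 1, length(name) - 2) substr(name, length(name)) rest
    }
    { print }' "$1"
}
```

For example, strip_mate_slash SRR6880005_1_fixed.fastq > SRR6880005_1_renamed.fastq (the output name is only illustrative), and likewise for the second mate file. bwa index and bwa mem then have to be re-run on the renamed files, and the concatenated read file passed to racon should be built from the renamed files as well.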

11Dmitriy11 commented 4 years ago

Thanks for the quick reply! It seems nothing changed in the SAM file:

SRR6880005.2 99 utg000473l 94174 57 23M2D32M2I43M = 94174 100 GGTAAATTGAGTACCATTATCAGACACGAGAACTTCTGGCACTCCGAAAGTTGCGAAAATTTGTTTCAAAATTCTTATTGTTGTTCTCGCAGTTATTGAT AAAAFFJFJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJFAJAFJFJAFJJFAJJJFJJJ<AJ<JJJJJJJJFFFFA<FJJJJJ7JJJJJJJ<FJJJJJA NM:i:6 MD:Z:0A14G7^AC75 MC:Z:23M2D32M2I43M AS:i:76 XS:i:68 XA:Z:utg000444l,-99101,12S44M2D7M1D37M,4;utg000444l,+88866,34M2I26M3D38M,8;utg000162l,+139394,37M1I4M1I12M1D45M,6;

Should I replace .2 and .1 in the header names with /2 and /1? Best regards, Dmitriy

jdcny24 commented 2 years ago

I tried to use racon to polish my assembly, but I got this error:

[racon::Polisher::initialize] loaded target sequences 9.923656 s
[racon::Polisher::initialize] loaded sequences 2401.086862 s
[racon::Polisher::initialize] loaded overlaps 1721.202367 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

The commands I used to get to this point are:

time minimap2 -t 144 -ax map-ont Assembly.fasta bonito_basecall.fastq > bonito.sam
time racon -t 144 bonito_basecall.fastq bonito.sam Assembly.fasta > bonito.racon.fasta

I don't think I made a mistake in the order of the arguments, as I have used the exact same commands for other samples. I only seem to get this error for one of them.

rvaser commented 2 years ago

The commands look fine. Which Racon version are you using?

jdcny24 commented 2 years ago

Hi @rvaser, I'm using version 1.4.3. Also, I made an error in my previous post.

If I use the .sam file in my command:

time racon -t 144 bonito_basecall.fastq bonito.sam Assembly.fasta > bonito.racon.fasta

the error message is:

[racon::Polisher::initialize] loaded target sequences 9.859341 s
[racon::Polisher::initialize] loaded sequences 2319.713921 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::SamParser] error: invalid file format!

If I use the .paf file in my command:

time racon -t 144 bonito_basecall.fastq bonito.paf Assembly.fasta > bonito.racon.fasta

the error message is:

[racon::Polisher::initialize] loaded target sequences 9.923656 s
[racon::Polisher::initialize] loaded sequences 2401.086862 s
[racon::Polisher::initialize] loaded overlaps 1721.202367 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I tried it both ways because I had seen a suggestion in a previous comment to rename (mv) the .sam file to .paf, so I thought it might work.

rvaser commented 2 years ago

Can you please try the latest version (1.4.20)?

jdcny24 commented 2 years ago

Hi @rvaser, I updated racon to version 1.4.22.

with bonito.sam in my command, the error message is:

[racon::Polisher::initialize] loaded target sequences 9.443693 s
[racon::Polisher::initialize] loaded sequences 2362.990930 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::SamParser] error: invalid file format

with bonito.paf in my command, the error message is:

[racon::Polisher::initialize] loaded target sequences 9.365726 s
[racon::Polisher::initialize] loaded sequences 2395.195146 s
[racon::Overlap::transmute] error: unequal lengths in sequence and overlap file for sequence 720cf50f-6b57-42f7-9093-1ca627cf2077!

I did not get any errors when running minimap2, so I'm not quite sure why there would be unequal lengths.
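One way to narrow this down is to compare the query length stored in the PAF (column 2) with the actual read lengths; a mismatch would suggest, for example, that the overlaps were computed from a different version of the read file than the one given to racon. The hypothetical paf_length_mismatches helper below prints the name, PAF length and FASTQ length of each mismatching read. It assumes an uncompressed FASTQ with unwrapped 4-line records and skips reads that are absent from the FASTQ.

```bash
#!/usr/bin/env bash
# paf_length_mismatches READS.fastq OVERLAPS.paf
# Hypothetical helper: print "name<TAB>PAF length<TAB>FASTQ length" for every
# query whose length in PAF column 2 differs from its read length in READS.
# Assumes uncompressed FASTQ with 4 lines per record (unwrapped sequences).
paf_length_mismatches() {
  awk '
    FILENAME == ARGV[1] {                      # FASTQ: remember each read length
      if (FNR % 4 == 1) name = substr($1, 2)
      else if (FNR % 4 == 2) len[name] = length($0)
      next
    }
    ($1 in len) && len[$1] != $2 && !($1 in seen) {   # PAF: compare column 2
      print $1 "\t" $2 "\t" len[$1]; seen[$1] = 1
    }' "$1" "$2"
}
```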

rvaser commented 2 years ago

Sorry for my late reply. If you generate a .sam file with minimap2 (using option -a), you cannot just rename it with mv .sam .paf; run minimap2 without -a instead. Maybe your .sam file is truncated or something. Please try generating the .paf file and try again. If possible, you can also send me the data so I can investigate locally.
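Since renaming a file does not change its content, it can help to check what an overlap file actually contains. The hypothetical sniff_overlaps helper below guesses SAM or PAF from the content (a PAF line has "+" or "-" in column 5, a SAM alignment line has an integer FLAG in column 2 and at least 11 columns, and SAM header lines start with @HD/@SQ/@RG/@PG/@CO) and counts lines with too few tab-separated fields, which may hint at a truncated file. These are heuristics, not racon's own parsing rules.

```bash
#!/usr/bin/env bash
# sniff_overlaps FILE
# Hypothetical helper: guess "sam", "paf" or "unknown" from FILE's content
# (not its extension) and count lines with too few fields (possible truncation).
sniff_overlaps() {
  awk -F '\t' '
    /^@(HD|SQ|RG|PG|CO)\t/ { hdr = 1; next }        # SAM header line
    !fmt {                                          # decide on the first data line
      if (hdr) fmt = "sam"
      else if ($5 == "+" || $5 == "-") fmt = "paf"  # PAF column 5 is the strand
      else if ($2 ~ /^[0-9]+$/ && NF >= 11) fmt = "sam"
      else fmt = "unknown"
    }
    (fmt == "paf" && NF < 12) || (fmt == "sam" && NF < 11) { short++ }
    END {
      if (!fmt) fmt = (hdr ? "sam" : "unknown")
      print fmt
      if (short) print short " line(s) with too few fields"
    }' "$1"
}
```

For example, sniff_overlaps bonito.paf would print sam if that file was produced with minimap2 -a and only renamed.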