isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License
268 stars 48 forks source link

layer begin and positions are invalid #140

Closed dcm9123 closed 4 years ago

dcm9123 commented 4 years ago

Hi! I am currently trying to run the following command: racon -u -f -w 50 -q 9 ../trim50_dem_q9/barcode02.fastq ../minimap2_output_x_map_ont/aln_2.sam ../gene_references/msp2_pf3d7_02026800.fasta

but I get the error: layer begin and positions are invalid

I am basically trying to get a set of fastq or fasta corrected sequences so I can tell the difference between sequencing errors from SNPs. My project consists of analysis of haplotypes, where SNP calling is crucial for this. The genes I sequenced using nanopore are short, around 550 bp and 800 bp (two different genes). I was looking at #118 and it seems that my error is a little bit different. I tried running it with both .sam and .paf and I am getting similar errors (ran an alignment using minimap2 previous to this, hence the .sam file I used).

However it works fine when I run this one: racon ../trim50_dem_q9/barcode01.fastq ../minimap2_output_x_map_ont/aln_01.paf ../gene_references/msp2_pf3d7_02026800.fasta > test , where I get a consensus fasta sequence, but I am looking forward to get a fasta set of corrected sequences.

Thanks for your help,

Daniel

rvaser commented 4 years ago

Hi Daniel, could you please run head -n 1 for each of the following files: barcode02.fastq, aln_2.sam and msp2_pf3d7_02026800.fasta?

Sorry for the wait.

Best regards, Robert

dcm9123 commented 4 years ago

No problem, here's the output:

head -n 1 barcode02.fastq @3a699ef0-8ee5-48a1-b2b3-4ff4d0e76d25 runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=6 ch=342 start_time=2019-08-01T00:25:12Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2

head -n 1 aln_2.sam @SQ SN:plasmodb_pf3d7_msp2_0206800 LN:819

head -n 1 msp2_pf3d7_02026800.fasta

plasmodb_pf3d7_msp2_0206800

rvaser commented 4 years ago

Can you do the same for aln_2.paf?

dcm9123 commented 4 years ago

Sure, no problem:

3a699ef0-8ee5-48a1-b2b3-4ff4d0e76d25 701 18 691 - plasmodb_pf3d7_msp2_0206800 819 39 655 255 696 60 tp:A:P cm:i:32 s1:i:237 s2:i:0 dv:f:0.0864 rl:i:0

rvaser commented 4 years ago

That looks alright. Would you mind sharing the files so I can check what is wrong locally?

dcm9123 commented 4 years ago

Not at all! Just sent it to robert.vaser@fer.hr

Thanks!!!

rvaser commented 4 years ago

Hi Daniel, this issue should be fixed in latest commit (v1.4.8 at https://github.com/lbcb-sci/racon). Thanks for finding this bug!:)

Sorry for the wait. Best regards, Robert

dcm9123 commented 4 years ago

It is working now! Thanks a lot for the help! Looking at the latest commit, I assume that the bug was fixed for short amplicons (which is my case)? With the latest commit, should I be getting corrected fastq files by using the following command?

racon -u -f -w 50 -q 9 ../trim50_dem_q9/barcode02.fastq ../minimap2_output_x_map_ont/aln_2.sam ../gene_references/msp2_pf3d7_02026800.fasta

I am asking because I ran it like that, and I am getting what I believe it's a consensus fasta sequence, even though I added the -f flag, am I running something wrong to get my .fastq corrected fragments?

rvaser commented 4 years ago

If you want to correct each sequence in your barcode02.fastq file, you have to map those sequences to each other. Follow the commands bellow:

minimap2 -ax ava-ont --dual=yes barcode02.fastq barcode02.fastq > ava_2.sam
racon -u -f -w 50 -q 9 barcode02.fastq ava_2.sam barcode02.fastq > barcode02_polished.fasta
dcm9123 commented 4 years ago

Ok, thanks a lot! The minimap2 command took a while, but I guess that's because of my number of sequences that were mapped to each other. However, the racon command is throwing me the following error after running the suggested command:

[racon::Polisher::initialize] loaded target sequences 0.106122 s
[racon::Polisher::initialize] loaded sequences 0.065136 s
terminate called after throwing an instance of 'std::invalid_argument'
  what():  [bioparser::SamParser] error: invalid file format!
Aborted (core dumped)

do you know what is happening here?

rvaser commented 4 years ago

Hmmm, did you copy the above minimap2 command? Can you run head -n 10 ava_2.sam and tail -n 10 ava_2.sam?

dcm9123 commented 4 years ago

I did, plus I added a the -t to add my threads. The output looks like this: head -n 10 ava2.sam:

-bash-4.2$ head -n 10 ava2.sam 
@SQ SN:3a699ef0-8ee5-48a1-b2b3-4ff4d0e76d25 LN:701
@SQ SN:cffdbb37-253b-4cf6-9bf0-e9ef9925ce2a LN:95
@SQ SN:75512032-9812-4dd5-93d9-45d8fdb75705 LN:617
@SQ SN:4dbb5fef-bc18-467f-9ceb-63b03c1eee55 LN:176
@SQ SN:aa3c542a-04fb-4926-809e-87ea2b098525 LN:687
@SQ SN:be529844-6bd7-4427-b81a-e19435691f39 LN:726
@SQ SN:8a37475c-8ff9-42e9-bc65-960fa37c5d2a LN:653
@SQ SN:76a4299e-6c61-4c71-8a95-8c108f18d4f5 LN:113
@SQ SN:74ed9b39-6f68-4bac-91e3-7e8f6f4f9a7a LN:705
@SQ SN:b31c93fd-5157-4e17-950c-9094f3c93396 LN:76

the tail:

-bash-4.2$ tail -n 10 ava2.sam 
509cc2b7-4571-42c0-b248-33d1204f716b    256 75ca8e50-423f-4374-a142-1751fa5269bb    190 0   51M64D22M1I16M1D52M1D15M1I2M1D37M1I18M1D19M *   NM:i:75 ms:i:296    AS:i:310    nn:i:0  tp:A:S  cm:i:20 s1:i:125    de:f:0.0500 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 e7466576-2851-44e6-b2f8-9a05e027c008    71  0   50M22D60M1D32M1D10M1I4M1I2M1D18M2D18M2I18M19S   *   NM:i:39 ms:i:282    AS:i:284    nn:i:0  tp:A:S  cm:i:15 s1:i:103    de:f:0.0727 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 cdde8217-e1f8-41f1-bcf2-806c2e05fb94    76  0   7M1I15M2D14M2I10M2D25M26D16M1D8M2I37M1D5M1D10M1I4M1I2M1D37M2I36M    *   0   0   *   *   NM:i:49 ms:i:278    AS:i:284    nn:i:0  tp:A:S  cm:i:19 s1:i:110    de:f:0.0795 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 fdb0052a-6074-4a79-8c45-64e75827bdc8    627 0   51M9D2M12D37M1D4M2D43M1D5M1D10M1I4M1I2M1D9M3I26M2I11M24NM:i:38  ms:i:272    AS:i:272    nn:i:0  tp:A:S  cm:i:23 s1:i:111    de:f:0.0698 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 a208aa86-2310-4048-a2e1-91d73ddb2a12    30  0   48M1D5M1D37M1D4M40D48M1D15M1I2M1D33M42S *   0   NM:i:49 ms:i:246    AS:i:266    nn:i:0  tp:A:S  cm:i:22 s1:i:116    de:f:0.0503 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 de820508-015c-4fc3-855c-ae654470967f    86  0   49M1D6M1D15M40D20M1D52M1D15M3I8M2D9M1I19M1I18M19S   NM:i:58 ms:i:242    AS:i:262    nn:i:0  tp:A:S  cm:i:12 s1:i:106    de:f:0.0727 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 b610e46a-9a3a-4757-8429-a06703be069e    43  0   74M1D16M2D29M41I34M3D8M1D14M3D19M   *   0   NM:i:58 ms:i:220    AS:i:241    nn:i:0  tp:A:S  cm:i:14 s1:i:109    de:f:0.0650 rl:i:192
509cc2b7-4571-42c0-b248-33d1204f716b    256 47aaa6dd-5475-4d0c-ae06-8be13669f916    229 0   12S8M1D3M1D5M2D21M1D21M26D20M29D47M1D5M1D16M1I18M1I18M1D4M1D16M1D9M1I9M *   0   0   *   *   NM:i:79 ms:i:182    AS:i:197    nn:i:0  tp:A:S  cm:i:19 s1:i:103    de:f:0.1068 rl:i:192
a9d6ea3d-8eeb-4443-9872-a6e894370502    4   *   0   0   *   GTTATTAAATTGCTGGTGCTGTCGATTCCGTTTGTAGTCGTCTGTTTAACCTACTTGCCTGTCGCTCCTATCTTCCTTTGTACCATCAGGTACATTCT  '%&"#%%'0%&'+.48:?BGC@IBAGC:<BEF=-+5485/6<85@BBB>==2.,-*'+)1/.0&&*&79>@CFA,1,-,$D?DE0+8$%)5.@HBECA  rl:i:0
ac83e59b-db95-4fee-a2d9-96bdb72b16b4    4   *   0   0   *   TCAACTTGTATAATAGACAATGTTTTAATTACCTTCATGCAATATCAGCACCAACAGAAGGATTATGAACACGACTACTA;75-.00$77;?2,,$15LB;<?>4-/('+11310130=BI=3+(=699<JMT>><287%1$'%(((('$%',3474),=rl:i:0
rvaser commented 4 years ago

The sam file looks fine, but could you send it via email so I can see what is wrong?

dcm9123 commented 4 years ago

Sent! Thank you!

rvaser commented 4 years ago

Found the error, will update the parser soon.

rvaser commented 4 years ago

Bug fix is in the latest commit (v1.4.9) :)

dcm9123 commented 4 years ago

It works perfectly now! Thanks a lot!