isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

error: overlap is not transmuted! #205

Closed alelim-bio closed 2 years ago

alelim-bio commented 2 years ago

Hello Racon,

I have recently run into this error while attempting to polish some contigs using primary reads filtered by the "tp:A:P" tag. The alignment was produced by minimap2 with customized parameters. For background, these are the commands I ran:

Minimap2 alignment:
time minimap2 -I 100G -c -t 96 -A 5 -B 1 -O 7,46 -E 28,13 -z 350,300 -s 220 ~/assembly/t2.dmo.lay.utg ~/raw_reads/ont/OS001288_PAG98861.pass.fastq.gz ~/raw_reads/ont/OS003D2E_PAH61301.pass.fastq.gz > pine_optimized_param_100G.base_level_alignment.paf

Racon polishing:
time racon -t 96 ~/raw_data/merged_fastq/merged_fastq_test_set.fastq.gz ~/raw_data/pine_optimized_param_100G.base_level_alignment.prim_only.paf ~/raw_data/contigs/random_sample_100.fa > racon_w_paf.optimized_parameters_I_100G.base_level_alignment.prim_only.fasta

I am currently using racon v1.4.20 from a conda installation in a SLURM environment. With the default parameters for the minimap2 alignment, I have had no issues. Additionally, running racon on the unfiltered file causes no issues. However, when I use my custom parameters, racon is unable to run polishing with the filtered file.

I have been referencing issue #77 to help solve the problem, but to no avail. I have tried reorganizing the racon input files, and I have validated that the reads and contigs exist in the alignment file, but I am unable to find what is causing the error.
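The validation step described above could be sketched roughly as follows. This is only an illustration, not Racon's own logic: the helper names `read_names`, `paf_names`, and `missing_names` are hypothetical, FASTQ input is assumed to use strict 4-line records, and names are assumed to be compared up to the first whitespace, which is how minimap2 writes them in PAF.

```python
import gzip


def read_names(path):
    """Collect record names from a FASTA or FASTQ file (optionally gzipped).

    Assumption: FASTQ records span exactly 4 lines; FASTA headers start with '>'.
    """
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    is_fastq = False
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index == 0:
                is_fastq = line.startswith("@")
            # In FASTQ a quality line may also start with '@', so rely on position.
            is_header = (index % 4 == 0) if is_fastq else line.startswith(">")
            tokens = line[1:].split()
            if is_header and tokens:
                names.add(tokens[0])  # name = header text up to first whitespace
    return names


def paf_names(path):
    """Return (query names, target names) from PAF columns 1 and 6."""
    queries, targets = set(), set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:  # PAF has 12 mandatory columns
                continue
            queries.add(fields[0])
            targets.add(fields[5])
    return queries, targets


def missing_names(paf_path, reads_path, contigs_path):
    """Names that appear in the PAF but not in the reads or contigs files."""
    queries, targets = paf_names(paf_path)
    return queries - read_names(reads_path), targets - read_names(contigs_path)
```

Calling `missing_names` with the filtered PAF, the merged read file, and the contig file returns two sets. When the contig file is a subset of the reference used for mapping, the second set is expected to be non-empty, since the PAF then still contains targets outside the subset.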

I was hoping to get your input on this. Thank you for your time!

Kind Regards,

Alex

rvaser commented 2 years ago

Hi Alex, your alignment and polishing commands have mismatched file names. Can you please paste the matching Racon command?

Best regards, Robert

alelim-bio commented 2 years ago

Hello Robert,

Thank you for pointing that out! To clarify, I ran the following command to filter for primary reads from the minimap2 alignment found above and used the resulting output file for the above racon step:
grep 'tp:A:P' ~/output/pine_optimized_param_100G.base_level_alignment.paf > ~/raw_data/pine_optimized_param_100G.base_level_alignment.prim_only.paf
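For reference, the same filter can be sketched in Python. The `grep` above matches the string anywhere on the line, whereas this version looks only at the optional tag fields that follow the 12 mandatory PAF columns. In minimap2 output, `tp:A:P` marks a primary alignment and `tp:A:S` a secondary one. The function names `is_primary` and `filter_primary` are hypothetical, and for typical read names both approaches should select the same lines.

```python
def is_primary(paf_line):
    """True if the optional fields (column 13 onward) contain the tag tp:A:P."""
    fields = paf_line.rstrip("\n").split("\t")
    return "tp:A:P" in fields[12:]


def filter_primary(lines):
    """Yield only the PAF lines flagged as primary alignments, unchanged."""
    for line in lines:
        if is_primary(line):
            yield line
```

A possible use is to open the unfiltered PAF for reading, open an output file for writing, and pass `filter_primary(input_handle)` to the output handle's `writelines` method.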

Kind Regards,

Alex Lim

rvaser commented 2 years ago

I suppose you combined OS001288_PAG98861.pass.fastq.gz and OS003D2E_PAH61301.pass.fastq.gz into merged_fastq_test_set.fastq.gz?

You are mapping the reads to t2.dmo.lay.utg but polishing random_sample_100.fa. What is their relation?

alelim-bio commented 2 years ago

Hello Robert,

Yes sir, I combined them into a single file, merged_fastq_test_set.fastq.gz. I actually wanted to ask you about this: will racon take a list of .fq.gz files, or does it only take a single merged file?

As you mentioned, I have t2.dmo.lay.utg and random_sample_100.fa. The file random_sample_100.fa is a random subset of 100 contigs from t2.dmo.lay.utg. Just to clarify, I am doing test runs before I apply the polishing to my entire dataset.

I hope this helps.

Kind Regards, Alex

rvaser commented 2 years ago

Due to the argument order, only one read file can be given to Racon. Can you please try the latest version by cloning the main repository https://github.com/lbcb-sci/racon?
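Since only one read file is accepted, several gzipped FASTQ files have to be merged first. One way to do that is sketched below. It relies on the gzip format allowing concatenated members, so the compressed files can be joined byte for byte without decompressing them. The function name `concat_gzip` is hypothetical, and the thread does not say how the merged file was actually produced.

```python
import shutil


def concat_gzip(inputs, output):
    """Concatenate gzip files byte for byte into one multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large read files are never held in memory.
                shutil.copyfileobj(src, out)
```

For example, `concat_gzip(["OS001288_PAG98861.pass.fastq.gz", "OS003D2E_PAH61301.pass.fastq.gz"], "merged_fastq_test_set.fastq.gz")` would have the same effect as joining the two files with `cat`. It is worth confirming that the read count of the merged file equals the sum of the inputs.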

alelim-bio commented 2 years ago

Hello Robert,

I took your advice and tested the latest version of Racon. It seems the polishing has now gone through!

Kind Regards, Alex