isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

error: empty overlap set! and multiple consensus sequences #222

Open sandaruwanrat opened 1 year ago

sandaruwanrat commented 1 year ago

Hello,

My question has two parts.

1. When I run the command

racon -m 8 -x -6 -g -8 -w 500 cluster_1329.fasta cluster_1329_ovlp_mapping_test_fwd.paf TAIR10_chr_all.fasta > cluster_1329_tmp_consensus_test3.fasta

I am getting the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] error: empty overlap set!

However, I only get this error for some of the cluster files; the others work fine.

Below are my minimap2 commands. I have used both PAF and SAM formats:

minimap2 -x map-ont -t 1 -uf TAIR10_chr_all.fasta cluster_1329.fasta > cluster_1329_ovlp_mapping_test_fwd.paf

minimap2 -ax map-ont -t 1 -uf TAIR10_chr_all.fasta cluster_1329.fasta > cluster_1329_ovlp_mapping_test_fwd.sam

2. I get multiple consensus sequences

As I mentioned in part one, racon generates consensus sequences for some cluster files with the same command:

racon -m 8 -x -6 -g -8 -w 500 cluster_9.fasta cluster_9_ovlp_mapping_test_fwd.paf TAIR10_chr_all.fasta > cluster_9_tmp_consensus_test3.fasta

The problem is that the output contains more than one sequence (example below):

`>chr1 LN:i:30427560 RC:i:155 XC:f:0.000049 Sequence

>chr5 LN:i:30427560 RC:i:155 XC:f:0.000049 Sequence`

I would greatly appreciate your feedback on this. Thank you very much.

rvaser commented 1 year ago

Hello,

  1. Please verify whether cluster_1329_ovlp_mapping_test_fwd.paf is empty.
  2. How many sequences are there in the target file TAIR10_chr_all.fasta? I am not sure I understand what you are trying to achieve.

Best regards, Robert
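The first check above can be made slightly more informative than a file-size test. The following is a minimal sketch, assuming a plain-text, tab-separated PAF file with the standard 12 mandatory columns (target name in column 6); the function name `summarize_paf` is hypothetical and not part of racon or minimap2. It counts records per target, which also shows whether the reads of one cluster hit more than one chromosome.

```python
from collections import Counter


def summarize_paf(path):
    """Return a Counter mapping target name (PAF column 6) to record count."""
    per_target = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # PAF has 12 mandatory columns; skip blank or truncated lines.
            if len(fields) < 12:
                continue
            per_target[fields[5]] += 1
    return per_target
```

For example, `summarize_paf("cluster_9_ovlp_mapping_test_fwd.paf")` would return an empty `Counter` for a file without usable records, and one entry per chromosome otherwise.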

sandaruwanrat commented 1 year ago

Hello Robert,

1) Neither the .paf nor the .sam file is empty.

2) TAIR10_chr_all.fasta is the Arabidopsis genome file. It contains the five chromosomes plus the mitochondrial sequence (chrM). The length of each sequence is as follows:

chr1 30427671
chr2 19698289
chr3 23459830
chr4 18585056
chr5 26975502
chrM 367808
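A listing like the one above, or the number of reads and their lengths in a cluster file, can be produced with a few lines of Python. This is a minimal sketch, assuming an uncompressed FASTA file; the helper name `fasta_lengths` is hypothetical.

```python
def fasta_lengths(path):
    """Return a dict mapping each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Record name is the header text up to the first whitespace.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

Calling `len(fasta_lengths("TAIR10_chr_all.fasta"))` gives the number of target sequences, and the same function applied to a cluster file gives the read lengths in that cluster.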

My aim is to collapse the sequences in each cluster file (e.g. cluster_9.fasta) into a single consensus sequence. All of the sequences in a "cluster_XXX.fasta" file should belong to the same genomic region. I would like to know whether I am doing something wrong.

Thank you.

Best Regards Sandaruwan

rvaser commented 1 year ago

How big are the genomic regions of each cluster?

sandaruwanrat commented 1 year ago

It varies. The mean lengths of some clusters are 250 bp, 950 bp, and 1.1 kb; overall, my clusters range from 250 bp to 2.5 kb.

rvaser commented 1 year ago

And how did you obtain the clusters? You might try https://github.com/rvaser/spoa instead of Racon.

sandaruwanrat commented 1 year ago

I obtained the clusters based on UMIs, using https://github.com/fhlab/UMIC-seq. However, https://github.com/SorenKarst/longread_umi/blob/master/scripts/consensus_racon.sh uses racon to get consensus sequences from clusters.

I will try spoa instead of racon.

Thank you very much.

Best regards Sandaruwan