isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

empty overlap set! #103

Open Adelaam opened 5 years ago

Adelaam commented 5 years ago

Hello,

I am trying Racon for the first time and get the error below when I try to polish my data. I get the same error with the files provided for the test.

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] error: empty overlap set!

Thank you very much.

rvaser commented 5 years ago

Hi, can you paste the command you are using?

Best regards, Robert

Adelaam commented 5 years ago

'/home/markwilks/racon/build/bin/racon' -t 4 '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02silter2kbq7.fastq.gz' '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02newminimap/barcode02newoverlap.paf.gz' '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02newminimap/barcode02newminimap.contigs.fa'

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] error: empty overlap set!

I named the second file "overlap", but it is the one I got from minimap2 (reads mapped to the assembly).

BW,

rvaser commented 5 years ago

Could you please copy the minimap2 command you used to get the .paf file?

Adelaam commented 5 years ago

'/home/markwilks/minimap2-2.13_x64-linux/minimap2' -t 4 -map-ont '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02newminimap/barcode02newminimap.contigs.fa' '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02silter2kbq7.fastq.gz' | gzip -1 > '/home/markwilks/Desktop/M.abscessus/barcode02/barcode02new/barcode02newminimap/barcode02newoverlap.paf.gz'

Adelaam commented 5 years ago

Yeah, I made a mistake with the output I got from minimap2.

Thank you!

apoosakkannu commented 5 years ago

Hi, I am encountering a similar problem: an empty overlap set. I am very new to metagenomics as well as racon. My commands are the following:

minimap2 -x map-ont 98ZLc_assembly_NP.fasta 98ZLc_nanopore_filt.fastq > 98ZLc_assembly_minimap_NP.paf

racon 98ZLc_nanopore_filt.fastq 98ZLc_assembly_minimap_NP.paf 98ZLc_nanopore_filt_ava_assembly.fasta > 98ZLc_nanopore_filt_racon.fasta

Could you please help me resolve this problem?

Thanks.

rvaser commented 5 years ago

Hi, it appears that you misplaced the reference file, i.e. in the minimap2 command you are mapping reads to 98ZLc_assembly_NP.fasta and in the racon command you want to polish 98ZLc_nanopore_filt_ava_assembly.fasta. Those two files should match.

Best regards, Robert

apoosakkannu commented 5 years ago

Thanks for your kind reply. It worked. Now I am getting a slightly different error:

minimap2 -x sr 98ZLc_nanopore_assembly_racon_metka2.fasta 98ZLc_combineraw_IL.fastq > 98ZLc_mini_IL.paf

racon 98ZLc_nanopore_raw.fastq 98ZLc_mini_IL.paf 98ZLc_nanopore_assembly_racon_metka2.fasta > 98ZLc_nanopore_assembly_racon_metka2_raconIL.fa

[racon::Polisher::initialize] loaded target sequences 1.170 s
[racon::Polisher::initialize] loaded sequences 152.444 s
[racon::Polisher::initialize] loaded overlaps 116.237 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

Could you please give some insight into this error and how to overcome it?

rvaser commented 5 years ago

It appears again that you misplaced the files :D You are mapping 98ZLc_combineraw_IL.fastq to your assembly with minimap2 and in racon you are using different reads 98ZLc_nanopore_raw.fastq. These two files should also match :)
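Both mix-ups above come down to the PAF file naming reads or targets that differ from the files passed to racon. The following is a minimal diagnostic sketch, not part of racon or minimap2: it relies only on the PAF column layout (column 1 query name, column 2 query length, column 6 target name, column 7 target length) and reports records that do not agree with the reads and assembly files. The function names `sequence_lengths` and `check_paf` are made up for this illustration, and the sketch keeps all sequence names in memory, so it is meant for spot checks rather than very large inputs.

```python
import gzip
import itertools


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def first_word(text):
    """Return the text up to the first whitespace ('' if there is none)."""
    words = text.split()
    return words[0] if words else ""


def sequence_lengths(path):
    """Return {name: length} for a FASTA or FASTQ file.

    FASTQ records are assumed to span exactly four lines; FASTA sequences
    may be wrapped over several lines.
    """
    lengths = {}
    with open_text(path) as handle:
        lines = (line.rstrip("\n") for line in handle)
        first = next(lines, None)
        if not first:
            return lengths
        if first.startswith("@"):  # FASTQ
            header = first
            while header:
                sequence = next(lines, "")
                next(lines, None)  # '+' separator line
                next(lines, None)  # quality line
                lengths[first_word(header[1:])] = len(sequence)
                header = next(lines, None)
        else:  # FASTA
            name = None
            for line in itertools.chain([first], lines):
                if line.startswith(">"):
                    name = first_word(line[1:])
                    lengths[name] = 0
                elif name is not None:
                    lengths[name] += len(line.strip())
    return lengths


def mismatch(kind, name, expected, found):
    """Describe how a PAF name/length disagrees with a sequence file."""
    if found is None:
        return f"{kind} {name} is missing from the {kind}s file"
    return f"{kind} {name} has length {found} in the {kind}s file but {expected} in the PAF"


def check_paf(paf_path, reads_path, target_path, limit=20):
    """Return up to `limit` messages about PAF records that do not match."""
    reads = sequence_lengths(reads_path)
    targets = sequence_lengths(target_path)
    problems = []
    records = 0
    with open_text(paf_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # not a complete PAF record
            records += 1
            query, query_length = fields[0], int(fields[1])
            target, target_length = fields[5], int(fields[6])
            if reads.get(query) != query_length:
                problems.append(mismatch("read", query, query_length, reads.get(query)))
            if targets.get(target) != target_length:
                problems.append(mismatch("target", target, target_length, targets.get(target)))
            if len(problems) >= limit:
                break
    if records == 0:
        problems.append("PAF file has no records")
    return problems
```

Calling `check_paf("98ZLc_mini_IL.paf", "98ZLc_nanopore_raw.fastq", "98ZLc_nanopore_assembly_racon_metka2.fasta")` on the files from the failing command would be expected to list reads missing from the reads file, since the overlaps were computed from a different FASTQ.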

apoosakkannu commented 5 years ago

My bad. Thanks. But now there is another error:

anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ racon -t 19 98ZLc_combineraw_IL.fastq 98ZLc_mini_IL.paf 98ZLc_nanopore_assembly_racon_metka2.fasta > 98ZLc_nanopore_assembly_racon_metka2_raconIL.fa
[racon::Polisher::initialize] loaded target sequences 1.175 s
[racon::Polisher::initialize] loaded sequences 200.535 s
[racon::Overlap::transmute] error: unequal lengths in sequence and overlap file for sequence A00419:60:H7KWLDRXX:2:2101:3522:1016!

rvaser commented 5 years ago

Was the Illumina file 98ZLc_combineraw_IL.fastq created by joining 2 paired end files?

apoosakkannu commented 5 years ago

yes, it is !

rvaser commented 5 years ago

The likely cause is that the two reads of a pair have identical names up to the first whitespace. You can add 1 to the name of the first read and 2 to the name of the second read. Here is a script I wrote a while back: https://github.com/isovic/racon/issues/68#issuecomment-386223150. Usage is here: https://github.com/isovic/racon/issues/68#issuecomment-453412183.
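For readers who cannot open the linked comment, here is a rough sketch of the renaming idea described above. It is not the script from issue #68; it is an independent illustration that assumes four-line FASTQ records and treats the text before the first whitespace as the read name. The first time a name appears it gets the suffix "1", the second time "2". The names `rename_mates` and `main` are hypothetical.

```python
import sys


def rename_mates(lines):
    """Yield FASTQ lines with a mate number appended to every read name.

    Only header lines (every fourth line, starting with the first) are
    changed; sequence, separator and quality lines pass through untouched.
    """
    waiting = set()  # names seen once whose mate has not appeared yet
    for index, line in enumerate(lines):
        if index % 4 != 0:
            yield line
            continue
        parts = line.rstrip("\n").split(None, 1)
        if not parts:  # blank line, e.g. at the end of the file
            yield line
            continue
        name = parts[0]
        comment = " " + parts[1] if len(parts) > 1 else ""
        if name in waiting:
            waiting.remove(name)
            suffix = "2"
        else:
            waiting.add(name)
            suffix = "1"
        yield name + suffix + comment + "\n"


def main(path):
    """Write the renamed records of `path` to standard output."""
    with open(path) as handle:
        sys.stdout.writelines(rename_mates(handle))
```

Because the result goes to standard output, it has to be redirected into a new file when run from a shell. For a file made by concatenating two paired-end files, the set of waiting names grows to about half the number of reads before it shrinks again, so memory use can be substantial on a file of tens of gigabytes.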

apoosakkannu commented 5 years ago

Thanks. I saved the script as script.py in my working directory and ran it, and got the following error:

anbu@bubulin:~/polyplax/downstreamanalysiswithvaclavassembly$ python script.py 98ZLc_combineraw_IL.fastq
Traceback (most recent call last):
  File "script.py", line 46, in <module>
    if (valid):
NameError: name 'valid' is not defined

Am I doing something wrong?

rvaser commented 5 years ago

Hmmm that is weird, I tried with both python2 and python3 and it works.

apoosakkannu commented 5 years ago

I have a fastq file, is that OK?

rvaser commented 5 years ago

Yeah, it should be a fastq file. Did you by any chance mess up the indentation in the script?

apoosakkannu commented 5 years ago

I just copied it. I have not changed anything.

apoosakkannu commented 5 years ago

If you have the script.py file, could you please attach it? I will try with it.

rvaser commented 5 years ago

There you go.

rename.zip

apoosakkannu commented 5 years ago

Thanks. It is running now. I guess it will run for a while?

rvaser commented 5 years ago

If you have a big file it might take a while :)

apoosakkannu commented 5 years ago

It is 22.1 GB, I guess that is big enough.

rvaser commented 5 years ago

Indeed it is. I hope you ran the command with its output redirected to a new file!

apoosakkannu commented 5 years ago

No, I did not. I am new to scripting. Could you show me how to do that? It is printing to the screen itself.

rvaser commented 5 years ago

python rename.py 98ZLc_combineraw_IL.fastq > 98ZLc_renamed_IL.fastq

apoosakkannu commented 5 years ago

thanks.

apoosakkannu commented 5 years ago

Do I need to rerun minimap2 with the renamed file before polishing with racon?

rvaser commented 5 years ago

Unfortunately yes.
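Since the overlaps have to be regenerated whenever the reads file changes, one way to avoid the mismatches seen earlier in this thread is to build both commands from a single reads path and a single target path. The sketch below does that with Python's `subprocess` module. The helper names `polishing_commands` and `polish` are hypothetical; the command layouts (`minimap2 -x <preset> <target> <reads>` and `racon <reads> <overlaps> <target>`) are taken from the commands posted above, and both tools are assumed to be on the PATH.

```python
import subprocess


def polishing_commands(reads, target, overlaps, preset="sr"):
    """Return the minimap2 and racon argument lists for one polishing round.

    The same `reads` and `target` paths are used in both commands, so the
    overlaps always describe the files that racon is given.
    """
    minimap2 = ["minimap2", "-x", preset, target, reads]
    racon = ["racon", reads, overlaps, target]
    return minimap2, racon


def polish(reads, target, overlaps, polished, preset="sr"):
    """Run minimap2 then racon, writing their standard output to files."""
    minimap2, racon = polishing_commands(reads, target, overlaps, preset)
    with open(overlaps, "w") as paf:
        subprocess.run(minimap2, stdout=paf, check=True)
    with open(polished, "w") as out:
        subprocess.run(racon, stdout=out, check=True)
```

Passing `stdout=` an open file plays the role of the shell's `>` redirection, and `check=True` stops the round if minimap2 fails instead of handing racon an empty overlap file.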

apoosakkannu commented 5 years ago

great, thanks.

apoosakkannu commented 5 years ago

Hi, I used racon to polish my nanopore assembly with Illumina reads. As we discussed above, I had to rename my Illumina sequences, and I used the renamed sequences for polishing. Racon finished without any error. I then used the polished assembly to make coverage profiles with minimap2 and samtools. Here comes the problem: my scaffolds do not get any coverage profile for either nanopore or Illumina reads. Could this be related to renaming the file? Could you give me an idea for solving this problem?

rvaser commented 5 years ago

Not sure what could be wrong. Can you paste all the commands you were using?

apoosakkannu commented 5 years ago

Please find the commands below:

minimap2 -x sr 98ZLc_nanopore_assembly_racon_metka2.fasta 98ZLc_combineraw__renamed_IL.fastq > 98ZLc_mini_renamed_IL.paf

racon 98ZLc_combineraw_renamed_IL.fastq 98ZLc_mini_renamed_IL.paf 98ZLc_nanopore_assembly_racon_metka2.fasta > 98ZLc_nanopore_assembly_racon_metka2_racon2.fa

rvaser commented 5 years ago

Is the file 98ZLc_nanopore_assembly_racon_metka2_racon2.fa empty by any chance?

apoosakkannu commented 5 years ago

No. It seems normal, like the earlier assembly.

apoosakkannu commented 5 years ago

I mean the size of the file is similar to that of 98ZLc_nanopore_assembly_racon_metka2.fasta.

rvaser commented 5 years ago

Did the coverage profiling work when you ran it on the unpolished assembly file? Did you change any parameters in those commands?

apoosakkannu commented 5 years ago

Yes, I tried with 98ZLc_nanopore_assembly_racon_metka2.fasta and it worked. I have not changed any parameters.

rvaser commented 5 years ago

Are you using different files now? Can you copy these commands as well?

apoosakkannu commented 5 years ago

minimap2 -t 19 -ax map-ont 98ZLc_nanopore_assembly_racon_metka2.fasta 98ZLc_nanopore_raw.fastq | samtools view --threads 19 -Sb -F 0x104 - | samtools sort --threads 19 - > np_cov.bam

samtools depth -aa np_cov.bam | awk -F "\t" '{a[$1] += $3; b[$1]++} END{OFS = ","; for (i in a) print i, a[i]/b[i]}' > np_cov.csv
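The awk step in the command above averages the per-position depth for each contig. As a reading aid, here is a minimal Python sketch of the same calculation. It assumes the three tab-separated columns that `samtools depth` prints for a single BAM file (contig name, position, depth); the function name `mean_depth` is made up for this example.

```python
from collections import defaultdict


def mean_depth(lines):
    """Return {contig: mean depth} from `samtools depth` output lines."""
    totals = defaultdict(int)     # sum of depths per contig (a[$1] in awk)
    positions = defaultdict(int)  # number of reported positions (b[$1])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, depth = fields[0], int(fields[2])
        totals[contig] += depth
        positions[contig] += 1
    return {contig: totals[contig] / positions[contig] for contig in totals}
```

The keys of the result are the contig names exactly as they appear in the first column of the depth output, so any downstream table that is joined on contig name has to use the same names.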

rvaser commented 5 years ago

And you just replaced 98ZLc_nanopore_assembly_racon_metka2.fasta with 98ZLc_nanopore_assembly_racon_metka2_racon2.fa and it did not work? Any error message? Did you check that the path is alright?

apoosakkannu commented 5 years ago

Yes, I just replaced it. There was no error. I am using the files in the same folder, so the path should not be a problem. The output looks similar to the others. But when I used the .csv file in mmgenome2 in RStudio, I found that it does not produce coverage like the others.

rvaser commented 5 years ago

No idea then :/

apoosakkannu commented 5 years ago

OK, thanks for your help. I will check it out.

apoosakkannu commented 5 years ago

Hi, would it be possible to modify your script (rename.zip) to change the name of the fasta file? If so, please give me some pointers for the modification. Thanks in advance.

rvaser commented 5 years ago

Do you want to change sequence names or the actual file that is being produced?

apoosakkannu commented 5 years ago

I wonder whether the problem in creating the coverage file for my Illumina-polished assembly could be due to the renaming of the Illumina sequence names. So I thought that changing the assembly name in the same way as the Illumina sequences might solve this problem. So basically I want to change the name of the assembly file produced by racon + Illumina polishing. What do you think?

rvaser commented 5 years ago

Not sure that this will help. Is the file np_cov.bam empty?

apoosakkannu commented 5 years ago

The bam file is not empty, and all the files have sizes similar to the normal files. That's why I was thinking it could be due to the name change.

rvaser commented 5 years ago

Can you run head -n 1 98ZLc_nanopore_assembly_racon_metka2.fasta and head -n 1 98ZLc_nanopore_assembly_racon_metka2_racon2.fa?
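The `head -n 1` comparison looks only at the first header of each file. A small sketch along the same lines, comparing every sequence name in the two assemblies, might look like this. Whether the names actually differ in this case is not known from the thread; the helper names `fasta_names` and `compare_names` are hypothetical, and a name is taken to be the header text up to the first whitespace.

```python
def fasta_names(path):
    """Return the sequence names of a FASTA file, in file order."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                words = line[1:].split()
                names.append(words[0] if words else "")
    return names


def compare_names(before_path, after_path):
    """Return (names only in the first file, names only in the second)."""
    before = set(fasta_names(before_path))
    after = set(fasta_names(after_path))
    return sorted(before - after), sorted(after - before)
```

Two empty lists from `compare_names("98ZLc_nanopore_assembly_racon_metka2.fasta", "98ZLc_nanopore_assembly_racon_metka2_racon2.fa")` would mean that both files contain the same set of names, which would point away from a naming issue.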