isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository, which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

How to generate OVERLAP file? #234

Closed: cement-head closed this issue 1 year ago

cement-head commented 1 year ago

This seems like a dumb question, but I have no idea how to generate an overlap file.

I have PacBio CLR data and Illumina PE data. I would like to use the Illumina data to polish the PacBio data.

How do I generate the "overlap" file? If I use MHAP, I get an output file in <*.dat> format. How do I convert this into MHAP format?

TIA

rvaser commented 1 year ago

Hello, you can use minimap2 to map the Illumina data to the PacBio data with minimap2 -ax sr clr.fastq ill.fastq > out.sam, and pass out.sam to racon as the overlap file. If by CLR data you mean reads, I am not sure that all of them will be polished equally, as racon will treat them as contigs and take only the best overlap per Illumina read. If you have paired-end data, you need to ensure that both mates are in one file with distinct names up to the first whitespace (you can use scripts/racon_preprocess.py to achieve that before overlapping).
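A minimal Python sketch of the steps described above, assuming the paired-end reads arrive as two plain 4-line-per-record FASTQ files (R1 and R2). The "_1"/"_2" name suffixes and the helper names read_fastq, merge_pairs and build_commands are hypothetical; this is not the code in scripts/racon_preprocess.py, whose naming scheme may differ. The minimap2 command is the one given above, and the racon argument order (reads, overlaps, target sequences) follows racon's usage line.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        # racon only looks at the name up to the first whitespace, so keep just that token.
        name = header[1:].split()[0]
        yield name, seq, qual


def merge_pairs(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write both mates into one FASTQ with names unique up to the first whitespace.

    The "_1"/"_2" suffixes are a hypothetical choice; racon_preprocess.py may differ.
    Returns the number of records written.
    """
    seen = set()
    written = 0
    for suffix, handle in (("_1", r1), ("_2", r2)):
        for name, seq, qual in read_fastq(handle):
            new_name = name + suffix
            if new_name in seen:
                raise ValueError(f"duplicate read name after renaming: {new_name}")
            seen.add(new_name)
            out.write(f"@{new_name}\n{seq}\n+\n{qual}\n")
            written += 1
    return written


def build_commands(reads: str, targets: str, overlaps: str,
                   polished: str) -> List[Tuple[List[str], str]]:
    """Return (argv, stdout_path) pairs for the overlap and polishing steps."""
    return [
        # Map short reads against the long sequences; the SAM output is the overlap file.
        (["minimap2", "-ax", "sr", targets, reads], overlaps),
        # racon usage: racon <sequences> <overlaps> <target sequences>
        (["racon", reads, overlaps, targets], polished),
    ]


if __name__ == "__main__":
    r1 = io.StringIO("@pair1 1:N\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@pair1 2:N\nTTGC\n+\nIIII\n")
    merged = io.StringIO()
    print(merge_pairs(r1, r2, merged))
    print(merged.getvalue(), end="")
    for argv, stdout_path in build_commands("ill.fastq", "clr.fastq", "out.sam", "polished.fasta"):
        print(" ".join(argv), ">", stdout_path)
```

The commands are returned rather than executed so the sketch runs without minimap2 or racon installed; in practice each argv list would be run (for example with subprocess.run) with its standard output redirected to the paired path.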

Best regards, Robert