isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License
269 stars 48 forks

illumina data to polish #75

Open asdcid opened 6 years ago

asdcid commented 6 years ago

Hi, the manual says the latest version of Racon supports Illumina reads, but which short-read format does Racon support? Should I put R1/R2 into one file, or use a specific parameter? Thanks, Raymond

rvaser commented 6 years ago

Hello Raymond, racon expects one file with any kind of reads (3rd gen or 2nd gen paired/single ends). Therefore, you should join your paired ends into one file but be careful that reads from a pair do not have the same name up to the first whitespace. If you need a helper script, please take a look at https://github.com/isovic/racon/issues/68#issuecomment-386223150.
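For readers who want to see what joining the pairs involves, here is a minimal Python sketch. It is not the helper script linked above; the function names (`merge_pairs`, `rename`) and the `/1` and `/2` suffixes are assumptions chosen for illustration. It copies R1 and R2 into one FASTQ and appends a mate suffix to each read name, so that the two reads of a pair no longer share the same name up to the first whitespace.

```python
import gzip
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield (header, sequence, separator, quality) from a 4-line-per-read FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record: %r" % (record,))
        yield tuple(record)


def rename(header, suffix):
    """Append a mate suffix to the read name (the text before the first whitespace)."""
    parts = header[1:].split(None, 1)
    if not parts:
        raise ValueError("FASTQ header has no read name")
    # Leave names alone if they already carry this suffix (hypothetical convention).
    name = parts[0] if parts[0].endswith(suffix) else parts[0] + suffix
    return "@" + " ".join([name] + parts[1:])


def merge_pairs(r1_path, r2_path, out_path):
    """Write all R1 reads, then all R2 reads, into one FASTQ; return the read count."""
    count = 0
    with open_text(out_path, "wt") as out:
        for path, suffix in ((r1_path, "/1"), (r2_path, "/2")):
            with open_text(path) as handle:
                for header, seq, _sep, qual in fastq_records(handle):
                    out.write("\n".join((rename(header, suffix), seq, "+", qual)) + "\n")
                    count += 1
    return count
```

Calling `merge_pairs("R1.fastq.gz", "R2.fastq.gz", "merged.fastq")` (file names are placeholders) would produce the single input file described above.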

Best regards, Robert

P.s. If you are polishing a large genome, please use the latest commit.

jdmontenegro commented 6 years ago

Hello guys, which mapper would you recommend for mapping Illumina reads to a raw miniasm assembly? Using bowtie2 with default parameters (I know, not the best idea, but just a first test) I am getting roughly 20% mapping efficiency and only 30% horizontal coverage, so not great.

Any recommendations would be greatly appreciated.

Kind regards,

rvaser commented 6 years ago

Hello, you can try minimap2 with the -x sr option. By raw miniasm assembly, you mean you used PacBio/ONT reads, right? If so, I would advise polishing with those reads first (if you have decent coverage).
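As a rough illustration of the suggestion above, the sketch below builds the two command lines for one short-read polishing round: mapping with `minimap2 -x sr`, then running racon on the reads, the overlaps, and the assembly. The function name, file names, and thread count are placeholders (assumptions), and the commands are only assembled and printed, not executed.

```python
import shlex


def build_polish_commands(reads, assembly, overlaps="short_reads.sam", threads=4):
    """Return argv lists for mapping short reads and for a racon polishing run.

    Both tools write their result to standard output, so the caller is
    expected to redirect it to a file.
    """
    # -a asks minimap2 for SAM output; -x sr selects the short-read preset.
    map_cmd = ["minimap2", "-a", "-x", "sr", "-t", str(threads), assembly, reads]
    # racon takes three positional inputs: sequences, overlaps, target sequences.
    polish_cmd = ["racon", "-t", str(threads), reads, overlaps, assembly]
    return map_cmd, polish_cmd


def as_shell(cmd, stdout_path):
    """Render an argv list as a shell line with standard output redirected."""
    return shlex.join(cmd) + " > " + shlex.quote(stdout_path)


if __name__ == "__main__":
    mapping, polishing = build_polish_commands("merged.fastq", "assembly.fasta")
    print(as_shell(mapping, "short_reads.sam"))
    print(as_shell(polishing, "polished.fasta"))
```

Keeping the commands as argv lists makes it easy to pass them to `subprocess.run` later without worrying about shell quoting.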

Best regards, Robert

jdmontenegro commented 6 years ago

Thank you rvaser, that is correct. I have a raw assembly obtained from the Flye assembler using 30X coverage of PacBio reads. The raw assembly should have error rates similar to those of raw PacBio reads. I have done some initial polishing using the PacBio reads and now I think I can map the Illumina reads. I read on Heng Li's minimap2 page that he does not recommend mapping short reads to unpolished PacBio assemblies, but I guess this initially polished assembly should be OK?

Cheers,

rvaser commented 6 years ago

The initially polished assembly should be alright.

Best regards, Robert

xinwenzhg commented 6 years ago

Hi guys, I ran the latest version of Racon with Illumina paired-end reads (my contigs had already been polished with Pilon). After Racon, 8 breaks were fixed, surprisingly. I don't understand, does Racon also stitch contigs? Isn't it a consensus tool? Thanks! Xinwen

rvaser commented 6 years ago

Hi Xinwen, racon does not stitch contigs together, it polishes each of them separately. What exactly do you mean by breaks?

Best regards, Robert

xinwenzhg commented 6 years ago

Hi Robert, after racon, the number of fragments in my fasta file changed from 29 to 21, so I thought racon might have fixed some breaks by stitching fragments together. That's why I asked.

Then I checked the fragment lengths in the fasta files before and after racon, and found that eight of my fragments (1000-1500 bp) were simply removed by racon instead of being stitched to other fragments. Some of my other long contigs got 0-100 bp shorter. Is this the expected behavior of racon? Thank you!
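A before/after check like the one described here can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the polished FASTA keeps each original sequence name as the first whitespace-delimited token of its header.

```python
def fasta_lengths(path):
    """Map each sequence name (header text before the first whitespace) to its length."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def compare_assemblies(before_path, after_path):
    """Return (names missing after polishing, length change for each kept name)."""
    before = fasta_lengths(before_path)
    after = fasta_lengths(after_path)
    missing = sorted(name for name in before if name not in after)
    deltas = {name: after[name] - before[name] for name in before if name in after}
    return missing, deltas
```

Sequences listed in `missing` are absent from the output, and a negative value in `deltas` means the contig got shorter.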

Best regards, Xinwen

rvaser commented 6 years ago

Hi Xinwen, racon by default does not output unpolished sequences. You can change that behavior with the following option:

-u, --include-unpolished
    output unpolished target sequences

You can determine which of the outputted sequences are unpolished by checking their headers, i.e. by checking tags RC (number of reads used for polishing) and XC (percentage of windows corrected).
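A minimal Python sketch of such a header check follows. It assumes the tags appear in the header as SAM-style fields such as `RC:i:<integer>` and `XC:f:<number>` after the sequence name, and it treats a value of zero in either tag as "unpolished". Both the exact header layout and that criterion are assumptions, so compare them with your own output first.

```python
def parse_racon_header(header):
    """Return (name, RC, XC) from a FASTA header; a tag that is absent yields None."""
    fields = header.lstrip(">").split()
    name = fields[0]
    tags = {}
    for field in fields[1:]:
        parts = field.split(":", 2)  # assumed layout: TAG:TYPE:VALUE
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    rc = int(tags["RC"]) if "RC" in tags else None
    xc = float(tags["XC"]) if "XC" in tags else None
    return name, rc, xc


def unpolished_names(fasta_path):
    """List sequences whose RC or XC tag is zero (assumed to mean 'unpolished')."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                name, rc, xc = parse_racon_header(line.strip())
                if rc == 0 or xc == 0:
                    names.append(name)
    return names
```

Running `unpolished_names` on output produced with `-u` would then list the sequences that passed through without correction, under the assumptions stated above.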

It is quite normal that the length of polished sequences is different (shorter/longer) when compared to original length.

Best regards, Robert

xinwenzhg commented 6 years ago

Hi Robert, Thank you so much. I checked the headers, they're very helpful. Best regards, Xinwen