lbcb-sci / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
MIT License

Use cDNA to polish exons #73

Closed SalvadorGJ closed 1 year ago

SalvadorGJ commented 1 year ago

Hi!

I have long cDNA reads which I want to use as input for genome polishing, expecting that they will only fix exons across the draft reference. Have you tested racon with cDNA alignments before?

I was wondering how racon deals with "N" operators in the CIGAR strings of SAM/PAF alignments, since these are used to represent introns in mRNA-to-genome alignments. With this in mind, do you think it is capable of differentiating between real introns (which should not be removed) and insertions in the draft's exons (which should be polished)?

rvaser commented 1 year ago

Hello Salvador, unfortunately, I do not recall evaluating racon with RNA sequences during development. The N operations will be treated as deletions.

Best regards, Robert
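
To make the consequence of that answer concrete, here is a minimal sketch (not racon code) that counts how many reference bases a CIGAR string skips through `D` and through `N` operations. The function name `reference_gap_summary` and the example CIGAR are hypothetical, chosen only for illustration.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_gap_summary(cigar: str) -> dict:
    """Sum the reference bases skipped by D (deletion) and N (skipped region) operations."""
    totals = {"D": 0, "N": 0}
    for length, op in CIGAR_RE.findall(cigar):
        if op in totals:
            totals[op] += int(length)
    return totals


# Hypothetical spliced alignment: two exonic blocks separated by a 1500 bp intron.
summary = reference_gap_summary("50M2D30M1500N40M1I20M")
print(summary)
```

If `N` is handled like `D`, the 1500 skipped bases in this example would be indistinguishable from a 1500 bp deletion relative to the draft, which is why spliced alignments need preprocessing before polishing.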

SalvadorGJ commented 1 year ago

Hi Robert,

Thanks for your answer. Indeed, I was expecting that. In the meantime I found a solution using SplitNCigarReads from the GATK suite. This tool splits spliced cDNA BAM alignments into exon-only alignments by getting rid of the introns. I then ran Racon with the parameters -u (output unpolished target sequences) and --no-trimming (disables consensus trimming at window ends). I tried it once and it seemed to work fine, but I didn't have the time to evaluate it exhaustively.

I am leaving this comment in the hope that it helps someone in the future, and to ask for your feedback on whether this use of the parameters is appropriate.

Best, Salvador
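
The idea of splitting a spliced alignment at its introns can be sketched as follows. This is a conceptual illustration only, not GATK's implementation: the real tool also handles the read sequence, clipping and flags, which are ignored here. The function name `split_at_introns` and the example values are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def split_at_introns(ref_start: int, cigar: str) -> list:
    """Split an alignment at N operations.

    Returns one (reference start, CIGAR) pair per N-free block, using the
    same coordinate convention as ref_start.
    """
    blocks = []
    block_ops = []
    block_start = ref_start
    pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            # Close the current exonic block and jump over the intron.
            if block_ops:
                blocks.append((block_start, "".join(block_ops)))
            block_ops = []
            pos += n
            block_start = pos
            continue
        block_ops.append(f"{n}{op}")
        if op in REF_CONSUMING:
            pos += n
    if block_ops:
        blocks.append((block_start, "".join(block_ops)))
    return blocks


# Hypothetical alignment starting at reference position 100.
for start, block_cigar in split_at_introns(100, "50M2D30M1500N40M1I20M"):
    print(start, block_cigar)
```

Each resulting block contains only `M`, `I` and `D` style operations, so the small deletion and insertion inside the exons remain visible to a polisher while the intron no longer appears as a long deletion.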

rvaser commented 1 year ago

I think the command line options you are using make sense.

Best regards, Robert