ksahlin / isONcorrect

Error correction of ONT transcript reads
GNU General Public License v3.0

Deletions #14

Closed parsboy66 closed 11 months ago

parsboy66 commented 2 years ago

Hello

thanks for this amazing work,

I used it on Nanopore data; my goal was cleaning the data in order to find mutations that were already confirmed by WES.

It seems isONcorrect's performance on deletions (and homopolymer regions) is not as good as on other mutations. Is there any way to improve it?

Thanks again, Iman

ksahlin commented 2 years ago

Hi @parsboy66,

It depends on the scenario. Could you elaborate a bit regarding this? Particularly:

  1. Is it ONT cDNA sequencing? Targeted or whole-genome?
  2. Do you mean that you still observe errors in homopolymer regions and deletions in your cDNA sequences?
  3. Do you have a rough estimate of the post-correction error rate? If so, how is that performed? (e.g., mapping to reference sequences?)

Best, Kristoffer

parsboy66 commented 2 years ago

Thank you for responding. Yes, our sample is cDNA sequenced by Nanopore, based on a target enrichment panel that we have (80 genes). We did whole exome sequencing on the same sample to confirm the AF of the mutations we are interested in.

About the deletions: for example, one gene has roughly 85% AF (deletion) in our uncorrected Nanopore reads, which was also confirmed by WES. After correcting the reads, the AF reaches 98%; we simply lose most of the WT reads.

As for the homopolymeric regions, the AF before and after correction is almost the same.

I should mention that in all conditions the overall error rate dropped. Also, looking at the consensus of the reads in IGV and checking the BAM stats, it is clear that the data are much cleaner.

ksahlin commented 2 years ago

To "overcorrect" fewer true deletions and mutations, you can set the parameter --T to something lower than the default value of 0.1; for example, try 0.05 or lower if needed. However, this comes at the cost of not correcting as many errors; it is a tradeoff.

If it is fast to run isONcorrect on your data, I would also set the parameters --k 9 --w 10 --max_seqs 1000, which give slightly higher accuracy than the defaults. This may be 2-3 times slower than the default settings and takes more memory.
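For reference, a combined invocation with the flags above might look like the sketch below. Only --T, --k, --w, and --max_seqs come from this thread; the input/output options shown are assumptions and may differ in your isONcorrect version, so check `isONcorrect --help` for the exact interface.

```shell
# Hypothetical isONcorrect run: lower --T to overcorrect fewer true variants,
# and use smaller k/w with a larger --max_seqs for slightly higher accuracy.
# The --fastq / --outfolder I/O flags here are assumptions; verify with --help.
isONcorrect --fastq reads.fastq --outfolder corrected/ \
    --T 0.05 \
    --k 9 --w 10 --max_seqs 1000
```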

ksahlin commented 11 months ago

Closing this. I should also mention that we have added an additional overcorrection check from version 0.1.0 onwards, which should further reduce any remaining over-corrections.