vladimirsouza / lrRNAseqVariantCalling

Codes for the Iso-Seq variant-calling paper
MIT License

Variant calling with Nanopore cDNA-seq? #4

Closed by CWYuan08 4 months ago

CWYuan08 commented 1 year ago

Hi, I have Nanopore PCR cDNA-seq data from a PromethION (very deeply sequenced) that I would like to use to study allelic expression. Because I have no genomic data, I want to call variants from these reads first. Could I use the same pipeline that is used here for Nanopore direct RNA-seq?

Many thanks! Best, CW

vladimirsouza commented 1 year ago

Hi CWYuan08,

I think you can, but you will probably get low-quality calls. In our preprint, we show the low performance of variant calling from nanopore data for all tested pipelines, mainly for indel calls, even when read coverage is high (80–100 reads) — see supplementary figures Fig. S8A/B. This may be explained by the high error rate of nanopore sequences — see supplementary figure Fig. S8C.

Best regards.

CWYuan08 commented 1 year ago

Dear @vladimirsouza,

many thanks for the suggestion. I think I will first study the SNPs and exclude the indels. For some samples we have >200× coverage, and I will keep only the high-quality reads for the analysis. Would it be worth using error-corrected Nanopore reads for variant calling? There are several tools that correct reads by local clustering.

Best, CW
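
For illustration, the "keep only the high-quality reads" step could be sketched as below. This is a minimal sketch and not part of the repository: the function names and the `min_q=10.0` threshold are hypothetical, the input is assumed to be plain 4-line FASTQ records with Phred+33 qualities, and the mean quality is computed by averaging per-base error probabilities rather than averaging the Phred scores directly.

```python
import math


def mean_error_quality(qual: str, offset: int = 33) -> float:
    """Mean read quality, computed by averaging per-base error probabilities.

    Averaging the probabilities (not the Phred scores themselves) keeps a few
    very bad bases from being hidden by many good ones.
    """
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)


def filter_fastq(lines, min_q: float = 10.0):
    """Yield (header, sequence, quality) for records with mean quality >= min_q.

    Assumes strict 4-line FASTQ records; min_q is a hypothetical threshold
    that would need tuning for real data.
    """
    it = iter(lines)
    # zip over the same iterator consumes four consecutive lines per record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        qual = qual.rstrip("\n")
        if mean_error_quality(qual) >= min_q:
            yield header.rstrip("\n"), seq.rstrip("\n"), qual
```

With a real file, something like `with open("reads.fastq") as fh: kept = list(filter_fastq(fh))` would apply it; the file name is a placeholder.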

vladimirsouza commented 1 year ago

Hi @CWYuan08,

I think it makes sense to feed raw sequences directly to variant-calling pipelines rather than running an error-correction step first, because distinguishing sequencing errors from true variants is exactly what variant callers are meant to do. However, since I didn't include any pipeline with an error-correction step in our mini-benchmark (link), I can't say how well it would work.

If the templates of your samples were prepared with CyclomicsSeq, you have multiple copies of the same template. In this case, I would try to compute consensus sequences using Cyclomics_consensus_pipeline. However, I don't have any experience with this tool.

If you want to use one of the pipelines that we tested, I would recommend using SNCR+fC+DeepVariant (code), since you want to call only SNPs.
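
Whichever pipeline produces the calls, restricting the output VCF to SNPs could look roughly like the sketch below. It is only an illustration and not code from the repository: the function names, the `min_depth` parameter, and the choice to read depth from the first sample's `DP` FORMAT field are assumptions. In practice, a tool such as `bcftools view -v snps` would normally be used for this.

```python
BASES = {"A", "C", "G", "T"}


def is_snp(ref: str, alt: str) -> bool:
    """True if REF and every ALT allele is a single A/C/G/T base."""
    return ref in BASES and all(a in BASES for a in alt.split(","))


def sample_depth(fields):
    """Return the DP value of the first sample, or None if unavailable."""
    if len(fields) < 10:
        return None
    keys = fields[8].split(":")
    vals = fields[9].split(":")
    if "DP" not in keys:
        return None
    i = keys.index("DP")
    if i >= len(vals) or not vals[i].isdigit():
        return None
    return int(vals[i])


def filter_snps(vcf_lines, min_depth: int = 0, pass_only: bool = True):
    """Yield header lines plus SNP records that pass the optional filters.

    min_depth and pass_only are hypothetical knobs for this sketch.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the header untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record
        ref, alt, filt = fields[3], fields[4], fields[6]
        if not is_snp(ref, alt):
            continue  # drops indels, '*' and missing alleles
        if pass_only and filt not in ("PASS", "."):
            continue
        if min_depth > 0:
            dp = sample_depth(fields)
            if dp is None or dp < min_depth:
                continue
        yield line
```

For example, `filter_snps(open("calls.vcf"), min_depth=20)` would keep the header and the passing SNP records with at least 20 reads at the site; the file name and threshold are placeholders.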

I hope this might help a bit. Sorry for not being able to give more feedback.

Best regards.