shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single nucleotide classification tool without the need of matched control information.
https://www.nature.com/articles/s41587-022-01559-w

Effect of adapters on the final calling #34

Open ChenAugustin opened 5 months ago

ChenAugustin commented 5 months ago

Hello Xiaoxu Yang,

I ran DeepMosaic on WES data preprocessed as recommended in the paper (duplicates marked and base qualities recalibrated with GATK 4.0.4, indels realigned with GATK 3.8.1) but forgot to remove adapters before mapping. My dataset is quite large, so before re-running all the analyses with trimmed reads, I wanted to ask:

1. Could you confirm that you removed adapters before mapping when benchmarking DeepMosaic?

2. Have you had the chance to assess the effect of adapter trimming on the final calling?

Thank you in advance!

shishenyxx commented 5 months ago


Thank you for your question, ChenAugustin,

  1. I confirm that we used adapter-trimmed BAMs for our analysis, especially for our exomes.
  2. Unfortunately I haven't. I suggest running on a small subset of samples and comparing against the first run. The adapter sequence shouldn't map to any genomic position, but it might if there are sequencing errors or other issues. You can also go back to your calls to see whether most of them are at the edges of reads.

Best,

Xiaoxu
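The suggested comparison (re-run a small subset with trimmed reads and compare against the first run) could be scored roughly as below. This is a minimal sketch, not part of DeepMosaic: it assumes the calls from both runs can be exported as VCF-style, tab-separated lines (CHROM, POS, ID, REF, ALT, ...), and the names `load_keys` and `compare` are hypothetical.

```python
"""Compare mosaic calls from an untrimmed run and an adapter-trimmed run.

Hypothetical sketch: calls are assumed to be VCF-style, tab-separated lines.
"""
from typing import Dict, Iterable, Set, Tuple

# (chrom, 1-based position, ref, alt)
VariantKey = Tuple[str, int, str, str]


def load_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele, skipping header and blank lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare(untrimmed: Set[VariantKey], trimmed: Set[VariantKey]) -> Dict[str, float]:
    """Count shared and run-specific calls and report Jaccard similarity."""
    shared = untrimmed & trimmed
    union = untrimmed | trimmed
    return {
        "shared": len(shared),
        "untrimmed_only": len(untrimmed - trimmed),
        "trimmed_only": len(trimmed - untrimmed),
        # Two empty call sets are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    run1 = load_keys(["#CHROM\tPOS\tID\tREF\tALT",
                      "chr1\t100\t.\tA\tG",
                      "chr2\t200\t.\tC\tT"])
    run2 = load_keys(["chr1\t100\t.\tA\tG",
                      "chr3\t300\t.\tG\tA"])
    print(compare(run1, run2))
```

Calls that appear only in the untrimmed run would be the first ones to inspect for read-end or adapter artifacts.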

ChenAugustin commented 3 months ago


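[Figure: MPOS_distrib-mosaic, histogram of the number of variants predicted as "mosaic" by DeepMosaic at each median read position (MPOS)]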

Hi Xiaoxu Yang,

Thank you for your reply and suggestions! Above is a histogram of the number of variants predicted as "mosaic" by DeepMosaic at each value of median read position (MPOS, from the Mutect2 output). Relatively few variants have MPOS < 5, which suggests that not removing adapters has no major effect on the variants called. Still, I might run adapter trimming in parallel, just to be sure, and then, as you suggested, compare the variants called on a small cohort with and without adapter trimming.

I was more surprised by the peak of variants at MPOS = 20. Have you observed this in your analyses before?

Best, Augustin
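A histogram like the one above, plus the MPOS < 5 count and the location of the peak, could be tallied from a Mutect2 VCF with a short script. This is a hypothetical sketch: it reads the MPOS INFO field (which GATK's VCF header describes as the median distance from the end of the read), takes the first listed value as belonging to the first ALT allele, and uses 5 as the near-edge cutoff only because that is the threshold mentioned in this thread. The function names are illustrative and are not part of DeepMosaic or GATK.

```python
"""Tally Mutect2 MPOS values for a set of calls (hypothetical sketch)."""
from collections import Counter
from typing import Dict, Iterable, List, Optional


def parse_mpos(info: str) -> Optional[List[int]]:
    """Return the MPOS values from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "MPOS":
            # Missing values are written as "." and are dropped here.
            return [int(v) for v in value.split(",") if v.isdigit()]
    return None


def mpos_histogram(vcf_lines: Iterable[str]) -> Counter:
    """Count calls per MPOS value, using the first value listed per record."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # 8th VCF column is INFO
        values = parse_mpos(info)
        if values:
            counts[values[0]] += 1
    return counts


def summarise(counts: Counter, edge_threshold: int = 5) -> Dict[str, object]:
    """Report how many calls sit near read ends and the most common MPOS."""
    total = sum(counts.values())
    near_edge = sum(n for mpos, n in counts.items() if mpos < edge_threshold)
    return {
        "total": total,
        "near_edge": near_edge,
        "near_edge_fraction": near_edge / total if total else 0.0,
        "mode": counts.most_common(1)[0][0] if counts else None,
    }


if __name__ == "__main__":
    demo = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t10\t.\tA\tG\t.\tPASS\tDP=50;MPOS=20",
        "chr1\t20\t.\tC\tT\t.\tPASS\tMPOS=20;DP=40",
        "chr1\t30\t.\tG\tA\t.\tPASS\tMPOS=3",
    ]
    hist = mpos_histogram(demo)
    for mpos in sorted(hist):
        print(mpos, hist[mpos])
    print(summarise(hist))
```

Running the same tally on the trimmed and untrimmed call sets would show whether trimming mainly removes calls with small MPOS.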

shishenyxx commented 3 months ago


Hi Augustin,

Thank you for your response. I don't think I've seen anything like this, and I agree that the peak at 20 bp looks odd. Did you check the DeepMosaic plots to see whether there are truncations or mapping issues at those sites? Are you looking at exome capture data? Could the probes be of this length?

Best,

Xiaoxu
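One way to follow up on the probe question is to check whether the calls cluster at a fixed distance from the edges of the exome capture targets. The sketch below is hypothetical: it assumes the targets are available as a merged (non-overlapping) BED file and that variant positions are 1-based VCF coordinates; the `Targets` class and `edge_distance` method are illustrative names. Since MPOS measures position within reads rather than within targets, a cluster of target-edge distances would only be suggestive, not proof of a probe-related artifact.

```python
"""Distance from each call to the nearest capture-target edge (hypothetical sketch)."""
import bisect
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class Targets:
    """Sorted, non-overlapping BED intervals (0-based, half-open) per chromosome."""

    def __init__(self, bed_lines: Iterable[str]) -> None:
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for line in bed_lines:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            by_chrom[chrom].append((int(start), int(end)))
        self._intervals = {c: sorted(v) for c, v in by_chrom.items()}
        self._starts = {c: [s for s, _ in v] for c, v in self._intervals.items()}

    def edge_distance(self, chrom: str, pos: int) -> Optional[int]:
        """Return bp to the nearest edge of the target containing a 1-based
        VCF position, or None if the position is off-target."""
        starts = self._starts.get(chrom, [])
        pos0 = pos - 1  # convert VCF 1-based to BED 0-based
        # Index of the last target that starts at or before pos0.
        i = bisect.bisect_right(starts, pos0) - 1
        if i < 0:
            return None
        start, end = self._intervals[chrom][i]
        if pos0 >= end:
            return None
        return min(pos0 - start, end - 1 - pos0)


if __name__ == "__main__":
    targets = Targets(["chr1\t100\t200", "chr1\t300\t400"])
    calls = [("chr1", 121), ("chr1", 380), ("chr1", 101), ("chr1", 250)]
    distances: Counter = Counter()
    for chrom, pos in calls:
        d = targets.edge_distance(chrom, pos)
        if d is not None:
            distances[d] += 1
    print(distances)
```

If many of the MPOS = 20 calls also sit about 20 bp from a target edge, those sites might be worth a closer look in the DeepMosaic images.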