grenaud / schmutzi

Maximum a posteriori estimate of contamination for ancient samples
20 stars 2 forks source link

schmutzi.pl fails with and without prediction of the contaminant #18

Open arenvale opened 1 year ago

arenvale commented 1 year ago

Good afternoon Gabriel. I have been experiencing some problems obtaining the contamination estimates. I have mitogenomes from ancient human samples sequenced paired-end 2x75. I have two cases with questions: A) During bioinformatics processing I merge the reads and do alignment with BWA-MEM. An example damage pattern I get is the following (only in 3', I am trying to figure it out): LZ3-PMD_plot.frag.pdf I wanted to run the protocol detailed in "Quick start guide". For some samples the result of conDeam.pl is higher than 10%, so I wanted to run schmutzi.pl first without and then with the contaminant prediction. Randomly, for some samples it has come to do more or less iterations, but for most it has not generated the _final.cont.est files. I wanted to know if there is some error in my procedure, or if the fact of the distortion of the damage that I observe graphically can generate that the contamination cannot be calculated.

B) On the other hand, given the incomplete damage pattern I obtained, I tried mapping the readings with BWA+ALN+SAMPE without mergeing the reads. An example damage pattern I get is as follows: LZ3-PMD_plot.frag.pdf While this procedure would not be the most suitable for further analysis, I wanted to test the contamination calculation in this case that the pattern is correctly observed. However, unfortunately contDeam.pl throws me the following error:

Libgab.cpp: destringify() Unable to convert string="-nan"
system  cmd /home/schmutzi/src/contDeam  -deamread -deam5p sample.sorted.md.schmutzi.endo.5p.prof  -deam3p sample.sorted.md.schmutzi.endo.3p.prof  -log  sample.sorted.md.schmutzi.cont.deam   sample.sorted.md.bam failed: 256 at /home/schmutzi/src/contDeam.pl line 22.

I wanted to know if you have any idea what might be going on in both cases. Thank you very much for your help. Valeria

grenaud commented 1 year ago

A few things: 1) did you trim adapters/merge the paired-end data? 2) I recommend mapping with bwa aln with parameters: -n 0.01 -o 2 -l 16500. 3) did you map to the mt in isolation?

arenvale commented 1 year ago

Hi! Tanks for the quick response.

  1. Yes in the protocol A). I trimmed the adapters with cutadapt and merged paired-end data with AdapterRemoval.
  2. In protocol A) in which I merged the paired-end reads, I tried different types of alignment and I stayed with bwa mem because I was able to recover some deletions that the haplotypes I work with have that I was not able to recover with bwa aln. With the different alignments I tried (one of them was bwa aln with similar parameters) I obtained the same partial damage pattern (only in 3') and I had similar results when I ran schmutzi. In particular, when I ran it schmutzi.pl --notusepredC on the bwa aln alignment I got the following error: failed: 9 at /home/vale/schmutzi/src/schmutzi.pl line 372.
  3. Yes. I made mt capture and mapped only to mt genome.
grenaud commented 1 year ago

Ok do you have any signs of aDNA damage on the mitogenome?

arenvale commented 1 year ago

Yes, I can visualize the damage patterns using PMDtools (attached in the first comment).

grenaud commented 1 year ago

How many fragments did you end up mapping?

arenvale commented 1 year ago

I had 348209 mapped Q30 reads. Do you want me to send you the bam files? Thank you so much!

grenaud commented 1 year ago

yes please, via email.

jkimsis commented 3 months ago

I'm having the same issue with some files, while others from the same sequencing run are working fine. I aligned all of them with bwa aln -l 1024 -n 0.01 -o 2 using the mtDNA fasta included with schmutzi. Any idea what might be causing this?

grenaud commented 3 months ago

can you please open a new issue, post all the commands you have used and potentially send them to me via email?