grenaud / schmutzi

Maximum a posteriori estimate of contamination for ancient samples
20 stars 2 forks source link

Problem running the Test Data #12

Closed arenvale closed 3 years ago

arenvale commented 3 years ago

Hi Gabriel! This is the first time I am trying to use the program. Trying to do the tests with the Test Data from the terminal in Ubuntu with: contDeam.pl --library double --out outputdir/sim testdata/simulation.bam I get the following error:

running cmd /home/vale/schmutzi/src/bam2prof.exe -length  2 -endo -double -5p outputdir/sim.endo.5p.prof  -3p outputdir/sim.endo.3p.prof testdata/simulation.bam
bash: /home/vale/schmutzi/src/bam2prof.exe: File or directory does not exist
system  cmd /home/vale/schmutzi/src/bam2prof.exe -length  2 -endo -double -5p outputdir/sim.endo.5p.prof  -3p outputdir/sim.endo.3p.prof testdata/simulation.bam failed: 32512 at ./contDeam.pl line 22.

I don't know what I'm doing wrong. If you could help me I would appreciate it. Regards, Valeria

grenaud commented 3 years ago

Hi Valeria, Is this under windows? like cygwin?

Gabriel

arenvale commented 3 years ago

No, Ubuntu 20.04

grenaud commented 3 years ago

pure ubuntu or mounted on windows?

On Tue, Jan 26, 2021 at 4:54 PM arenvale notifications@github.com wrote:

No, Ubuntu 20.04

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

arenvale commented 3 years ago

Pure Ubuntu, only Ubuntu 20.04 in my computer :)

grenaud commented 3 years ago

Ok this is super weird because it usually does this if it detects cygwin, can you give me the output of "uname -a"

On Tue, Jan 26, 2021 at 5:08 PM arenvale notifications@github.com wrote:

Pure Ubuntu, only Ubuntu 20.04 in my computer :)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-767648595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIYJIYNCCI3LAQOY7B3S33SGXANCNFSM4WTVBQZA .

arenvale commented 3 years ago

Linux aren 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

grenaud commented 3 years ago

Ok this is very strange. Could you do a make clean find

and send me the output of "find" please?

On Tue, Jan 26, 2021 at 5:23 PM arenvale notifications@github.com wrote:

Linux aren 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

arenvale commented 3 years ago

~/schmutzi/src$ find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./contOut2ContEstSource.pl ./wrapper.pl ./miscfunc.h ./miscfunc.cpp ./endoCaller.cpp ./bam2makeSchmutzi.pl ./approxDist.R ./logs2pos.cpp ./jointFreqDeaminated.R ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./parseSchmutzi.pl ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./jointFreqDeaminated.cpp ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./countRecords.cpp ./outputdir ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./msa2singlefreq.cpp ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/neandertalHuman.pos ./schmutzi.pl ./mtCont.cpp ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl

grenaud commented 3 years ago

it looks fine, can you do: make find and paste the output again?

On Tue, Jan 26, 2021 at 7:52 PM arenvale notifications@github.com wrote:

find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./contOut2ContEstSource.pl ./wrapper.pl ./miscfunc.h ./miscfunc.cpp ./endoCaller.cpp ./bam2makeSchmutzi.pl ./approxDist.R ./logs2pos.cpp ./jointFreqDeaminated.R ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./parseSchmutzi.pl ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./jointFreqDeaminated.cpp ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./countRecords.cpp ./outputdir ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./msa2singlefreq.cpp ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/neandertalHuman.pos ./schmutzi.pl ./mtCont.cpp ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

arenvale commented 3 years ago

~/schmutzi/src$ find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./countRecords.o ./bam2prof ./jointFreqDeaminatedDouble ./miscfunc.o ./contOut2ContEstSource.pl ./msa2singlefreq ./endoCaller.o ./log2freq.o ./wrapper.pl ./contDeam ./miscfunc.h ./miscfunc.cpp ./msa2freq.o ./endoCaller.cpp ./jointFreqDeaminated ./bam2makeSchmutzi.pl ./approxDist.R ./log2ConsensusLog ./logs2pos.cpp ./jointFreqDeaminated.R ./damage2profile.o ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./msa2singlefreq.o ./parseSchmutzi.pl ./jointFreqDeaminatedDouble.o ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./filterlog.o ./jointFreqDeaminated.cpp ./addXACircular.o ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./log2fasta.o ./subSampleBAM ./damage2profile ./countRecords.cpp ./insertSize ./filterlog ./msa2log ./log2ConsensusLog.o ./endoCaller ./outputdir ./mtCont.o ./subSampleBAM.o ./msa2freq ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./contDeam.o ./log2freq ./msa2singlefreq.cpp ./logmask ./addXACircular ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/poshap2splitbam ./splitEndoVsCont/neandertalHuman.pos ./splitEndoVsCont/poshap2splitbam.o ./logs2pos.o ./schmutzi.pl ./log2fasta ./logdiff ./msa2log.o ./insertSize.o ./bam2prof.o ./mtCont ./logs2pos ./mtCont.cpp ./countRecords ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl ./logdiff.o ./logmask.o ./jointFreqDeaminated.o

arenvale commented 3 years ago

With those commands I was able to run the test data! Thank you so much! Now I'm trying to run it with my own data, and I get the following: ~/schmutzi/src$ ./contDeam.pl --lengthDeam 5 --library double --out aln_059.sorted_by_coord.mkdup.rescaled.MAPPED rCRS.fasta aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam running cmd /home/vale/schmutzi/src/bam2prof -length 5 -endo -double -5p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.5p.prof -3p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.3p.prof aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam Could not open input BAM filealn_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam system cmd /home/vale/schmutzi/src/bam2prof -length 5 -endo -double -5p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.5p.prof -3p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.3p.prof aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam failed: 256 at ./contDeam.pl line 22.

grenaud commented 3 years ago

sounds good! It is possible you have very little data or little damage?

arenvale commented 3 years ago

Both are possible. I used USER enzyme and my I had ~5000 mapped reads for human mitochondria. MapDamage showed some damage patterns but not a lot, that's why I wanted to make sure by estimating the amount of contamination. Am I running the right command?

grenaud commented 3 years ago

what coverage are you getting? Also, please note that with USER, schmutzi cannot really lock on the endogenous signal in case there is contamination.

On Tue, Jan 26, 2021 at 9:10 PM arenvale notifications@github.com wrote:

Both are possible. I used USER enzyme and my I had ~5000 mapped reads for human mitochondria. MapDamage showed some damage patterns but not a lot, that's why I wanted to make sure by estimating the amount of contamination. Am I running the right command?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

arenvale commented 3 years ago

I am getting 99% coverage and 12x depth. It's exploratory work and the consensus mitogenome I got is consistent with other preliminary results, but I wanted to have a contamination estimate to try to publish it

grenaud commented 3 years ago

Ok the coverage is low-ish but the absence of damage is actually a problem because we cannot really leverage it to infer the endogenous base. I suggest you do:

endoCaller -cont 0.3 -single -seq [output prefix].fa -log [output prefix].log [mt reference] [input bam file] The -single flag means that we think there is a single contaminant and we think there is a prior on the contamination rate of 30%.

Then call:

mtCont [output prefix].log [mt reference] [input bam file] [path to contamination freq files]/*freq

This is not perfect but should give you an idea. Also, do not forget to map to your mitochondrial in isolation.

arenvale commented 3 years ago

Ok, perfect! I ran those codes but I don't know how to interpret the result of mtCont: ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.97 -2177.5 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.98 -2319.6 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.99 -2560.61 And what do you mean by "map to your mitochondrial in isolation"? Thank you so much!

grenaud commented 3 years ago

I would use the Eurasian database if you are pretty sure that the contaminant is European meaning do not map to the genome and extract the mito. This will cause dips in coverage due to numts

On Wed, Jan 27, 2021 at 3:10 PM arenvale notifications@github.com wrote:

Ok, perfect! I ran those codes but I don't know how to interpret the result of mtCont: ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.97 -2177.5 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.98 -2319.6 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.99 -2560.61 And what do you mean by "map to your mitochondrial in isolation"? Thank you so much!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768310841, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNI3G7ONQTJF24NC6M5TS4ANGLANCNFSM4WTVBQZA .

arenvale commented 3 years ago

Oh nono I mapped only to the mitochondria. Ok, I did it with the Eurasian now. All log.posterior are lower than -900 (-1000, -2000)

grenaud commented 3 years ago

and what is the estimate provided by mtCont? maybe use contOut2ContEstSource.pl on the file?

On Wed, Jan 27, 2021 at 3:46 PM arenvale notifications@github.com wrote:

Oh nono I mapped only to the mitochondria. Ok, I did it with the Eurasian now. All log.posterior are lower than -900 (-1000, -2000)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

arenvale commented 3 years ago

I don't know where is the information of the estimate provided by mtCont. Running contOut2ContEstSource.pl I got: P4b1.freq 0.03 0.02 0.04

grenaud commented 3 years ago

ok so P4b1 is your most likely source and you have 3% contamination. Again, I would not bet on this estimate, 12X and no damage is a very difficult target.

cheers, Gabriel

On Wed, Jan 27, 2021 at 4:00 PM arenvale notifications@github.com wrote:

I don't know where is the information of the estimate provided by mtCont. Running contOut2ContEstSource.pl I got: P4b1.freq 0.03 0.02 0.04

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768344173, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIZ2ZEQRHYFINLRYR5DS4ATCDANCNFSM4WTVBQZA .

arenvale commented 3 years ago

Ok, I will evaluate whether or not to include it in the publication. So, you don't recommend the use of the USER enzyme for the next analysis?

grenaud commented 3 years ago

well not to check authenticity. What some colleagues have done is to do a small lib without USER to convince themselves that it is aDNA and the rest with USER.

On Wed, Jan 27, 2021 at 4:21 PM arenvale notifications@github.com wrote:

Ok, I will evaluate whether or not to include it in the publication. So, you don't recommend the use of the USER enzyme for the next analysis?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768358920, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIYOASRHCGCAXU5UCRDS4AVQ5ANCNFSM4WTVBQZA .

arenvale commented 3 years ago

I understand perfectly. Well thank you very much for your help! You were very kind to answer all my questions. Best regards. Valeria

grenaud commented 3 years ago

We aim to please :-) Could you close the issue?

On Wed, Jan 27, 2021 at 4:45 PM arenvale notifications@github.com wrote:

I understand perfectly. Well thank you very much for your help! You were very kind to answer all my questions. Best regards. Valeria

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.