Closed arenvale closed 3 years ago
Hi Valeria, Is this under windows? like cygwin?
Gabriel
No, Ubuntu 20.04
pure ubuntu or mounted on windows?
On Tue, Jan 26, 2021 at 4:54 PM arenvale notifications@github.com wrote:
No, Ubuntu 20.04
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
Pure Ubuntu, only Ubuntu 20.04 in my computer :)
Ok this is super weird because it usually does this if it detects cygwin, can you give me the output of "uname -a"
On Tue, Jan 26, 2021 at 5:08 PM arenvale notifications@github.com wrote:
Pure Ubuntu, only Ubuntu 20.04 in my computer :)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-767648595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIYJIYNCCI3LAQOY7B3S33SGXANCNFSM4WTVBQZA .
Linux aren 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Ok this is very strange. Could you do a make clean find
and send me the output of "find" please?
On Tue, Jan 26, 2021 at 5:23 PM arenvale notifications@github.com wrote:
Linux aren 5.8.0-38-generic #43~20.04.1-Ubuntu SMP Tue Jan 12 16:39:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
~/schmutzi/src$ find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./contOut2ContEstSource.pl ./wrapper.pl ./miscfunc.h ./miscfunc.cpp ./endoCaller.cpp ./bam2makeSchmutzi.pl ./approxDist.R ./logs2pos.cpp ./jointFreqDeaminated.R ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./parseSchmutzi.pl ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./jointFreqDeaminated.cpp ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./countRecords.cpp ./outputdir ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./msa2singlefreq.cpp ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/neandertalHuman.pos ./schmutzi.pl ./mtCont.cpp ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl
it looks fine, can you do: make find and paste the output again?
On Tue, Jan 26, 2021 at 7:52 PM arenvale notifications@github.com wrote:
find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./contOut2ContEstSource.pl ./wrapper.pl ./miscfunc.h ./miscfunc.cpp ./endoCaller.cpp ./bam2makeSchmutzi.pl ./approxDist.R ./logs2pos.cpp ./jointFreqDeaminated.R ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./parseSchmutzi.pl ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./jointFreqDeaminated.cpp ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./countRecords.cpp ./outputdir ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./msa2singlefreq.cpp ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/neandertalHuman.pos ./schmutzi.pl ./mtCont.cpp ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
~/schmutzi/src$ find . ./testdata ./testdata/simulation.bam.bai ./testdata/.data ./testdata/mezB9687.bam.bai ./testdata/mezB9687.bam ./testdata/simulation.bam ./log2ConsensusLog.cpp ./posteriorDeam.R ./countRecords.o ./bam2prof ./jointFreqDeaminatedDouble ./miscfunc.o ./contOut2ContEstSource.pl ./msa2singlefreq ./endoCaller.o ./log2freq.o ./wrapper.pl ./contDeam ./miscfunc.h ./miscfunc.cpp ./msa2freq.o ./endoCaller.cpp ./jointFreqDeaminated ./bam2makeSchmutzi.pl ./approxDist.R ./log2ConsensusLog ./logs2pos.cpp ./jointFreqDeaminated.R ./damage2profile.o ./freq2consfreq.pl ./subSampleBAM.cpp ./mitoConsPDF.R ./logdiff.cpp ./bam2makeSchmutziHaplogrep2.pl ./log2fasta.cpp ./addXACircular.cpp ./deamProf2pdf.R ./msa2singlefreq.o ./parseSchmutzi.pl ./jointFreqDeaminatedDouble.o ./msa2log.cpp ./log2freq.cpp ./insertSize.cpp ./filterlog.o ./jointFreqDeaminated.cpp ./addXACircular.o ./damage2profile.cpp ./fasta2haplogrep.py ./contDeam.pl ./log2fasta.o ./subSampleBAM ./damage2profile ./countRecords.cpp ./insertSize ./filterlog ./msa2log ./log2ConsensusLog.o ./endoCaller ./outputdir ./mtCont.o ./subSampleBAM.o ./msa2freq ./Makefile ./msa2freq.cpp ./wrapperRMDUP.pl ./contDeam.o ./log2freq ./msa2singlefreq.cpp ./logmask ./addXACircular ./splitEndoVsCont ./splitEndoVsCont/denisovaHuman.pos ./splitEndoVsCont/poshap2splitbam.cpp ./splitEndoVsCont/Makefile ./splitEndoVsCont/poshap2splitbam ./splitEndoVsCont/neandertalHuman.pos ./splitEndoVsCont/poshap2splitbam.o ./logs2pos.o ./schmutzi.pl ./log2fasta ./logdiff ./msa2log.o ./insertSize.o ./bam2prof.o ./mtCont ./logs2pos ./mtCont.cpp ./countRecords ./filterlog.cpp ./bam2prof.cpp ./jointFreqDeaminatedDouble.cpp ./contDeam.cpp ./logmask.cpp ./contOut2ContEst.pl ./logdiff.o ./logmask.o ./jointFreqDeaminated.o
With those commands I was able to run the test data! Thank you so much!
Now I'm trying to run it with my own data, and I get the following:
~/schmutzi/src$ ./contDeam.pl --lengthDeam 5 --library double --out aln_059.sorted_by_coord.mkdup.rescaled.MAPPED rCRS.fasta aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam running cmd /home/vale/schmutzi/src/bam2prof -length 5 -endo -double -5p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.5p.prof -3p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.3p.prof aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam Could not open input BAM filealn_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam system cmd /home/vale/schmutzi/src/bam2prof -length 5 -endo -double -5p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.5p.prof -3p aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.endo.3p.prof aln_059.sorted_by_coord.mkdup.rescaled.MAPPED.bam failed: 256 at ./contDeam.pl line 22.
sounds good! It is possible you have very little data or little damage?
Both are possible. I used USER enzyme and my I had ~5000 mapped reads for human mitochondria. MapDamage showed some damage patterns but not a lot, that's why I wanted to make sure by estimating the amount of contamination. Am I running the right command?
what coverage are you getting? Also, please note that with USER, schmutzi cannot really lock on the endogenous signal in case there is contamination.
On Tue, Jan 26, 2021 at 9:10 PM arenvale notifications@github.com wrote:
Both are possible. I used USER enzyme and my I had ~5000 mapped reads for human mitochondria. MapDamage showed some damage patterns but not a lot, that's why I wanted to make sure by estimating the amount of contamination. Am I running the right command?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
I am getting 99% coverage and 12x depth. It's exploratory work and the consensus mitogenome I got is consistent with other preliminary results, but I wanted to have a contamination estimate to try to publish it
Ok the coverage is low-ish but the absence of damage is actually a problem because we cannot really leverage it to infer the endogenous base. I suggest you do:
endoCaller -cont 0.3 -single -seq [output prefix].fa -log [output prefix].log [mt reference] [input bam file]
The -single flag means that we think there is a single contaminant and we think there is a prior on the contamination rate of 30%.
Then call:
mtCont [output prefix].log [mt reference] [input bam file] [path to contamination freq files]/*freq
This is not perfect but should give you an idea. Also, do not forget to map to your mitochondrial in isolation.
Ok, perfect!
I ran those codes but I don't know how to interpret the result of mtCont:
../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.97 -2177.5 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.98 -2319.6 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.99 -2560.61
And what do you mean by "map to your mitochondrial in isolation"?
Thank you so much!
I would use the Eurasian database if you are pretty sure that the contaminant is European meaning do not map to the genome and extract the mito. This will cause dips in coverage due to numts
On Wed, Jan 27, 2021 at 3:10 PM arenvale notifications@github.com wrote:
Ok, perfect! I ran those codes but I don't know how to interpret the result of mtCont: ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.97 -2177.5 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.98 -2319.6 ../share/schmutzi/alleleFreqMT/197/freqs/gi_94449824_gb_AY950293.2_1.freq 0.99 -2560.61 And what do you mean by "map to your mitochondrial in isolation"? Thank you so much!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768310841, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNI3G7ONQTJF24NC6M5TS4ANGLANCNFSM4WTVBQZA .
Oh nono I mapped only to the mitochondria. Ok, I did it with the Eurasian now. All log.posterior are lower than -900 (-1000, -2000)
and what is the estimate provided by mtCont? maybe use contOut2ContEstSource.pl on the file?
On Wed, Jan 27, 2021 at 3:46 PM arenvale notifications@github.com wrote:
Oh nono I mapped only to the mitochondria. Ok, I did it with the Eurasian now. All log.posterior are lower than -900 (-1000, -2000)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
I don't know where is the information of the estimate provided by mtCont.
Running contOut2ContEstSource.pl I got:
P4b1.freq 0.03 0.02 0.04
ok so P4b1 is your most likely source and you have 3% contamination. Again, I would not bet on this estimate, 12X and no damage is a very difficult target.
cheers, Gabriel
On Wed, Jan 27, 2021 at 4:00 PM arenvale notifications@github.com wrote:
I don't know where is the information of the estimate provided by mtCont. Running contOut2ContEstSource.pl I got: P4b1.freq 0.03 0.02 0.04
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768344173, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIZ2ZEQRHYFINLRYR5DS4ATCDANCNFSM4WTVBQZA .
Ok, I will evaluate whether or not to include it in the publication. So, you don't recommend the use of the USER enzyme for the next analysis?
well not to check authenticity. What some colleagues have done is to do a small lib without USER to convince themselves that it is aDNA and the rest with USER.
On Wed, Jan 27, 2021 at 4:21 PM arenvale notifications@github.com wrote:
Ok, I will evaluate whether or not to include it in the publication. So, you don't recommend the use of the USER enzyme for the next analysis?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grenaud/schmutzi/issues/12#issuecomment-768358920, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQRNIYOASRHCGCAXU5UCRDS4AVQ5ANCNFSM4WTVBQZA .
I understand perfectly. Well thank you very much for your help! You were very kind to answer all my questions. Best regards. Valeria
We aim to please :-) Could you close the issue?
On Wed, Jan 27, 2021 at 4:45 PM arenvale notifications@github.com wrote:
I understand perfectly. Well thank you very much for your help! You were very kind to answer all my questions. Best regards. Valeria
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
Hi Gabriel! This is the first time I am trying to use the program. Trying to do the tests with the Test Data from the terminal in Ubuntu with:
contDeam.pl --library double --out outputdir/sim testdata/simulation.bam
I get the following error:I don't know what I'm doing wrong. If you could help me I would appreciate it. Regards, Valeria