Open enabieva opened 5 years ago
Sorry for the delayed response. When you run "cleanCall genotype", you need to provide --col-mixid option to indicate the column that contains the IDs of known contaminating sample.
Sorry that it was not included in the README.md but it was documented in cleanCall genotype software tool itself. I hope that this is helpful, but let me know if you have further questions.
Hyun Min Kang, Ph.D. Associate Professor of Biostatistics University of Michigan, Ann Arbor Email : hmkang@umich.edu
On Fri, Jun 14, 2019 at 6:41 PM Elena Nabieva notifications@github.com wrote:
Could you please post instructions running CleanCall when the source of contamination is known? I can't seem to find the anywhere.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hyunminkang/cleancall/issues/8?email_source=notifications&email_token=ABPY5OKLTLS3YUB3APSI3CLP2NRV3A5CNFSM4HYGG7S2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GZQMQSA, or mute the thread https://github.com/notifications/unsubscribe-auth/ABPY5OPMKDLJFEXRDY2Y27TP2NRV3ANCNFSM4HYGG7SQ .
Thank you Hyun. I have seen the command-line option, but I still cannot figure out how exactly to apply it. Could you please give an example of an index file for the contamination sample case?
Hi Elena,
You need to add additional column in your PED file. For example, header could be
#FAM_ID IND_ID DAD_ID MOM_ID SEX MPU CONTAM MIX_ID
And include the identity of contaminating sample in MIX_ID, and use --col-mixid MIX_ID when you run the tool. I hope this is clear.
Thank you very much for your reply. Can I run my pipeline by you to make sure I'm running it correctly: The setup is that I have an abortus sample and a mother sample, and I want to cleancall the abortus given that it may be contaminated with the mother. The index file looks like this: index.txt: abortus abortus.bam mother mother.bam
I run it through cleancall-pileup: $CLEANCALL/cctools pileup --loci $HAPMAP --index index.txt --out out --ref $REF --run 6 (where $HAPMAP refers to the hapmap 3 file from GATK resource bundle)
it creates a a ped file that looks like this:
abortus abortus 0 0 0 out.abortus.txt.gz mother mother 0 0 0 out.mother.txt.gz
(and the corresponding pileups, i.e., out*.txt.gz files that are mentioned in the ped file)
Then I run cleancall-verify as: $CLEANCALL/cctools verify --index out.ped --out out.verify --vcf $HAPMAP --run 6 giving it the ped file from above As a result it gives me a out.verify.ped file that looks like this:
abortus abortus 0 0 0 out.abortus.txt.gz 0.17626 NA mother mother 0 0 0 out.mother.txt.gz 0.00114 NA
Which I edit, as per my understanding of what you have written (and correcting the pedigree info):
abortus abortus 0 mother 0 out.abortus.txt.gz 0.17626 NA mother abortus mother 0 0 0 out.mother.txt.gz 0.00114 NA NA
And then I run it (just on chr1, since otherwise I get an error) as
Is this correct? Thank you very much for the help!
Could you please post instructions running CleanCall when the source of contamination is known? I can't seem to find the anywhere.