evanbiederstedt opened this issue 5 years ago
https://github.com/mskcc/vaporware/blob/develop/somatic.nf#L49-L112
Here's how to do this:
delly call -t BND \
-g /juno/work/taylorlab/cmopipeline/mskcc-igenomes/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta \
-o output.bcf tumor.bam normal.bam
The flag TRA is officially outdated; BND replaces it.
@evanbiederstedt Is the viral integration part literally just the BND output? I can just publish it from the DellyCall process if we keep --exclude ${svCallingExcludeRegions}.
@allanbolipata You'll need to use the special reference FASTA, which contains the viral sequences.
I also think it's worth dropping --exclude ${svCallingExcludeRegions} here, but @kpjonsson might disagree ferociously.
What is the special reference FASTA? I can add it.
It's in /juno/work/taylorlab/cmopipeline/mskcc-igenomes/grch37/viral_reference
This needs to be set in the config file, separately for GRCh37 and GRCh38.
CC @allanbolipata
Let's try Manta.
I believe this will work, putting the viral FASTA in --referenceFasta:
${MANTA_INSTALL_PATH}/bin/configManta.py \
--normalBam normal.bam \
--tumorBam tumor.bam \
--referenceFasta hg19.fa \
--runDir ${MANTA_ANALYSIS_PATH}
This is an experimental test.
https://github.com/Illumina/manta/blob/master/docs/userGuide/README.md
But let's try a few things:
Tumor-only analysis, with the viral FASTA in --referenceFasta:
${MANTA_INSTALL_PATH}/bin/configManta.py \
--tumorBam HCC1187C.cram \
--referenceFasta hg19.fa \
--runDir ${MANTA_ANALYSIS_PATH}
Single diploid sample analysis, with the viral FASTA in --referenceFasta:
${MANTA_INSTALL_PATH}/bin/configManta.py \
--bam NA12878_S1.bam \
--referenceFasta hg19.fa \
--runDir ${MANTA_ANALYSIS_PATH}
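The three Manta setups above differ only in which BAM flags are passed; configManta.py then writes a runWorkflow.py into the run directory, which does the actual calling. A minimal Python sketch that assembles both argument lists, assuming hypothetical helper names (manta_config_args, manta_run_args, not part of Manta) and that configManta.py and python are on PATH:

```python
import os
from typing import List, Optional


def manta_config_args(
    reference_fasta: str,
    run_dir: str,
    tumor_bam: Optional[str] = None,
    normal_bam: Optional[str] = None,
    diploid_bam: Optional[str] = None,
    exome: bool = False,
) -> List[str]:
    """Build the configManta.py argument list for one of the setups above.

    tumor + normal -> somatic; tumor only -> tumor-only; diploid_bam -> --bam.
    """
    if diploid_bam and (tumor_bam or normal_bam):
        # Conservative choice for this sketch, not a documented Manta rule.
        raise ValueError("use either diploid_bam or tumor/normal BAMs, not both")
    if not (diploid_bam or tumor_bam):
        raise ValueError("need a tumor BAM or a single diploid BAM")
    args = ["configManta.py"]
    if exome:
        args.append("--exome")
    args += ["--referenceFasta", reference_fasta]  # the viral-augmented FASTA goes here
    if diploid_bam:
        args += ["--bam", diploid_bam]
    if normal_bam:
        args += ["--normalBam", normal_bam]
    if tumor_bam:
        args += ["--tumorBam", tumor_bam]
    args += ["--runDir", run_dir]
    return args


def manta_run_args(run_dir: str, jobs: int = 8) -> List[str]:
    """Second step: run the workflow script that configManta.py writes into run_dir."""
    return ["python", os.path.join(run_dir, "runWorkflow.py"), "--mode", "local", "--jobs", str(jobs)]
```

Both lists can be handed to subprocess.run(..., check=True) one after the other; keeping them as lists avoids shell-quoting problems with paths.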
Let's run this configuration on the 25 BRCA samples: https://github.com/mskcc/vaporware/blob/master/test_inputs/lsf/WES_25TN.tsv
We're unlikely to catch any viral integration in those samples. To prove that it works, we'll likely need to download some published samples (e.g., TCGA) with viral integration.
@evanbiederstedt Should I not use --exome?
@kpjonsson Do you have TCGA sample bams?
Not at hand. One could try, for example, a set of the TCGA liver cancers, where viral integration is common. That said, the signal will still be sparse, since these are exomes, not genomes. Not sure what to expect.
There are some stomach cancer BAMs in /ifs/tcga/stad/BAMs/; these might work, since a fraction of stomach cancers carry EBV and they picked these up from exome data, although with a method different from the one we intend to use.
Error running Manta
ERROR ~ Error executing process > 'RunMantaViralFasta (1)'
Caused by:
Process `RunMantaViralFasta (1)` terminated with an error exit status (1)
Command executed:
configManta.py --exome --referenceFasta SuperReference.fa --normalBam normal_sample.sorted.md.bqsr.bam --tumorBam tumor_sample.sorted.md.bqsr.bam --runDir Manta
python Manta/runWorkflow.py --mode local --jobs 8
Command exit status:
1
Command output:
(empty)
Command error:
CONFIGURATION ERROR:
Reference genome mismatch: Reference fasta file is missing a chromosome found in the Normal BAM/CRAM file: 'NC_007605'
Work dir:
/juno/work/pi/cmopipeline/nextflow/vaporware_executes/executor_1/work/c8/3ff57b97d832a6c0b03b6470918cbc
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
The SuperReference.fa file does have >EBVType1.NC_007605.1, though.
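Judging from the error, Manta matches contig names literally, so a FASTA header like >EBVType1.NC_007605.1 does not satisfy a BAM whose header lists NC_007605. A quick pre-flight check, sketched in Python under two assumptions (the BAM header has been dumped as text with samtools view -H, and FASTA contig names run up to the first whitespace of each header line), compares the two contig sets in both directions:

```python
from typing import Iterable, Set, Tuple


def fasta_contigs(fasta_lines: Iterable[str]) -> Set[str]:
    """Contig names from '>' headers, cut at the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def bam_header_contigs(header_lines: Iterable[str]) -> Set[str]:
    """Contig names from the SN: tag of each @SQ line in SAM header text."""
    names = set()
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def contig_mismatch(
    fasta_lines: Iterable[str], header_lines: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (contigs in the BAM but not the FASTA, contigs in the FASTA but not the BAM)."""
    ref = fasta_contigs(fasta_lines)
    bam = bam_header_contigs(header_lines)
    return bam - ref, ref - bam
```

A non-empty first set corresponds to the "missing a chromosome found in the Normal BAM/CRAM file" error; a non-empty second set corresponds to the inverse error, where the BAM lacks a contig present in the FASTA.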
Gonna try what's referenced at https://github.com/Illumina/manta/issues/93
This is the email I sent back on March 4th:
Just a note on what I have read so far, to keep a record for future use.
Questions:
Useful links:
Initial discussion: https://www.biostars.org/p/227778/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6050683/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673242/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6283451/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4499804/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333248/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4580395/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2224419/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3754044/
https://www.biorxiv.org/content/biorxiv/early/2017/10/25/208926.full.pdf
Since EBV commonly contaminates human DNA, we include it in the standard GRCh37 reference. You can run samtools idxstats on all Roslin DMP-WES BAMs aligned to date to find one with an abundance of EBV DNA, then use that for your SV caller tests to zero in on the integration site.
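The idxstats screen can be scripted: samtools idxstats prints one tab-separated line per contig with the contig name, contig length, mapped read segments, and unmapped read segments. The Python sketch below assumes the EBV contig is named NC_007605 (as in the BAMs discussed in this thread) and that the idxstats text has already been collected per sample; the function names and sample names are hypothetical.

```python
from typing import Dict, List, Tuple


def ebv_fraction(idxstats_text: str, ebv_contig: str = "NC_007605") -> float:
    """Fraction of mapped read segments that landed on the EBV contig."""
    total_mapped = 0
    ebv_mapped = 0
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, _length, mapped, _unmapped = fields
        total_mapped += int(mapped)
        if name == ebv_contig:
            ebv_mapped += int(mapped)
    return ebv_mapped / total_mapped if total_mapped else 0.0


def rank_by_ebv(stats_by_sample: Dict[str, str], top: int = 5) -> List[Tuple[str, float]]:
    """Return the `top` samples with the highest EBV fraction, highest first."""
    ranked = sorted(
        ((sample, ebv_fraction(text)) for sample, text in stats_by_sample.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top]
```

Using a fraction of mapped reads rather than a raw count keeps deeply and shallowly sequenced BAMs comparable.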
RE: https://github.com/mskcc/vaporware/issues/102#issuecomment-490531767
It's a reference file issue; I'm going to wait for a new set of reference files and then try a re-run.
It seems that SuperReference.fa is built on a different version of the human genome than the one we align against. Maybe that's what you already figured out, @allanbolipata?
@kpjonsson We're way ahead of you, my friend by the Charles River.
We're re-creating the viral FASTA now
@kpjonsson Yeah, it's in (JUNO-only) ${params.reference_base}/mskcc-igenomes/grch37/viral_reference/human_g1k_v37_plus_all_viruses.fa
But I ran into another error, which is the inverse of https://github.com/mskcc/vaporware/issues/102#issuecomment-490531767:
Reference genome mismatch: Normal BAM/CRAM file is missing a chromosome found in the reference fasta file: 'HPVType14D.X74467.1'
Manta doesn't seem to work with a reference file that differs from the one used to align the BAMs.
It appears clinbx uses https://github.com/G100DKFZ/gene-is
This looks promising as well: https://github.com/namphuon/ViFi
Here are the viruses we will try:
185 HPV subtypes, all human herpesviruses (HHV, including EBV), Merkel cell polyomavirus, HTLV-1, and hepatitis B virus
We'll ask Clinical Bioinformatics (Anita) for help.
Method we'll try:
Download the FASTAs for these viruses, concatenate them with hg19, and use the combined reference for both Delly and Manta.
That is, we do SV calling with this "special" reference.
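The concatenation step is essentially cat, but checking for duplicate contig names while writing avoids silently building an ambiguous reference. A minimal Python sketch, where the function name and any file paths you pass in are hypothetical, that writes the human FASTA first and the viral FASTAs after it:

```python
from typing import Iterable, List, Set


def concatenate_fastas(paths: Iterable[str], out_path: str) -> List[str]:
    """Copy every FASTA in `paths` into `out_path`, in order, refusing duplicate contig names.

    Contig names are taken up to the first whitespace of each '>' header.
    Returns the contig names in the order they were written.
    """
    seen: Set[str] = set()
    written: List[str] = []
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        parts = line[1:].split()
                        if not parts:
                            raise ValueError(f"empty FASTA header in {path}")
                        name = parts[0]
                        if name in seen:
                            raise ValueError(f"duplicate contig {name!r} in {path}")
                        seen.add(name)
                        written.append(name)
                    # Guarantee a trailing newline so the next file's header starts a new line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return written
```

For example, concatenate_fastas(["hg19.fa", "hpv.fa", "herpes.fa"], "hg19_plus_viruses.fa") with hypothetical inputs. The combined FASTA still needs at least a samtools faidx index before the callers can use it, and since Manta rejects BAMs whose contigs differ from the reference, the BAMs would presumably have to be realigned against it (which also needs a BWA index).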
Then, look at the translocation events (TRA, now reported as BND; at least this is how it works for Delly). SV callers call viral integration sites as translocations, because the caller infers that the read must be coming from another chromosome, i.e. "a translocation".
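Picking out host-virus junctions from the caller output then means keeping records whose two sides fall on opposite sides of the human/viral boundary. Manta writes these as BND records in VCF breakend notation, with the partner contig inside the brackets of ALT, while older Delly releases wrote symbolic TRA records with the partner in INFO CHR2. This Python sketch handles both forms. It assumes Delly's BCF has first been converted to text with bcftools view and takes the set of viral contig names (e.g., the names gathered from the viral FASTAs) as input; the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Set, Tuple

# VCF breakend ALT, e.g. "G]NC_007605:1234]" or "[17:198983[A": partner contig and position in brackets.
BND_MATE = re.compile(r"[\[\]]([^:\[\]]+):(\d+)[\[\]]")


def mate_chrom(alt: str, info: str) -> Optional[str]:
    """Partner contig of an SV record: bracket notation in ALT if present, else INFO CHR2."""
    match = BND_MATE.search(alt)
    if match:
        return match.group(1)
    for entry in info.split(";"):
        if entry.startswith("CHR2="):
            return entry[len("CHR2="):]
    return None


def viral_junctions(
    vcf_lines: Iterable[str], viral_contigs: Set[str]
) -> Iterator[Tuple[str, int, str]]:
    """Yield (CHROM, POS, partner contig) for records joining a human and a viral contig."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt, info = fields[0], int(fields[1]), fields[4], fields[7]
        mate = mate_chrom(alt, info)
        # Exactly one side viral: a host-virus junction. Same-side pairs are ordinary SVs.
        if mate is not None and (chrom in viral_contigs) != (mate in viral_contigs):
            yield chrom, pos, mate
```

Manta reports each breakend pair from both mates, so expect each Manta junction to appear twice, once anchored on the human contig and once on the viral contig.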