tgen / lumosVar2

Calls somatic SNVs, indels, and allelic copy number jointly across multiple samples from the same patient. These can be standard tumor/normal pair, longitudinal samples, primary/met, etc. Can also be used for tumor only calling, ideally with a high tumor content and a low tumor content sample.
MIT License
10 stars 1 forks source link

Instructions for tumor-only running? #4

Open vymao opened 4 years ago

vymao commented 4 years ago

Is there a set of instructions that we should follow for running this in tumor-only mode? For example, what should we put for the NormalMetrics field in the config file?

rhalperin commented 4 years ago

You just need to put "NormalSample: 0" in the yaml file for tumor only mode.

vymao commented 4 years ago

Thanks. And which COSMIC vcf should we be downloading from here?

rhalperin commented 4 years ago

You should download the "CosmicCodingMuts.normal.vcf.gz" if you are using GRCh38. If you are using GRCh37, here is a description of how to get the archived version: https://cancer.sanger.ac.uk/cosmic/file_download_info?data=GRCh37%2Fcosmic%2Fv88%2FVCF%2FCosmicCodingMuts.vcf.gz

vymao commented 4 years ago

Thank you. However, this won't be per chromosome, will it? Do we have to separate it out on our own?

rhalperin commented 4 years ago

No, it doesn't need to be per chromosome. It just needs bgzip and tabix index.

vymao commented 4 years ago

Thank you for the comments. I tried running in tumor-only mode through this:

#### Input Files
bamList: /path/to/test.txt  
regionsFile: /path/to/IMPACT_v2.bed 
snpVCFpath: /path/to/gnomad.genomes.r2.1.1.sites.   
snpVCFname: .vcf.gz     
NormalBase: 0   
cosmicVCF: /path/to/CosmicCodingMuts.vcf.gz 
refGenome: /path/to/Homo_sapiens_assembly19.fasta   

#### User Inputs
outName: /path/to/out.vcf   
outMat: /path/to/out_mat                 
gvmPath: /path/to/lumosVar2/bin/gvm       
workingDirectory: /path/to/lumosVar2/work 
NormalSample: 0    
priorF: [0.7]         
numCPU: 1          

However, I am getting the following error: ./lumosVarMain: error while loading shared libraries: libmwlaunchermain.so: cannot open shared object file: No such file or directory

Not sure what this is about. Am I missing a file I should have?

rhalperin commented 4 years ago

It looks like MCR is not installed, or is not in your LD_LIBRARY_PATH. Please see below for more information. http://www.mathworks.com/products/compiler/mcr/ https://www.mathworks.com/help/compiler/mcr-path-settings-for-run-time-deployment.html

vymao commented 4 years ago

Thank you. I think I fixed this problem, but now trying to run the guessSex.py script and I'm getting this error:

/path/to/guessSex.py:63: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
finding common var pos for 22
process 0 is: bcftools view -q 0.2:nonmajor -v snps -R 
/path/to/IMPACT_v2.bed 
/path/to/gnomad.genomes.r2.1.1.sites.22.vcf.gz
['bcftools', 'view', '-q', '0.2:nonmajor', '-v', 'snps', '-R', 
'/path/to/IMPACT_v2.bed', 
'/path/to/gnomad.genomes.r2.1.1.sites.22.vcf.gz']
process 1 is: grep -v ^#
['grep', '-v', '^#']
finding common var pos for X
process 0 is: bcftools view -q 0.2:nonmajor -v snps -R 
/path/to/IMPACT_v2.bed 
/path/to/gnomad.genomes.r2.1.1.sites.X.vcf.gz
['bcftools', 'view', '-q', '0.2:nonmajor', '-v', 'snps', '-R', 
'/path/to/IMPACT_v2.bed', 
'/path/to/gnomad.genomes.r2.1.1.sites.X.vcf.gz']
process 1 is: grep -v ^#
['grep', '-v', '^#']
22process 0 finished
22process 1 finished
Xprocess 0 finished
Xprocess 1 finished
process 0 is: bcftools mpileup -Ou -b 
/path/to/LumosVar/test.txt -B -f 
/path/to/Homo_sapiens_assembly19.fasta -R commonVar.chr22.bed -I -Ou
['bcftools', 'mpileup', '-Ou', '-b', 
'/path/to/LumosVar/test.txt', '-B', '-f', 
'/path/to/Homo_sapiens_assembly19.fasta', '-R', 'commonVar.chr22.bed', '-I', '-Ou']
process 1 is: bcftools call -Ou -m
['bcftools', 'call', '-Ou', '-m']
process 2 is: bcftools stats -s -
['bcftools', 'stats', '-s', '-']
process 3 is: grep ^PSC
['grep', '^PSC']
process 4 is: cut -f3,4,5,6,14
['cut', '-f3,4,5,6,14']
process 0 is: bcftools mpileup -Ou -b 
/path/to/LumosVar/test.txt -B -f 
/path/to/Homo_sapiens_assembly19.fasta -R commonVar.chrX.bed -I -Ou
['bcftools', 'mpileup', '-Ou', '-b', 
'/path/to/LumosVar/test.txt', '-B', '-f', 
'/path/to/Homo_sapiens_assembly19.fasta', '-R', 'commonVar.chrX.bed', '-I', '-Ou']
process 1 is: bcftools call -Ou -m
['bcftools', 'call', '-Ou', '-m']
process 2 is: bcftools stats -s -
['bcftools', 'stats', '-s', '-']
process 3 is: grep ^PSC
['grep', '^PSC']
process 4 is: cut -f3,4,5,6,14
['cut', '-f3,4,5,6,14']
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
22process 0 finished
22process 1 finished
22process 2 finished
22process 3 finished
22process 4 finished
Xprocess 0 finished
Xprocess 1 finished
Xprocess 2 finished
Xprocess 3 finished
Xprocess 4 finished
autosome stats:s_DS_bkm_085_T   0   0   0   0
sexChr stats:s_DS_bkm_085_T 0   0   0   0
numHomA: 0
numHomS: 0
numHetA: 0
numHetS: 0
Traceback (most recent call last):
  File "/path/to/guessSex.py", line 138, in <module>
    hetA=int(dataA[3])/(int(dataA[1])+int(dataA[2]))
ZeroDivisionError: division by zero

Would you know about this?

rhalperin commented 4 years ago

It looks like it is not finding any common variants to use for the sex determination. It is using this query to identify common variants: bcftools view -q 0.2:nonmajor -v snps -R /path/to/gnomad.genomes.r2.1.1.sites.22.vcf.gz Please try running the command above. If that does not output any variants, then I suspect you probably need to modify your vcf so that bcftools is able to filter on the allele frequencies. If it does output variants, then the problem may be downstream of that step, so please let me know.

vymao commented 4 years ago

I am getting a usage error:

About:   VCF/BCF conversion, view, subset and filter VCF/BCF files.
Usage:   bcftools view [options] <in.vcf.gz> [region1 [...]]

Output options:
    -G,   --drop-genotypes              drop individual genotype information (after subsetting if -s option set)
    -h/H, --header-only/--no-header     print the header only/suppress the header in VCF output
    -l,   --compression-level [0-9]     compression level: 0 uncompressed, 1 best speed, 9 best compression [-1]
          --no-version                  do not append version and command line to the header
    -o,   --output-file <file>          output file name [stdout]
    -O,   --output-type <b|u|z|v>       b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions <region>              restrict to comma-separated list of regions
    -R, --regions-file <file>           restrict to regions listed in a file
    -t, --targets [^]<region>           similar to -r but streams rather than index-jumps. Exclude regions with "^" prefix
    -T, --targets-file [^]<file>        similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix
        --threads <int>                 number of extra (de)compression threads [0]

Subset options:
    -a, --trim-alt-alleles        trim alternate alleles not seen in the subset
    -I, --no-update               do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN)
    -s, --samples [^]<list>       comma separated list of samples to include (or exclude with "^" prefix)
    -S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
        --force-samples           only warn about unknown subset samples

Filter options:
    -c/C, --min-ac/--max-ac <int>[:<type>]      minimum/maximum count for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -f,   --apply-filters <list>                require at least one of the listed FILTER strings (e.g. "PASS,.")
    -g,   --genotype [^]<hom|het|miss>          require one or more hom/het/missing genotype or, if prefixed with "^", exclude sites with hom/het/missing genotypes
    -i/e, --include/--exclude <expr>            select/exclude sites for which the expression is true (see man page for details)
    -k/n, --known/--novel                       select known/novel sites only (ID is not/is '.')
    -m/M, --min-alleles/--max-alleles <int>     minimum/maximum number of alleles listed in REF and ALT (e.g. -m2 -M2 for biallelic sites)
    -p/P, --phased/--exclude-phased             select/exclude sites where all samples are phased
    -q/Q, --min-af/--max-af <float>[:<type>]    minimum/maximum frequency for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -u/U, --uncalled/--exclude-uncalled         select/exclude sites without a called genotype
    -v/V, --types/--exclude-types <list>        select/exclude comma-separated list of variant types: snps,indels,mnps,ref,bnd,other [null]
    -x/X, --private/--exclude-private           select/exclude sites where the non-reference alleles are exclusive (private) to the subset samples

Not sure if this is what you meant?

rhalperin commented 4 years ago

Sorry, I left out an argument in the command. Please try: bcftools view -q 0.2:nonmajor -v snps -R /path/to/IMPACT_v2.bed /path/to/gnomad.genomes.r2.1.1.sites.22.vcf.gz

vymao commented 4 years ago

This outputted a lot of information, of which was a lot of #INFO... fields and variants, such as this:

22  19053406    rs2238749   G   T   9.40585e+06 PASS AC=14953;AN=31332;AF=0.477244;rf_tp_probability=0.971961;FS=0;InbreedingCoeff=0.0613;MQ=60;MQRankSum=0;QD=18.88;...
rhalperin commented 4 years ago

It looks like that step is performing as expected. Is it generating the 'commonVar.chr22.bed' file? If so please try the command below and please let me know what the output looks like. bcftools mpileup -b /path/to/LumosVar/test.txt -B -f /path/to/Homo_sapiens_assembly19.fasta -R commonVar.chr22.bed -I -Ov