taylor-lab / hotspots

Identifying recurrent mutations in cancer
http://www.ncbi.nlm.nih.gov/pubmed/26619011
GNU Affero General Public License v3.0

unable to run script make_trinuc_maf.py #5

Closed tcgriffith closed 7 years ago

tcgriffith commented 7 years ago

I've attempted to run the following code from README:

/hotspot_algo.R --input-maf=minimalist_test_maf.txt --rdata=hotspot_algo.Rdata --gene-query=genes_of_interest.txt --output-file=testrun_sig_hotspots.txt

and I've got this:

Reading in MAF...
Prepping MAF for analysis ...
... Ignoring non-SNP mutations ...
Making bed file ...
Getting regions
Error: The requested fasta database file (/ifs/depot/assemblies/H.sapiens/GRCh37/gr37.fasta) could not be opened. Exiting!
... Adding trinucs (normalized to start from C or T) ...
Writing to ___temp_maf-tri.tm ...
Cleaning up

It seems that the script requires a single large gr37.fasta rather than multiple per-chromosome files. How can I fix this?

changmt commented 7 years ago

For now, a workaround is to download the reference FASTA yourself. You can find it here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/. Follow the instructions there for converting hg19.2bit to hg19.fa, then change the path in make_trinuc_maf.py to point to the resulting file.

I will fix the hard-coded path in make_trinuc_maf.py so that it accepts an argument instead. Thanks, Matt
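
For readers hitting the same error before that fix lands, here is a minimal sketch of how a script could take the reference path as an argument instead of hard-coding it. This is not the actual change made to make_trinuc_maf.py: the flag name `--ref-fasta` and the helper names `parse_args` and `check_fasta` are assumptions for illustration, and the default is simply the path quoted in the error message above.

```python
import argparse
import os
import sys

# Path taken from the error message in this thread; used only as a fallback.
DEFAULT_FASTA = "/ifs/depot/assemblies/H.sapiens/GRCh37/gr37.fasta"


def parse_args(argv=None):
    """Parse command-line options. --ref-fasta is a hypothetical flag name."""
    parser = argparse.ArgumentParser(
        description="Sketch: accept the reference FASTA path as an argument."
    )
    parser.add_argument(
        "--ref-fasta",
        default=DEFAULT_FASTA,
        help="Path to the GRCh37 reference FASTA (default: %(default)s)",
    )
    return parser.parse_args(argv)


def check_fasta(path):
    """Return the path if it is an existing file; otherwise exit with a clear message."""
    if not os.path.isfile(path):
        sys.exit("Reference FASTA not found: %s" % path)
    return path


if __name__ == "__main__":
    args = parse_args()
    fasta = check_fasta(args.ref_fasta)
    print("Using reference FASTA: %s" % fasta)
```

Checking the file up front makes the script stop immediately with one message, rather than continuing through the later steps after bedtools has already failed.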

tcgriffith commented 7 years ago

I've tried hg19, but its chromosome names differ from GRCh37's (for example, chr1 in hg19 versus 1 in GRCh37). As a result, bedtools complains that it cannot find the chromosome.
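
One possible way around the naming mismatch is to rewrite the FASTA headers so they use GRCh37-style names. The sketch below is only an illustration under stated assumptions: the function names are made up here, it maps `chrM` to `MT` and otherwise just strips the `chr` prefix, and it changes names only. It does not make the sequences identical to a GRCh37 download, and unplaced contigs may still not match after the prefix is stripped.

```python
def to_grch37_name(chrom):
    """Convert a UCSC-style name such as 'chr1' to a GRCh37-style name such as '1'."""
    if chrom == "chrM":
        return "MT"
    if chrom.startswith("chr"):
        return chrom[3:]
    return chrom


def rename_fasta_headers(lines):
    """Yield FASTA lines with the sequence name in each '>' header converted.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            # Split the header into the sequence name and any trailing description.
            name, sep, rest = line[1:].rstrip("\n").partition(" ")
            yield ">" + to_grch37_name(name) + sep + rest + "\n"
        else:
            yield line


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python rename_chroms.py < hg19.fa > renamed.fa
    sys.stdout.writelines(rename_fasta_headers(sys.stdin))
```

The alternative is to leave the FASTA alone and add the `chr` prefix to the chromosome column of the BED file instead; which side to convert depends on what the rest of the pipeline expects.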

Then I realized that I had installed VEP following this gist, and that a compressed GRCh37 reference FASTA is in its data folder. I believe the download is handled by a VEP script called INSTALL.pl, though I haven't checked the code.

Hope this helps with the issue. Regards, TC

tcgriffith commented 7 years ago

By the way, bedtools supports "gzipped" files.