ma-compbio / Weaver

Allele-Specific Quantification of Structural Variations in Cancer Genomes
MIT License
17 stars 7 forks source link

Unable to reproduce example results #13

Open d-cameron opened 7 years ago

d-cameron commented 7 years ago

I have installed weaver from github to ~/weaver and unpacked Weaver_data.tar.gz and Weaver_example.tar.gz to the Weaver_data and Weaver_example sub-directories respectively. I have fixed the broken symlinks in ~/weaver/data

I am attempting to run the example. When I rerun Weaver_example/cmd I get the following output:

$ ../bin/bam2bw.pl lite X.bam 64 M
$ ~/weaver/bin/Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE        PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
$ ~/weaver/bin/Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE        LITE
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
TUMOR coverage was set to 20.
NORMAL was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
base_mean = 20
best_norm = 0
LBP scan
LBP
LBP init
LBP print
LBP scan
$ ls -l
total 1231020
-rw-r----- 1 cameron.d allstaff        405 Oct 30  2014 cmd
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 EACH_REGION
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 EACH_REGION_1
drwxr-x--- 2 cameron.d allstaff          9 Oct 30  2014 INPUT
drwxr-x--- 2 cameron.d allstaff         11 Nov  4  2014 OUTPUT
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 REGION_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SNP_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_CN_PHASE
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_REMOVED
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 SV_SELECTED
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 TARGET
-rw-r--r-- 1 cameron.d allstaff          0 May 22 15:42 tempfile
-rw-r----- 1 cameron.d allstaff 1097298665 Oct 30  2014 X.bam
-rw-r----- 1 cameron.d allstaff     161816 Oct 30  2014 X.bam.bai
-rw-r--r-- 1 cameron.d allstaff         28 May 22 15:19 X.bam.G1
-rw-r--r-- 1 cameron.d allstaff         28 May 22 15:19 X.bam.G2
-rw-r--r-- 1 cameron.d allstaff  162603325 May 22 15:24 X.bam.wig

Note that the files are almost all empty files and are located in the ~/weaver/Weaver_example directory, not the ~/weaver/Weaver_example/OUTPUT subdirectory.

What else is required to reproduce the example output for the supplied example? Are you able to supply a working shell script that performs all weaver steps from just a just a (small) bam file, a reference fa, and the minimal set of additional files?

The example INPUT directory contains many input files for which there does not appear to be any instructions on how to generate these files. I am attempting to run weaver against a hg38 bam file and it is very unclear what files I need from where (e.g. the example includes a map100mer.bd file, but the ENCODE mappability tracks are in bigWig format and there does not appear to be any conversion instructions; it looks like a 4 column .bed file but there's no header and I don't know of column 4 is rescaled).

d-cameron commented 7 years ago

If I use the run instructions in the README instead of cmd I get:

$ Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
RUN MODE        PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
$ ll
total 1071780
-rw-r----- 1 cameron.d allstaff        405 Oct 30  2014 cmd
drwxr-x--- 2 cameron.d allstaff          9 May 29 11:59 INPUT
drwxr-x--- 2 cameron.d allstaff         10 Nov  4  2014 OUTPUT
-rw-r--r-- 1 cameron.d allstaff          0 May 29 12:01 TARGET
-rw-r--r-- 1 cameron.d allstaff          0 May 29 12:01 tempfile
-rw-r----- 1 cameron.d allstaff 1097298665 Oct 30  2014 X.bam
-rw-r----- 1 cameron.d allstaff     161816 Oct 30  2014 X.bam.bai
d-cameron commented 7 years ago

I then thought that maybe you weren't raising any error messages if the input files specified did not exist. Rerunning in the INPUT directory results in:

$ Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64
RUN MODE        PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
chrA    13388
chrB    9702
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
cao     2085482 0       2085482 0       0
cao     2090520 1       2090520 0       0
...
cao     21418146        23028   21418146        0       0
cao     23983213        23029   23983213        0       0
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
Segmentation fault (core dumped)
d-cameron commented 7 years ago

Changing -r 0 to -r 1 as recommended in https://github.com/ma-compbio/Weaver/issues/11 results in:

$ Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 1 -m map100mer.bd -p 64 2>&1 | tee    log.txt
RUN MODE        PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 1.
chrA    13388
chrB    9702
chrA
chrB
sh: /../external_bin/bedtools: No such file or directory
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
d-cameron commented 7 years ago

Uncommenting main.cpp:250 indicates that BIN is set to an empty string