zeeev / wham

Structural variant detection and association testing

Multi sample running #23

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi Zeeev

I ran WHAM with 100 BAM files (WGS; 50X) and the run stopped with the error message below. I first thought there might be a path issue, but I checked and the path is correct. For comparison, I ran a single sample using the same command (the one suggested in the manual page) and it worked fine.

Then I ran the multi-sample job again with a higher thread count. It got further along the genomic coordinates but still ended with the same error message. I still don't know what the problem is.

Also: 1) is there a limit on the number of samples for a multi-sample run? 2) Does a multi-sample run perform better than a single-sample run?


INFO: running region: 1:126000500-127000500
INFO: running region: 2:233000500-234000500
INFO: running region: 1:33000500-34000500
INFO: running region: 2:242000500-243000500
INFO: running region: 1:132000500-133000500
INFO: running region: 3:4000500-5000500
INFO: running region: 1:135000500-136000500
INFO: running region: 3:10000500-11000500
INFO: running region: 1:12000500-13000500
INFO: running region: 3:16000500-17000500
INFO: running region: 1:144000500-145000500
INFO: running region: 1:42000500-43000500
INFO: running region: 3:22000500-23000500
INFO: running region: 1:147000500-148000500
INFO: running region: 3:28000500-29000500
INFO: running region: 3:34000500-35000500
INFO: running region: 1:150000500-151000500
INFO: running region: 3:40000500-41000500
INFO: running region: 3:46000500-47000500
INFO: running region: 1:156000500-157000500
INFO: running region: 3:49000500-50000500
INFO: running region: 3:55000500-56000500
could not open /data/pindel/human_g1k_v37.fasta

zeeev commented 8 years ago

Sorry for the late reply, I've been on holiday. How many CPUs are you using?

ghost commented 8 years ago

Thanks Zeeev. I am running WHAM interactively on an AWS m4.4xlarge, so there are 16 CPUs and 64 GiB of memory.

zeeev commented 8 years ago

This can happen when threading exhausts the available file handles. If you are running 100 samples with 16 threads, you are opening 1600 file handles. Check what the maximum number of file handles is.
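A minimal sketch of that check, assuming Python 3 on a Unix-like system. The helper names `handles_needed` and `check_limit` are hypothetical, and the one-handle-per-BAM-per-thread estimate is taken from the arithmetic above, not from WHAM's source. It is a rough lower bound, since the reference FASTA, index files, and standard streams also consume handles.

```python
import resource


def handles_needed(n_samples: int, n_threads: int) -> int:
    """Rough estimate: each thread opens every BAM file once."""
    return n_samples * n_threads


def check_limit(n_samples: int, n_threads: int):
    """Compare the estimate with this process's open-file limits."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = handles_needed(n_samples, n_threads)
    # RLIM_INFINITY means no cap is enforced on the soft limit.
    fits = soft == resource.RLIM_INFINITY or needed < soft
    return needed, soft, hard, fits


if __name__ == "__main__":
    needed, soft, hard, fits = check_limit(n_samples=100, n_threads=16)
    print(f"estimated handles: {needed}")
    print(f"soft limit: {soft}, hard limit: {hard}")
    print("fits under soft limit" if fits else "likely to exceed soft limit")
```

The same soft limit is what `ulimit -n` reports in the shell that launches the run. If the estimate is above it, raising the limit or lowering the thread count are the two obvious things to try.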

Alternatively, you could try WHAM-GRAPHENING. It is a more accurate version of WHAM designed for DEL, DUP, and INV. You run the individuals separately and then merge them together. The details are in the README.

Let me know if you resolve this issue.

Thanks for reporting the bug.

ghost commented 8 years ago

Thanks Zeeev! I will try WHAM-GRAPHENING and let you know how it goes.

I am wondering whether any documentation for it is available. I can see the workflow diagram, but I would like to read detailed docs on the principles behind WHAM-G.

Thanks

ghost commented 8 years ago

One more question. For running WHAM, is a multi-sample run better than a single-sample run (e.g. in accuracy)? The paper only mentions multi-sample runs.

zeeev commented 8 years ago

For both WHAM and WHAM-G there is only a slight increase in sensitivity for joint calling at the expense of many more false positives. That's why I now use merging in WHAM-GRAPHENING.

ghost commented 8 years ago

Hi Zeeev

I ran both WHAM and WHAM-G and found that WHAM-G generated a lower number of calls. Run on a chr22 WGS sample:

WHAM: 8000 calls
WHAM-G: 200 calls

Another issue: WHAM-G did not report the type of CNV. I could only infer it from the REF and ALT information. Also, all calls were insertions (or duplications?). Please correct me if I am wrong.

Thanks

zeeev commented 8 years ago

Hi @sehrrot ,

Sorry for the slow response; I was on vacation.

The number of calls seems reasonable. I am not sure what you mean by WHAM-G did not report a type of CNV. In the alt column you should see:

"<DEL>" "<DUP>" "<INV>"

Can you post a line from the VCF file in question?

ghost commented 8 years ago

Hi @zeeev

Thanks for the reply! I should have put it the other way around: WHAM does not report the type of CNV, and all of the calls have 'N' in the REF column (please see the example below).

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample.chr22.bam

22 17005386 . N ANGTATGCCACCACTC . . LRT=0;WAF=.,0.500001,0.500001;GC=0,1;AT=1,0,0,0,0,0,0,0,0,0,0,0.0176991,0.00884956,0.0176991,2.37402;CF=0.619469;CISTART=17005349,17005421;CIEND=17005253,17005255;PU=6;SU=0;CU=20;RD=113;NC=5;MQ=42.5398;MQF=0.893805;SP=2,0,0;CHR2=22;DI=b;END=17005255;SVLEN=130 GT:GL:NR:NA:NS:RD 0/1:-166.875,-78.3256,-940.627:95:18:6:113
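For reading a record like this one, a small Python sketch can split the INFO and per-sample fields into dictionaries. The function names are hypothetical, the INFO string is shortened to a few of the keys shown above, and the code assumes whitespace-separated columns as pasted here (a real VCF is tab-separated, which `split()` also handles as long as no field contains spaces). It only exposes the values; it does not decide what type of variant the call represents.

```python
# Shortened copy of the record above: same columns, fewer INFO keys.
EXAMPLE_LINE = (
    "22 17005386 . N ANGTATGCCACCACTC . . "
    "LRT=0;PU=6;SU=0;CU=20;RD=113;CHR2=22;DI=b;END=17005255;SVLEN=130 "
    "GT:GL:NR:NA:NS:RD 0/1:-166.875,-78.3256,-940.627:95:18:6:113"
)


def parse_info(info: str) -> dict:
    """Split 'K=V;K=V;FLAG' into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(line: str) -> dict:
    """Parse one single-sample VCF data line into a nested dict."""
    cols = line.split()
    chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "REF": ref,
        "ALT": alt,
        "INFO": parse_info(info),
    }
    if len(cols) >= 10:
        # FORMAT keys and sample values are both colon-separated.
        record["SAMPLE"] = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return record


if __name__ == "__main__":
    rec = parse_record(EXAMPLE_LINE)
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
    print("END:", rec["INFO"]["END"], "SVLEN:", rec["INFO"]["SVLEN"])
    print("GT:", rec["SAMPLE"]["GT"])
```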

Re WHAM-G, thanks for clarifying the number of calls. Other than this, all works well so far.

zeeev commented 8 years ago

@sehrrot Glad to hear. Thank you for using WHAM.