morrislab / phylowgs

Application for inferring subclonal composition and evolution from whole-genome sequencing data.
GNU General Public License v3.0

Array length error when preparing files in a multi-samples setting #54

Closed af8 closed 7 years ago

af8 commented 7 years ago

Hi,

I have successfully run PhyloWGS (commit 17c5362) on one WGS sample. I am now trying to run it in a multi-sample setting (two samples).

When calling:

python ${PHYLOWGS}/parser/create_phylowgs_inputs.py \
--sex female \
--regions normal_and_abnormal_cn \
--cnvs NEG=TU1600E.cnvs.txt --cnvs POS=TU1600F.cnvs.txt \
--vcf-type NEG=mutect_slc --vcf-type POS=mutect_slc \
NEG=data/TU1600E.m1.snv.vcf POS=data/TU1600F.m1.snv.vcf

[mutect_slc is a VCF type (and parser) that I added to fit our data.]

During the CNV preparation step, I get the following error:

...
all_variants=8643 outside_subclonal_cn=6966 delta=1677
1_560052    [outside all regions]
...
X_659271    [outside all regions]
Estimated read depth: [ 48.  47.]
[ 144000.  141000.]
Traceback (most recent call last):
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 1356, in <module>
    main()
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 1344, in main
    grouper.write_cnvs(subsampled_vars, args.output_cnvs)
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 1037, in write_cnvs
    for cnv in formatter.format_and_merge_cnvs(self._multisamp_cnv.load_single_abnormal_state_cnvs(), variants, self._cellularity):
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 394, in format_and_merge_cnvs
    formatted = list(self._format_cnvs(cnvs, variants))
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 363, in _format_cnvs
    total_reads = self._calc_total_reads(cnv['start'], cnv['end'])
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 342, in _calc_total_reads
    return self._cap_cnv_D(D)
  File "/data-ddn/software/phylowgs/17c5362/parser/create_phylowgs_inputs.py", line 354, in _cap_cnv_D
    D_max = int(np.round(avg_ssms_in_tumour * self._read_depth))
TypeError: only length-1 arrays can be converted to Python scalars

Here self._read_depth is an array of length 2 (one estimated read depth per sample, as printed above), so np.round(...) also returns a length-2 array and the cast to int fails.

What is the quick fix for this? Taking the max of np.round(avg_ssms_in_tumour * self._read_depth) first?
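To make the question concrete, here is a minimal sketch that reproduces the failure outside PhyloWGS and compares two possible workarounds. It is not the fix that was later committed upstream. The function names are hypothetical, and the value 3000 for avg_ssms_in_tumour is an assumption, chosen only because 3000 * [48, 47] matches the [ 144000.  141000.] line printed in the log.

```python
import numpy as np

# Per-sample read depth, taken from the "Estimated read depth" log line.
read_depth = np.array([48.0, 47.0])
# Assumed value (not shown in the log): 3000 * [48, 47] = [144000, 141000].
avg_ssms_in_tumour = 3000.0


def cap_scalar(avg, depth):
    """Mirrors the failing line: int() only accepts a single value."""
    return int(np.round(avg * depth))


def cap_max(avg, depth):
    """Workaround from the question: collapse to the largest per-sample cap."""
    return int(np.max(np.round(avg * np.atleast_1d(depth))))


def cap_per_sample(avg, depth):
    """Alternative: keep one integer cap per sample instead of a scalar."""
    return np.round(avg * np.atleast_1d(depth)).astype(np.int64)


try:
    cap_scalar(avg_ssms_in_tumour, read_depth)
except TypeError as exc:
    # Same class of error as in the traceback above.
    print("scalar cast fails:", type(exc).__name__)

print(cap_max(avg_ssms_in_tumour, read_depth))
print(cap_per_sample(avg_ssms_in_tumour, read_depth))
```

Taking the max gets past the TypeError, but it applies the most permissive cap to every sample. A per-sample cap keeps each sample's own limit, at the cost of requiring the calling code to accept an array.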

Thanks for your help, Anthony

ramaniak commented 7 years ago

I tried with e42402b and got the same error in multi-sample analysis. It works fine with single samples.

jwintersinger commented 7 years ago

Thank you for reporting this, @af8 and @ramaniak. I fixed the bug in 3cf19f27efbcf341e89677479fad8a55a69943e6. Please let me know if you encounter any issues.

The bigger issue was that our test suite didn't handle multisample CNAs, which allowed this bug to slip through. I rewrote the test suite to cover these cases in 1b9c82f3cebc027624fc01744b7227a6e6b54897.

Thank you again!