Hello,
Right now I am running into a problem while trying to parse my CNV and VCF data. I can currently parse the VCF data without the CNV data (using regions=all), but when I try to use --cnvs I keep getting this error:
Traceback (most recent call last):
File "create_phylowgs_inputs.py", line 1420, in <module>
main()
File "create_phylowgs_inputs.py", line 1388, in main
grouper.exclude_variants_in_multiple_abnormal_or_unlisted_regions()
File "create_phylowgs_inputs.py", line 989, in exclude_variants_in_multiple_abnormal_or_unlisted_regions
self._filter_variants_outside_regions(self._multisamp_cnv.load_cnvs(), 'all_variants', 'within_cn_regions')
File "create_phylowgs_inputs.py", line 856, in load_cnvs
abnormal_cnvs = self.load_single_abnormal_state_cnvs()
File "create_phylowgs_inputs.py", line 811, in load_single_abnormal_state_cnvs
states_for_all_samples = self._get_abnormal_state_for_all_samples(chrom, cnv)
File "create_phylowgs_inputs.py", line 773, in _get_abnormal_state_for_all_samples
assert len(retained_sampidxs) == len(set(retained_sampidxs))
AssertionError
The error is accompanied by this comment in the code:
Sanity check: we should have no duplicate samples. While a given sample
may report any number of records for a region, above we discarded normal
regions, and ensured that only one abnormal state exists in all samples.
Thus, we should have no more than one record per sample for this region.
I checked through the CNV data, and as far as I could tell there were no duplicates, and I am only working with one sample. I'm not really proficient with the biology side of this, but at least from the CS side I could see that, in the _get_abnormal_state_for_all_samples() function, retained_sampidxs was sometimes picking up 2 entries instead of 1. Do you know why this may be happening?
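Exact duplicates may not be the only way a single sample could end up with two records for one region; overlapping intervals might do it too, though that is only an assumption on my part. Below is a minimal sketch of the kind of overlap check that could be run on the CNV records. It assumes each record has already been reduced to a (chromosome, start, end) tuple and that intervals are half-open. The function name and the sample values are made up for illustration and are not part of PhyloWGS.

```python
from collections import defaultdict


def find_overlapping_records(records):
    """Return pairs of (chrom, start, end) records that overlap on one chromosome.

    Assumption: intervals are half-open, [start, end), so records that only
    touch at a boundary are not reported.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Track the interval that reaches furthest to the right so far.
        furthest = intervals[0]
        for current in intervals[1:]:
            if current[0] < furthest[1]:
                overlaps.append(((chrom,) + furthest, (chrom,) + current))
            if current[1] > furthest[1]:
                furthest = current
    return overlaps


# Hypothetical records, not taken from my actual data.
records = [
    ("1", 100, 500),
    ("1", 400, 900),   # overlaps the record above
    ("1", 900, 1200),  # only touches the previous record
    ("2", 100, 500),
]

for first, second in find_overlapping_records(records):
    print(first, "overlaps", second)
```

Each record is compared only against the furthest-reaching earlier interval on the same chromosome, so this flags that an overlap exists rather than listing every overlapping pair.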
Thanks in advance.