Closed sanjaynagi closed 1 month ago
Temporary workaround: exclude the affected sample set (barron-2019) when building the list of sample sets.
sample_sets=[s for s in ag3.sample_sets()['sample_set'].to_list() if s != 'barron-2019']
Seems fine to handle this error internally, but how do we communicate to the user that the data are missing? Need to make sure it doesn't look like these samples have normal copy number.
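One possible way to address this, as a rough sketch only: warn the user about which sample sets have no CNV HMM data, and record their copy number as NaN rather than 2 so they cannot be mistaken for normal copy number. The function name `copy_number_with_missing`, the `available` mapping and the `set-a` sample set name are all hypothetical and are not part of the malariagen_data API.

```python
import math
import warnings


def copy_number_with_missing(requested, available):
    """Return per-sample-set copy number, using NaN where no CNV data exist.

    `requested` is a list of sample set names; `available` maps the names that
    do have CNV HMM data to a (hypothetical) copy number value. Both are
    stand-ins for illustration, not the real malariagen_data API.
    """
    missing = [s for s in requested if s not in available]
    if missing:
        # Tell the user explicitly which sample sets have no CNV data.
        warnings.warn(
            f"No CNV HMM data for sample sets: {', '.join(missing)}",
            UserWarning,
            stacklevel=2,
        )
    # NaN (not 2) marks missing data, so it is not read as normal copy number.
    return {s: available.get(s, math.nan) for s in requested}
```

Calling `copy_number_with_missing(["set-a", "barron-2019"], {"set-a": 2})` would emit one warning naming barron-2019 and return NaN for it. A plotting layer could then render NaN in a distinct "no data" colour.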
Would it be sufficient to simply skip sample sets that don't have CNV HMM data in the API?
Essentially:
y = self._cnv_hmm_dataset(
    contig=r.contig,
    sample_set=s,
    inline_array=inline_array,
    chunks=chunks,
)
# If no CNV HMM dataset was found then skip
if y is None:
    continue
ly.append(y)
For example, I can submit a PR that would allow:
ag3.plot_diplotype_clustering_advanced(
    region='2L:28,535,000-28,552,000',
    cnv_region='2L:28,535,000-28,552,000',
    sample_query='taxon == "gambiae" and year > 2019',
    site_mask='gamb_colu_arab',
    color='taxon',
    snp_transcript='AGAP006227-RA',
)
We now skip sample sets that don't have CNV HMM data, but still raise a ValueError when no CNV HMM data are found at all.
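A minimal, self-contained sketch of that behaviour, for illustration only. Here `load_dataset` is a hypothetical callable standing in for the internal loader, assumed to return None when a sample set has no CNV HMM data; `collect_cnv_datasets` is likewise an invented name, not the actual implementation.

```python
def collect_cnv_datasets(sample_sets, load_dataset):
    """Collect CNV HMM datasets, skipping sample sets that have none.

    `load_dataset` is a hypothetical stand-in for the internal loader and is
    assumed to return None when a sample set has no CNV HMM data.
    """
    datasets = []
    skipped = []
    for s in sample_sets:
        y = load_dataset(s)
        if y is None:
            # No CNV HMM data for this sample set: skip it, but remember it.
            skipped.append(s)
            continue
        datasets.append(y)
    if not datasets:
        # Nothing was found for any sample set, so fail loudly.
        raise ValueError(
            f"No CNV HMM data found for sample sets: {', '.join(skipped)}"
        )
    return datasets, skipped
```

Returning the skipped names alongside the datasets is a design choice in this sketch: it lets the caller report which sample sets were left out instead of dropping them silently.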
There are no CNV calls for this sample set, so it fails. Earlier versions of the diplotype clustering function used a try/except statement to get around this; we could implement something similar.
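A try/except version might look roughly like the sketch below. `MissingCnvDataError` is a hypothetical exception type invented for this example (the exception the real loader raises is not stated in this thread), and `load_or_none` is an illustrative helper name.

```python
class MissingCnvDataError(Exception):
    """Hypothetical error raised when a sample set has no CNV calls."""


def load_or_none(loader, sample_set):
    """Call `loader(sample_set)`, returning None if CNV data are missing.

    Only the specific "missing data" error is caught, so unrelated failures
    (bad region, network errors, bugs) still propagate to the caller.
    """
    try:
        return loader(sample_set)
    except MissingCnvDataError:
        return None
```

Catching a narrow exception type rather than a bare `except` keeps genuine errors visible while still tolerating sample sets without CNV calls.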