malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Amend anoph snp_data.py where GQ, AD, MQ are still expected to be alongside GT #525

Open leehart opened 2 months ago

leehart commented 2 months ago

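The GQ, AD and MQ calldata arrays were recently moved, so they no longer sit alongside GT. The following code appears to rely on out-of-date Zarr metadata and breaks once that metadata is updated, because it still expects all four arrays under the same calldata group.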

# Set up call arrays.
calls_root = self.open_snp_genotypes(sample_set=sample_set)
gt_z = calls_root[f"{contig}/calldata/GT"]
call_genotype = da_from_zarr(gt_z, inline_array=inline_array, chunks=chunks)
gq_z = calls_root[f"{contig}/calldata/GQ"]
call_gq = da_from_zarr(gq_z, inline_array=inline_array, chunks=chunks)
ad_z = calls_root[f"{contig}/calldata/AD"]
call_ad = da_from_zarr(ad_z, inline_array=inline_array, chunks=chunks)
mq_z = calls_root[f"{contig}/calldata/MQ"]
call_mq = da_from_zarr(mq_z, inline_array=inline_array, chunks=chunks)
data_vars["call_genotype"] = (
    [DIM_VARIANT, DIM_SAMPLE, DIM_PLOIDY],
    call_genotype,
)
data_vars["call_GQ"] = ([DIM_VARIANT, DIM_SAMPLE], call_gq)
data_vars["call_MQ"] = ([DIM_VARIANT, DIM_SAMPLE], call_mq)
data_vars["call_AD"] = (
    [DIM_VARIANT, DIM_SAMPLE, DIM_ALLELE],
    call_ad,
)
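
One possible way to make the lookup tolerant of the moved arrays is sketched below. It is only an illustration: `find_calldata_fields` is a hypothetical helper, not part of malariagen_data, and the plain dict stands in for the Zarr group returned by `open_snp_genotypes`. It assumes the real group supports `in` and `[]` with path-style keys, as a Zarr group does.

```python
from typing import Any, Dict, Mapping, Sequence


def find_calldata_fields(
    calls_root: Mapping[str, Any],
    contig: str,
    required: Sequence[str] = ("GT",),
    optional: Sequence[str] = ("GQ", "AD", "MQ"),
) -> Dict[str, Any]:
    """Return the calldata arrays that are actually present for a contig.

    Hypothetical helper: required fields raise if missing, optional
    fields are silently skipped.
    """
    found: Dict[str, Any] = {}
    for field in required:
        path = f"{contig}/calldata/{field}"
        if path not in calls_root:
            raise KeyError(f"required calldata field not found: {path}")
        found[field] = calls_root[path]
    for field in optional:
        path = f"{contig}/calldata/{field}"
        # Only pick up optional fields that exist alongside GT.
        if path in calls_root:
            found[field] = calls_root[path]
    return found


# Stand-in for a Zarr hierarchy where only GT remains under calldata.
fake_root = {"2L/calldata/GT": "gt-array"}
fields = find_calldata_fields(fake_root, "2L")
print(sorted(fields))
```

With something like this, `call_GQ`, `call_MQ` and `call_AD` would only be added to `data_vars` when the corresponding key is present in the returned dict.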
leehart commented 2 months ago

This issue might become obsolete, since GQ, AD and MQ might be moved back alongside GT.

leehart commented 2 weeks ago

This issue will become obsolete when we decommission the multi-region release buckets, which is currently scheduled to happen on 1st September this year. The single-region release "master" buckets will continue to contain GQ, AD and MQ alongside GT, and will continue to require user authentication.