Open tomwhite opened 3 years ago
>>> import sgkit as sg >>> ds = sg.simulate_genotype_call_dataset(n_variant=10, n_sample=2) >>> ds <xarray.Dataset> Dimensions: (alleles: 2, ploidy: 2, samples: 2, variants: 10) Dimensions without coordinates: alleles, ploidy, samples, variants Data variables: variant_contig (variants) int64 0 0 0 0 0 0 0 0 0 0 variant_position (variants) int64 0 1 2 3 4 5 6 7 8 9 variant_allele (variants, alleles) |S1 b'C' b'G' b'A' ... b'C' b'A' sample_id (samples) <U2 'S0' 'S1' call_genotype (variants, samples, ploidy) int8 0 0 1 0 1 ... 0 1 0 1 1 call_genotype_mask (variants, samples, ploidy) bool False False ... False Attributes: contigs: [0] >>> sg.count_variant_alleles(ds) <xarray.Dataset> Dimensions: (alleles: 2, ploidy: 2, samples: 2, variants: 10) Dimensions without coordinates: alleles, ploidy, samples, variants Data variables: variant_allele_count (variants, alleles) uint64 dask.array<chunksize=(10, 2), meta=np.ndarray> call_allele_count (variants, samples, alleles) uint8 dask.array<chunksize=(10, 2, 2), meta=np.ndarray> variant_contig (variants) int64 0 0 0 0 0 0 0 0 0 0 variant_position (variants) int64 0 1 2 3 4 5 6 7 8 9 variant_allele (variants, alleles) |S1 b'C' b'G' b'A' ... b'C' b'A' sample_id (samples) <U2 'S0' 'S1' call_genotype (variants, samples, ploidy) int8 0 0 1 0 1 ... 1 0 1 1 call_genotype_mask (variants, samples, ploidy) bool False False ... False Attributes: contigs: [0] >>> sg.count_variant_alleles(ds, merge=False) <xarray.Dataset> Dimensions: (alleles: 2, variants: 10) Dimensions without coordinates: alleles, variants Data variables: variant_allele_count (variants, alleles) uint64 dask.array<chunksize=(10, 2), meta=np.ndarray>
Notice that for the call with merge=False the intermediate variable call_allele_count is not returned (and is effectively lost). I'm not sure if this behaviour is a problem in practice, but it might be something we want to fix.
merge=False
call_allele_count
Tricky one...
I think it would be nice to maintain this invariant:
merge(ds, method(ds, merge=False)) == method(ds)
Notice that for the call with
merge=False
the intermediate variablecall_allele_count
is not returned (and is effectively lost). I'm not sure if this behaviour is a problem in practice, but it might be something we want to fix.