sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

simulate_genotype_call_dataset creates alleles as byte strings #1222

Open hyanwong opened 1 month ago

hyanwong commented 1 month ago
import sgkit as sg

ds = sg.simulate_genotype_call_dataset(n_variant=2, n_sample=4, missing_pct=0, phased=True, seed=1)
# Print the allele array stored for each variant site
for i, alleles in enumerate(ds['variant_allele'].values):
    print(f"Site {i}: {alleles}")

The alleles come back as byte strings, e.g. [b'T' b'C'] (dtype |S1). I was expecting unicode strings (dtype <U1). Is this intentional?
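In the meantime, a possible workaround is to convert the array on the caller's side. The sketch below uses plain NumPy on a hand-built stand-in array rather than calling sgkit, so the second row of alleles and the variable names are illustrative assumptions, not output from the simulator. It assumes the allele codes are plain ASCII.

```python
import numpy as np

# Stand-in for ds["variant_allele"].values as reported above (dtype |S1).
# The first row matches the report; the second row is made up for illustration.
alleles_bytes = np.array([[b"T", b"C"], [b"A", b"G"]], dtype="S1")

# Casting a byte-string array to a unicode dtype decodes each element as ASCII
# and raises UnicodeDecodeError if any byte is outside the ASCII range.
alleles_str = alleles_bytes.astype("U1")

print(alleles_bytes.dtype.kind, alleles_str.dtype.kind)
for i, alleles in enumerate(alleles_str):
    print(f"Site {i}: {alleles}")
```

This only changes the in-memory copy; whether the dataset itself should store unicode is the question for the maintainers.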

jeromekelleher commented 1 month ago

I think this is a bug, which is probably related to: