Open awgymer opened 2 weeks ago
Thanks for the clear example code.
If you print out `vd` after each of the assignments, you'll see that your code is falling victim to good ol' Python pass-by-reference. So something under `new_record()` needs to be fixed to not alter its arguments.
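A minimal sketch of that effect in plain Python, without pysam. `build_record` is a hypothetical stand-in for whatever `new_record()` does internally, assuming only that it pops `GT` from each sample dict it receives:

```python
def build_record(samples):
    """Hypothetical stand-in: copies each sample into a new dict, popping 'GT'."""
    record = []
    for sample in samples:
        fields = {}
        if 'GT' in sample:
            # pop() removes the key from the caller's dict, not from a copy
            fields['GT'] = sample.pop('GT')
        fields.update(sample)
        record.append(fields)
    return record


vd = [{'GT': (0, 1), 'DP': 10}]

first = build_record(vd)
print(vd)       # 'GT' has been removed from the caller's data

second = build_record(vd)
print(first)    # first record still has 'GT'
print(second)   # second record has no 'GT' to copy
```

Printing `vd` between the two calls is enough to see the argument being changed underneath the caller.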
Oh ooft! I didn't even think to check that! It makes sense though, as I did see a reference to `pop` once when I passed a dict instead of a list of dicts to `new_record`.
Looks like this is the culprit: https://github.com/pysam-developers/pysam/blob/0787ca9da997b5911c00fd12584dad9741c82fb4/pysam/libcbcf.pyx#L2113-L2122
`.pop` is used to get `GT` if it's in a sample (it's also popped from the `kwargs`, but I'm not sure that would matter since it wouldn't be a dict then).
My guess is that this is done, rather than just using `update` on everything, so that `GT` is always first into the dict (since dict insertion order now matters).
I think simply changing the initial assignment line to not use `pop` will work fine, since the `update` won't mess up the order.
if samples:
    for i, sample in enumerate(samples):
        if 'GT' in sample:
            rec.samples[i]['GT'] = sample['GT']
        rec.samples[i].update(sample)
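As a rough check of the ordering claim, here is the same loop body applied to plain dicts. `fill_sample` is a hypothetical helper, and a plain `dict` is only an assumption standing in for `rec.samples[i]`, which is a pysam object and may not behave identically:

```python
def fill_sample(target, sample):
    """Hypothetical helper mirroring the proposed loop body on plain dicts."""
    if 'GT' in sample:
        target['GT'] = sample['GT']   # read the value instead of popping it
    target.update(sample)             # re-assigning 'GT' keeps its position
    return target


sample = {'DP': 10, 'GT': (0, 1)}
filled = fill_sample({}, sample)

print(list(filled))   # ['GT', 'DP']
print(sample)         # the caller's dict is unchanged
```

Because `update` on an existing key overwrites the value without moving the key, `GT` stays first, and the input keeps its `GT` entry for any later call.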
When generating some rows for tests I discovered that the `GT` field seems to be unset on the creation of a second identical record. Please see the minimal example below (pysam v0.21.0). I'm not entirely sure why this happens. The header object still seems to have the `GT` field under formats in all the failing cases.
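On affected versions, one possible caller-side workaround is to hand over a fresh copy of the sample data on every call, so that the original is never touched. This is only a sketch: `mutating_consumer` is a hypothetical stand-in for a call that pops keys from its arguments, not pysam's API.

```python
import copy


def mutating_consumer(samples):
    """Hypothetical stand-in for any call that pops 'GT' from its arguments."""
    return [s.pop('GT', None) for s in samples]


vd = [{'GT': (0, 1), 'DP': 10}]

# Pass a deep copy each time so the original list of dicts survives intact
gt_first = mutating_consumer(copy.deepcopy(vd))
gt_second = mutating_consumer(copy.deepcopy(vd))

print(gt_first, gt_second)   # both calls see the same 'GT' value
print(vd)                    # original data still contains 'GT'
```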