pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

TypeError: 'pysam.libcbcf.VariantRecordSamples' object does not support item assignment #1024

Open jun3234 opened 3 years ago

jun3234 commented 3 years ago

Thank you for this great library.

However, I have run into a problem. Could you help me solve it?

I have two VCF files that I want to merge into one file.

    import sys

    import pysam

    # vcf1 and vcf2 are the input paths (vcf2 must be indexed for fetch()),
    # file is the output path and hd is the output header.
    v1 = pysam.VariantFile(vcf1, 'rb', threads=24)
    v2 = pysam.VariantFile(vcf2, 'rb', threads=24)

    ## outvcf
    outvcf = pysam.VariantFile(file, "w", header=hd)

    for rcd1 in v1:
        v2_ft = v2.fetch(rcd1.chrom, start=rcd1.pos - 1, end=rcd1.pos)
        for rcd2 in v2_ft:
            print([rcd1.chrom, rcd1.pos, "rcd1"])
            print([rcd2.chrom, rcd2.pos, "rcd2"])
            if rcd1.alts == rcd2.alts:
                print(dir(rcd1.samples))
                print(rcd1.samples)
                # This line raises the TypeError quoted below
                rcd2.samples.update(rcd1.samples)

                print(rcd1.samples.update(rcd2.samples))
                outvcf.write(rcd2)
                sys.exit("55")

When I run this, it fails with TypeError: 'pysam.libcbcf.VariantRecordSamples' object does not support item assignment.

rcd1.samples is a <pysam.libcbcf.VariantRecordSamples object at 0x7f60c87dd558> object, and dir() shows that it has the following attributes:

['__bool__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'get', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'update', 'values']

Since update is in that list, why can't I update the rcd2.samples dict with rcd2.samples.update(rcd1.samples)?
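
Having `update` in `dir()` only means the method exists, not that the container accepts writes. A pure-Python analogue makes the failure mode concrete. It is a simplified sketch, assuming (as the error message suggests) that `VariantRecordSamples.update` loops over its argument and assigns `self[key] = value`, which fails because the class defines no `__setitem__`. `ReadOnlySamples` is a hypothetical stand-in, not pysam code.

```python
from collections.abc import Mapping


class ReadOnlySamples(Mapping):
    """Hypothetical stand-in for VariantRecordSamples: a read-only Mapping
    that nevertheless exposes an update() method."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def update(self, items=None, **kwargs):
        # dict-style update that delegates to self[k] = v; this raises
        # TypeError because no __setitem__ is defined on the class
        for k, v in (items or {}).items():
            self[k] = v
        for k, v in kwargs.items():
            self[k] = v


samples = ReadOnlySamples({"S1": {"GT": (0, 1)}})
try:
    samples.update({"S2": {"GT": (1, 1)}})
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # 'ReadOnlySamples' object does not support item assignment
```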

If this isn't supported, could you tell me how to merge the sample information from two VCF files?

feilchenfeldt commented 3 months ago

I have the same issue. I would like to add additional samples to a VCF file using pysam. Is there any way to do this?

I see a TODO item in the source of pysam (pysam.libcbcf.VariantRecordSamples, below) suggesting that VariantRecordSamples objects are indeed read-only despite having an .update method.

class VariantRecordSamples(_Mapping[Union[str, int], "VariantRecordSample"]):
    def __eq__(self, other) -> bool: ...
    def __ne__(self, other) -> bool: ...
    # TODO Do these work? Isn’t the container read only?
    def update(
        self,

Is there any workaround for this?

Here is some mock code showing what I would like to achieve.

import copy

import pysam

in_vcf = pysam.VariantFile('in.vcf')
header1 = in_vcf.header.copy()
header1.add_sample('newsample')
out_vcf = pysam.VariantFile('out.vcf', 'w', header=header1)

for rec in in_vcf:
    # some_existing_sample: name of a sample already present in in.vcf
    s = rec.samples[some_existing_sample]
    new_s = copy.copy(s)
    # Just adding a het genotype for the sake of example;
    # in reality a complex function determines the GT.
    new_s['GT'] = (0, 1)
    # The following throws the error:
    # TypeError: 'pysam.libcbcf.VariantRecordSamples' object does not support item assignment
    rec.samples.update({'newsample': new_s})
    out_vcf.write(rec)
out_vcf.close()

Thanks!