simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0

TypeError: unhashable type: 'GenomicFeature' #103

Closed by kpillman 4 years ago

kpillman commented 4 years ago

Hello,

I'm just checking whether anyone else has seen this error. As per Issue #59, I recently switched from Python 2.7 to Python 3 (currently running v3.8.5), and some old code now throws this error when I use GenomicFeature objects to populate a GenomicArrayOfSets. It appears to be the same error as in Issue #59, but in a different context.

HTSeq version 0.12.4

Here is a minimal snippet of code that reproduces the error:


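import os

import HTSeq

# expanduser resolves '~', which open() does not expand on its own
reference_genes_gtf = os.path.expanduser('~/reference/iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf')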

def features_to_genomic_array_of_sets(feature_iterator, **kwargs):
    '''By default, the features must be stranded. Can set this via kwargs 'stranded' as False.'''
    is_stranded = kwargs.get("stranded", True)

    genomic_array_of_features = HTSeq.GenomicArrayOfSets("auto", stranded=is_stranded)

    for feature in feature_iterator:
        print(feature.iv)
        genomic_array_of_features[feature.iv] += feature
    return genomic_array_of_features

genomic_array_of_features = features_to_genomic_array_of_sets(HTSeq.GFF_Reader(reference_genes_gtf))

Output

chr1:[11873,12227)/+
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-7d7cd902adb0> in <module>
     10     return genomic_array_of_features
     11 
---> 12 genomic_array_of_features = features_to_genomic_array_of_sets(HTSeq.GFF_Reader(reference_genes_gtf))

<ipython-input-3-7d7cd902adb0> in features_to_genomic_array_of_sets(feature_iterator, **kwargs)
      7     for feature in feature_iterator:
      8         print(feature.iv)
----> 9         genomic_array_of_features[feature.iv] += feature
     10     return genomic_array_of_features
     11 

python3/src/HTSeq/_HTSeq.pyx in HTSeq._HTSeq.ChromVector.__iadd__()

python3/src/HTSeq/_HTSeq.pyx in HTSeq._HTSeq.ChromVector.apply()

python3/src/HTSeq/_HTSeq.pyx in HTSeq._HTSeq.ChromVector.__iadd__.addval()

TypeError: unhashable type: 'GenomicFeature'

Cheers!

kpillman commented 4 years ago

Further investigation suggests the error arises because a GenomicFeature cannot be used as the 'value' part of a GenomicArrayOfSets in Python 3.

    genomic_array_of_features = HTSeq.GenomicArrayOfSets("auto", stranded=is_stranded)

    for feature in feature_iterator:
        genomic_array_of_features[feature.iv] += feature
    return genomic_array_of_features

The error goes away if I use something hashable, such as a string, instead (e.g. genomic_array_of_features[feature.iv] += "feature"), although obviously this isn't a solution to my problem...
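For anyone wondering why a string works and the feature object does not, here is a minimal sketch of the Python 3 rule that I suspect is involved: a class that overrides __eq__ without also defining __hash__ becomes unhashable, whereas Python 2 kept the default hash. FeatureLike and try_add are hypothetical stand-ins written for this illustration, not HTSeq code, and I am assuming (not confirming) that GenomicFeature defines __eq__ in this way.

```python
# Stand-in class (hypothetical, not HTSeq code): defines __eq__ but no __hash__.
class FeatureLike:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name


def try_add(container, value):
    """Return None on success, or the TypeError message if value is unhashable."""
    try:
        container.add(value)
    except TypeError as err:
        return str(err)
    return None


# Python 3 sets __hash__ to None on a class that overrides __eq__ alone.
print(FeatureLike.__hash__)
# Adding such an object to a set fails; the message names the unhashable type.
print(try_add(set(), FeatureLike("geneA")))
# Strings are hashable, so the same operation succeeds.
print(try_add(set(), "geneA"))
```

Since the values in a GenomicArrayOfSets are stored in Python sets, anything added with += has to be hashable, which would explain the traceback ending in addval().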

I can't figure out a workaround that doesn't break everything downstream (we have a lot of code relying on this particular object). Is there anything you can do internally to make GenomicArrayOfSets accept GenomicFeatures as values again?
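One possible stopgap, sketched below under stated assumptions, is to attach a __hash__ to the feature class at import time so that existing downstream code keeps receiving the same objects. The helper add_feature_hash is hypothetical, and IntervalLike and FeatureLike are stand-ins so the sketch runs without HTSeq. It assumes features expose name, type and iv, and that iv exposes chrom, start, end and strand. I have not verified that patching the real class this way is safe.

```python
from collections import namedtuple

# Stand-ins for illustration only; real code would use HTSeq's own classes.
IntervalLike = namedtuple("IntervalLike", "chrom start end strand")


class FeatureLike:
    def __init__(self, name, type_, iv):
        self.name, self.type, self.iv = name, type_, iv

    def __eq__(self, other):
        return (self.name, self.type, self.iv) == (other.name, other.type, other.iv)


def add_feature_hash(cls):
    """Attach a __hash__ built from the same fields the stand-in __eq__ compares."""
    def _feature_hash(self):
        iv = self.iv
        return hash((self.name, self.type, iv.chrom, iv.start, iv.end, iv.strand))

    cls.__hash__ = _feature_hash
    return cls


# Assumption: the equivalent call on the real library would be
# add_feature_hash(HTSeq.GenomicFeature), made once before building the array.
add_feature_hash(FeatureLike)

iv = IntervalLike("chr1", 11873, 12227, "+")
features = {FeatureLike("DDX11L1", "exon", iv), FeatureLike("DDX11L1", "exon", iv)}
print(len(features))  # equal features hash alike, so the set holds one entry
```

The hash uses only fields that take part in equality, so equal features always land in the same bucket. The caveat is that a feature must not be modified after it has been added to a set, because its hash would change underneath the container.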

kpillman commented 4 years ago

Just realised this issue needs to be raised in the new HTSeq repo, not here, sorry.