monarch-initiative / genophenocorr

Genotype Phenotype Correlation
https://monarch-initiative.github.io/genophenocorr/stable
MIT License

Do not use the variant HGVS as the name when storing long variants #26

Closed: pnrobinson closed this issue 10 months ago

pnrobinson commented 1 year ago

When creating a cohort

cc = PhenopacketCohortCreator(pc)
patientCohort = cc.create_cohort(fpath_phenopackets)

I get this error

File ~/GIT/genophenocorr/src/genophenocorr/variant/_annotators.py:180, in VariantAnnotationCache.store_annotations(self, variant_coordinates, annotation)
    178 def store_annotations(self, variant_coordinates: VariantCoordinates, annotation: Variant):
    179     fpath = self._create_file_name(variant_coordinates)
--> 180     with open(fpath, 'wb') as f:
    181         pickle.dump(annotation, f)

because of this variant name

'CACHE/17_42709934_42712286_GACCTGGAAGAGAAATCCAACGGGCCTGTCACTCCTCGAGCAAGGGGGTCAGGTAAGTGGCCCAGCTGGGTGCTGGCCTTGGGAGGGTTCTGAGAAACTCAGGCAGCTGACCAAGCCTCTCATCAGTCAGGGAGAGACAGAGTGCCACTGGAACATTGGGTTACTGGCTCTGAAGTTCATTCCTAATTATTTATCCTGACTCAGGAAAGGAGAAATACTGAGCACAGTAATACCGCCCCTGGTCAGAAGCTGTCACCTACTACTCTTTCTACCAAGCCACGGGTAGAAGAGTGGGCTGACTGTGACCAACAGTATCTTCTTCTTTTTAGGAAGGGCAACGCTGTGCCTTGTGTAACTGAGTGTAAGGCAGGACAGGACAGGACAGGAATGGTTTCAGTGGGCTAAATATTAGCTCCCTCTGTCAGTATAAAGATACCGGAGCCTCAGCCATTTCAATAGGATGTGTTTTTTCTCTTAAAGCACTGGTTTTTAGTTTTTCCTTTTCTTTGTTGGGGCTATTGGCCCTTTGTGGGGGATCTTTGAAAACTGTAACTATTCTCAGGAAAATACAGACAAGAACATTCTTGCATACAAATCCATAGATGGTTACGTTGAGAACCTGTGATCAGGGAAATAGGTATGAGCTCCAAAATGAAAGCAAAGGGCACTTCAGCTCATGGTTCTGTTTTTGTTTGTTTTTTTTTTTTTTTTTTAAGAGAGAGGGTCTCATACTCTTGGCCAGGCTGGAGTGCAGTGGTGCCATCATAGCTCAATGTAGTATAGAACTCCTGGGCTCAAGCCATCTTCCCACCTCAGCCTCCTGAGTACTAGGACTACAGGTACGTGGCTTTTTTTTTTTTTTTTTTTTTGTAGAAATGGGGTCTCACTTTGTTGCCCACACTGGTCCTGAAATCCTGGCTTCAAGCGATCCTCCCACCATGGCTTCCCAAAGCACTGGAATTCTAGGTGTGAGCCACCTTGCCCAGTCCATGGTTCTATTAATTGTTCTCAGTACAGGAAGCATGAAGAAGAGGCCACAGAGTCTCCTCCAGAAGGTAGGAAGCCAAAGCATTGGGGTTCCTTTCCTGTTGGACATGCTGGCCCTGACAGCTGCCTCCTTGTCCCTGTTCTTCAGTCTGTCTTCTCACTGTGGTCTTTTCCTGTCTTTTCCTGGGCCAATCACTTGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCATCTCTACTAAAAAAAAATACAAAAATGGCCAGGCACATTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACCTGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTATTACAAATACAAAATTAGCCGGGCGTGGTGGTGCACACCTGTAATCCCAGCTACTTGGGAGGCTGTGGCAGAAGAATCACTTGAACCTGGGAGGCGGAGGCTGCAGTGAGCTGAGATCATGCCACTGCACTCCAGCCTGGGCAACAGACCGAGACTCCATCTCAAAAAAACAAAACAAAAAAAATTAGCTGGGTGTGGTGGTGGGCACCTGTAATCCCAGTTGCTTGGGAGGATGAGGCAGAAGAATCACTTGAACTTGGGAGGCGGAGGTTGCAGTGAACCAAGATTATGCCACTGCACCACTCCAGCCTGGGCAACAGAGCGAGATTCTGTCTCAAAAAAAAAAAAAAATTAGCTGGGCATACTGGCCTGCACCTGTAGTCCCTTGCTACTTGCTTGGCTGAGGGGAGAGGACTGCTTGAGCCCAGGAGGCGGAGGTTGCAGTGAGCTATGATCATGCCACTGCACTCCAGCCTGGGCGACACAGTGAAACCCTGTCTCAAAGACAAAATAAAGATAATCTAGTGATAGAAAATGTGGAGAATAAAATGACTGAAGAGGCTGGCGGAGTGGTGGAGGGAGCAGCAGCTGCAGCAG
CTGCAGCAGCAGCAGCAGTGTGCTCATTAACAAGAGCCACAGAAAGACCTGGGAGTCCCTTCTGGGAAAGGGGTACACATTTAGAAAGGAGGCCAGAGCCAAAAAAAAGAAGCGAAAGAGTGTAGGACCCAGAAGCATTAAATAGAGTCCAGACAGAAATGAGCATTCAGCAAGGAGGAGGCGGGTCCCCAAACATCATTAGGCCTGGCACTTGCAGAAGGGCCATGTTTGGGAAACTCACAGAAGCACAGGCTCATCAGGGACTGAACTTAAGACAACTTCTCTCCAGACCCAGACACACAGCCTGGTAAGATGGCAAAGGGCTGGACAGAGCAATGCGTGAAAGGAGGGGCCCATTTGTTCTGCTGCTTCCAGATGGT_G_heterozygous.pickle'

I would suggest something like this

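import tempfile
if len(variant_name) > 50:
    # TemporaryFile() returns a file object, not a name, so take the
    # name of a named temporary file instead
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        variant_name = tmp.name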

or just storing something like variant1, variant2, etc. I do not think these names matter at all for the rest of the code in a notebook.
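Because the cache has to find the same file again on a later run, a deterministic short name may work better than a random temporary one. The following is only a sketch of that idea, not the code in genophenocorr: `short_variant_name` and `MAX_NAME_LEN` are hypothetical names, and the 50-character threshold is taken from the suggestion above. Names over the limit keep a readable prefix and get a SHA-256 digest suffix, so the same variant always maps to the same file name.

```python
import hashlib

# Hypothetical threshold, borrowed from the 50-character check suggested above.
MAX_NAME_LEN = 50


def short_variant_name(variant_name: str, max_len: int = MAX_NAME_LEN) -> str:
    """Return variant_name unchanged if short, else a prefix plus a hash suffix."""
    if len(variant_name) <= max_len:
        return variant_name
    # 16 hex characters of SHA-256 identify the full name reproducibly.
    digest = hashlib.sha256(variant_name.encode("utf-8")).hexdigest()[:16]
    # Keep the leading coordinates readable; reserve room for '_' and the digest.
    prefix = variant_name[: max_len - len(digest) - 1]
    return f"{prefix}_{digest}"


if __name__ == "__main__":
    long_name = "17_42709934_42712286_" + "GACCTGGAAG" * 200 + "_G_heterozygous"
    print(short_variant_name("17_100_101_A_G_heterozygous"))
    print(short_variant_name(long_name))
```

The result stays well under the file name length limit that most file systems impose (commonly 255 bytes), while still showing the chromosome and coordinates at the start of the name.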

ielis commented 10 months ago

I think this has been addressed in #70. The file name is now generated using variant_key for shorter sequence variants and symbolic variants, or using variant_class for longer sequence INDELs, as shown here.
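As a rough illustration of that naming rule (not the actual code from #70), the sketch below builds the file name from the alleles when they are short and from the variant class when they are long. The function `cache_file_name`, its parameters, the `max_allele_len` cutoff, and the exact name layout are all assumptions made for this example.

```python
def cache_file_name(
    chrom: str,
    start: int,
    end: int,
    ref: str,
    alt: str,
    variant_class: str,
    max_allele_len: int = 50,  # hypothetical cutoff, not taken from the library
) -> str:
    """Build a cache file name that stays short for long sequence INDELs."""
    if len(ref) <= max_allele_len and len(alt) <= max_allele_len:
        # Short alleles: spell them out, so the name identifies the variant.
        key = f"{chrom}_{start}_{end}_{ref}_{alt}"
    else:
        # Long alleles: use the variant class (e.g. 'DEL') instead of the sequence.
        key = f"{chrom}_{start}_{end}_{variant_class}"
    return f"{key}.pickle"


if __name__ == "__main__":
    print(cache_file_name("17", 100, 101, "A", "G", "SNV"))
    print(cache_file_name("17", 42709934, 42712286, "G" + "A" * 2352, "G", "DEL"))
```

One caveat with this layout: two different long INDELs with the same coordinates and class would share a file name, so a real implementation may need an extra distinguishing component.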