hetio / hetnetpy

Hetnets in Python (relocated from dhimmel/hetio)
https://het.io/software

hetio.abbreviation.metaedges_from_metapath breaks for integers #12

Closed gwaybio closed 6 years ago

gwaybio commented 6 years ago

I have been building hetnets for MSigDB at https://github.com/greenelab/interpret-compression and have been using MSigDB collection names for hetio metagraph IDs.

However, I am receiving an error in the function metapath_from_abbrev, for example when calling graph.metagraph.metapath_from_abbrev('GpC1').

The traceback is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-2cbc2580bcf8> in <module>()
      1 DWPCs = collections.OrderedDict()
      2 for name, graph in hetnets.items():
----> 3     metapath = graph.metagraph.metapath_from_abbrev('GpC1')
      4     rows, cols, dwpc_matrix, seconds = dwpc(graph, metapath, damping=0.4)
      5     DWPCs[name] = dwpc_matrix

~/anaconda3/envs/interpret-compression/lib/python3.6/site-packages/hetio/hetnet.py in metapath_from_abbrev(self, abbrev)
    333         for metaedge_abbrev in metaedge_abbrevs:
    334             metaedge_id = hetio.abbreviation.metaedge_id_from_abbreviation(
--> 335                 self, metaedge_abbrev)
    336             metaedges.append(self.get_edge(metaedge_id))
    337         return self.get_metapath(tuple(metaedges))

~/anaconda3/envs/interpret-compression/lib/python3.6/site-packages/hetio/abbreviation.py in metaedge_id_from_abbreviation(metagraph, abbreviation)
    128     abbrev_to_kind = {v: k for k, v in metagraph.kind_to_abbrev.items()}
    129     source_kind = abbrev_to_kind[source_abbrev]
--> 130     target_kind = abbrev_to_kind[target_abbrev]
    131     metanode = metagraph.get_node(source_kind)
    132     for edge in metanode.edges:

KeyError: 'C'

It appears the root of the error is the call on line 332 to the method metaedges_from_metapath:

>>> hetio.abbreviation.metaedges_from_metapath('GpC1')
['GpC']

Which also appears to break in line 114 in hetio/abbreviation.py.

I have not looked at the specific line in much detail, but I was wondering whether there would be a way to accept integers in metapath abbreviations.

pattern = regex.compile('(?<=^|[a-z<>])[A-Z]+[a-z<>]+[A-Z]+')

dhimmel commented 6 years ago

Ah, interesting case. You've identified the problematic line. The main difficulty is that we use case to determine what's a metanode (UPPER) and what's an edge (lower). Perhaps we could allow digits following an initial letter, with that letter setting the case? That would fix your issue and also allow G1Fp1C2, although the readability of such an abbreviation is poor.
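
A minimal sketch of the idea of letting the initial letter set the case, using only the standard-library re module. The TOKEN rules and the split_metapath helper are hypothetical names for illustration and are not part of hetio. The sketch assumes a metanode abbreviation is an uppercase letter followed by uppercase letters or digits, and a metaedge abbreviation is a lowercase letter or direction marker followed by lowercase letters, digits, or direction markers.

```python
import re

# Hypothetical token rules (an assumption, not hetio's implementation):
# the first character decides whether a token is a metanode (uppercase)
# or a metaedge (lowercase, '<' or '>'); trailing digits inherit that kind.
TOKEN = re.compile(r'[A-Z][A-Z0-9]*|[a-z<>][a-z0-9<>]*')


def split_metapath(abbrev):
    """Split a metapath abbreviation into its metaedge abbreviations."""
    tokens = TOKEN.findall(abbrev)
    # A valid metapath is node, edge, node, ...: an odd number of tokens
    # (at least three) that together cover the whole string.
    valid = (
        len(tokens) >= 3
        and len(tokens) % 2 == 1
        and ''.join(tokens) == abbrev
        and tokens[0][0].isupper()
    )
    if not valid:
        raise ValueError(f'cannot parse metapath abbreviation: {abbrev!r}')
    # Consecutive metaedges share a metanode, so step two tokens at a time.
    return [''.join(tokens[i:i + 3]) for i in range(0, len(tokens) - 2, 2)]
```

Under these assumptions, split_metapath('G1Fp1C2') keeps p1 together as the edge part and returns a single metaedge abbreviation, while an abbreviation that starts with a digit is rejected.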

gwaybio commented 6 years ago

Looks like this works for my use case (and others):

>>> import regex
>>> pattern = regex.compile('(?<=^|[a-z<>])[A-Z0-9]+[a-z<>]+[A-Z0-9]+')
>>> pattern.findall('GiGpBP', overlapped=True)
['GiG', 'GpBP']
>>> pattern.findall('GpC1', overlapped=True)
['GpC1']
>>> pattern.findall('G1Fp1C2pC1', overlapped=True)
['G1Fp1C2', '1C2pC1']

Do you think this would introduce any unintended consequences? I can work around this issue by extracting metapaths by other means.
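
For comparing the two character classes without the third-party regex package, here is a rough standard-library approximation. The find_metaedges helper and its node_chars parameter are hypothetical names for illustration. Python's re module has no overlapped option and rejects the variable-width lookbehind, so the sketch captures each candidate inside a lookahead instead; it is meant to mimic the behaviour shown above, not to replace the hetio code.

```python
import re


def find_metaedges(abbrev, node_chars='A-Z'):
    """Return overlapping metaedge abbreviations found in abbrev.

    node_chars is the character-class body used for metanode abbreviations,
    e.g. 'A-Z' (original pattern) or 'A-Z0-9' (proposed pattern).
    """
    # A candidate must start at the beginning of the string or right after
    # a metaedge character. Capturing inside a lookahead keeps every match
    # zero-width, so later matches may overlap earlier ones.
    pattern = re.compile(
        rf'(?:^|(?<=[a-z<>]))(?=([{node_chars}]+[a-z<>]+[{node_chars}]+))'
    )
    return pattern.findall(abbrev)


print(find_metaedges('GpC1'))                    # original character class
print(find_metaedges('GpC1', 'A-Z0-9'))          # digits allowed in metanodes
print(find_metaedges('G1Fp1C2pC1', 'A-Z0-9'))
```

One thing the last call makes visible: with digits added only to the uppercase class, the 1 after p is read as the start of the next metanode (1C2) rather than as part of the metaedge, so abbreviations with digits in the edge part would not round-trip.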