Ah, interesting case. You've identified the problematic line. The main difficulty is that we use case to determine what's a metanode (UPPER) and what's an edge (lower). Perhaps we could allow digits following an initial letter to take that letter's case? That would fix your issue and also allow G1Fp1C2, although the readability of such an abbreviation is poor.
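To make that idea concrete, here is a minimal sketch of one possible reading: a digit inherits the kind (metanode or edge) of the letter it follows. This is not hetio code. The function name `split_abbrev` and the choice to reject a leading digit are assumptions made for illustration only.

```python
def split_abbrev(abbrev):
    """Split a metapath abbreviation into alternating metanode/edge tokens.

    Hypothetical helper, not part of hetio. Uppercase letters start or
    extend a metanode token, lowercase letters and '<' / '>' start or
    extend an edge token, and a digit joins whichever token precedes it.
    """
    tokens = []
    current_kind = None  # 'node', 'edge', or None before the first character
    for char in abbrev:
        if char.isdigit():
            if current_kind is None:
                # Assumption: a digit cannot open an abbreviation,
                # because it has no letter to take its case from.
                raise ValueError(f"abbreviation cannot start with a digit: {abbrev!r}")
            kind = current_kind
        elif char.isupper():
            kind = "node"
        elif char.islower() or char in "<>":
            kind = "edge"
        else:
            raise ValueError(f"unexpected character {char!r} in {abbrev!r}")
        if kind == current_kind:
            tokens[-1] += char  # extend the current token
        else:
            tokens.append(char)  # start a new token
            current_kind = kind
    return tokens


print(split_abbrev("GpC1"))     # ['G', 'p', 'C1']
print(split_abbrev("G1Fp1C2"))  # ['G1F', 'p1', 'C2']
```

Under this reading, `G1Fp1C2` parses as metanode `G1F`, edge `p1`, metanode `C2`, so the digit never decides the case on its own.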
Looks like this works for my use case (and others):
>>> import regex
>>> pattern = regex.compile('(?<=^|[a-z<>])[A-Z0-9]+[a-z<>]+[A-Z0-9]+')
>>> pattern.findall('GiGpBP', overlapped=True)
['GiG', 'GpBP']
>>> pattern.findall('GpC1', overlapped=True)
['GpC1']
>>> pattern.findall('G1Fp1C2pC1', overlapped=True)
['G1Fp1C2', '1C2pC1']
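For anyone without the third-party `regex` package, the same extraction can probably be approximated with the standard-library `re` module. The sketch below is not hetio code, and the name `find_metaedge_abbrevs` is hypothetical. Since `re` has no `overlapped=True` option and rejects the variable-width lookbehind `(?<=^|[a-z<>])`, it swaps in a negative lookbehind and wraps the pattern in a capturing lookahead. The two lookbehinds agree only when the abbreviation contains nothing but letters, digits, `<` and `>`.

```python
import re

# Zero-width lookahead so that consecutive matches may overlap; the
# captured group holds the actual metanode-edge-metanode abbreviation.
# (?<![A-Z0-9]) means "at the start, or not preceded by an uppercase
# letter or digit", standing in for the original (?<=^|[a-z<>]).
_PATTERN = re.compile(r"(?=((?<![A-Z0-9])[A-Z0-9]+[a-z<>]+[A-Z0-9]+))")


def find_metaedge_abbrevs(metapath_abbrev):
    """Return overlapping metaedge abbreviations found in a metapath abbreviation.

    Hypothetical helper for illustration; digits are treated like
    uppercase letters, as in the pattern above.
    """
    return _PATTERN.findall(metapath_abbrev)


print(find_metaedge_abbrevs("GiGpBP"))      # ['GiG', 'GpBP']
print(find_metaedge_abbrevs("GpC1"))        # ['GpC1']
print(find_metaedge_abbrevs("G1Fp1C2pC1"))  # ['G1Fp1C2', '1C2pC1']
```

Note that in the last example the `1` after `p` is grouped with the metanode `C2` rather than with the edge, which is how this pattern treats every digit.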
Do you think this would introduce any unintended consequences? I can work around this issue by extracting metapaths by other means.
I have been building hetnets for MSigDB at https://github.com/greenelab/interpret-compression and have been using MSigDB collection names for hetio metagraph IDs.
However, I am receiving an error in the function `metapath_from_abbrev`, for example when calling `graph.metagraph.metapath_from_abbrev('GpC1')`.
The traceback is:
It appears the root of the error is in line 332 of the method `metaedges_from_metapath`, which also appears to break at line 114 in `hetio/abbreviation.py`. I have not looked at the specific line in too much detail, but I was wondering if there would be a way to accept integers in metapath abbreviations.