mobiusklein / glypy

Glycan Analysis and Glycoinformatics Library for Python
Apache License 2.0

Break down (get the substructures of) a glycan that only has topology information, no linkage info #14

Closed bobaoai closed 5 years ago

bobaoai commented 6 years ago

Hi, I am using this code to get all possible substructures of "_glycan", but I cannot get the string representation of a substructure.

_glycan=glycoct.loads("""RES
1b:x-dglc-HEX-1:5
2b:x-lgal-HEX-1:5|6:d
3b:x-dglc-HEX-1:5
4b:x-dman-HEX-1:5
5b:x-dman-HEX-1:5
6b:x-dglc-HEX-1:5
7b:x-dgal-HEX-1:5
8b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d
9s:n-acetyl
10s:n-acetyl
11b:x-dman-HEX-1:5
12b:x-dglc-HEX-1:5
13b:x-dgal-HEX-1:5
14s:n-acetyl
15b:x-dglc-HEX-1:5
16b:x-dgal-HEX-1:5
17s:n-acetyl
18s:n-acetyl
19s:n-acetyl
LIN
1:1o(-1+1)2d
2:1o(-1+1)3d
3:3o(-1+1)4d
4:4o(-1+1)5d
5:5o(-1+1)6d
6:6o(-1+1)7d
7:7o(3|6+2)8d
8:8d(5+1)9n
9:6d(2+1)10n
10:4o(-1+1)11d
11:11o(-1+1)12d
12:12o(-1+1)13d
13:12d(2+1)14n
14:11o(-1+1)15d
15:15o(-1+1)16d
16:15d(2+1)17n
17:3d(2+1)18n
18:1d(2+1)19n
""")

# Group every substructure by its size (number of residues)
_frag_motif_list = {}
for i in _glycan.fragments(max_cleavages=len(_glycan)):
    _frag_gly = fragment_to_substructure(i, _glycan)
    if len(_frag_gly) not in _frag_motif_list:
        _frag_motif_list[len(_frag_gly)] = [glycoct.loads(str(_frag_gly))]
    else:
        _frag_motif_list[len(_frag_gly)].append(glycoct.loads(str(_frag_gly)))

I tried to find the node that triggers the error by inserting some print() calls into the source code. The output before the error is:

RES
1b:x-dglc-HEX-1:5
2s:n-acetyl
LIN
1:1d(2+1)2n
2
RES
1b:x-dglc-HEX-1:5
2s:n-acetyl
LIN
1:1d(2+1)2n
2
tag 0
(4)o(-1+1)d(5)[x]
(5)o(-1+1)d(6)[x]
None

TypeError Traceback (most recent call last)

in ()
     25 # print('finished getmotif')
     26
---> 27 get_motif(glycoct.loads(a_3350),1)
     28 # str()

in get_motif(glycoct_obj, idex)
     15 _frag_motif_list[len(_frag_gly)] = [glycoct.loads(str(_frag_gly))]
     16 else:
---> 17 _frag_motif_list[len(_frag_gly)].append(glycoct.loads(str(_frag_gly)))
     18 # except:
     19 # plot(_frag_gly)

~/anaconda3/lib/python3.5/site-packages/glypy/structure/glycan.py in serialize(self, name)
    673
    674 def serialize(self, name='glycoct'):
--> 675 return self._serializers[name](self)
    676
    677 __repr__ = serialize

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in dumps(structure, full)
   1225
   1226 def dumps(structure, full=True):
-> 1227 return GlycoCTWriter(structure, None, full=full).dump()
   1228
   1229

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in dump(self)
    960
    961 def dump(self):
--> 962 buffer = self.handle_glycan()
    963 if self.nobuffer:
    964 value = buffer.getvalue()

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in handle_glycan(self)
   1182 visited.add(link.child.id)
   1183 if link.child.node_type is Monosaccharide.node_type:
-> 1184 line = self.handle_monosaccharide(link.child)
   1185 else:
   1186 line = self.handle_substituent(link.child)

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in handle_monosaccharide(self, monosaccharide)
   1144 link_collection.extend([cl for p, cl in monosaccharide.children(links=True)])
   1145
-> 1146 links = self.ordering_context.sort_links(link_collection)
   1147 self.link_queue.extend(links)
   1148 return residue_str

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in sort_links(self, links)
   1080
   1081 def sort_links(self, links):
-> 1082 return sorted(links, key=cmp_to_key(self.compare_link_ordering))
   1083
   1084 def sort_residues(self, residues):

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in compare_link_ordering(self, link_a, link_b)
   1075 print(child_a)
   1076 print(child_b)
-> 1077 ordered = self.compare_residue_ordering(child_a, child_b)
   1078
   1079 return ordered

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in compare_residue_ordering(self, res_a, res_b)
   1037 return diff_n_branches_from
   1038
-> 1039 subtree_a = GlycoCTWriter(Glycan.subtree_from(self.structure, res_a)).dump()
   1040 subtree_b = GlycoCTWriter(Glycan.subtree_from(self.structure, res_b)).dump()
   1041 return subtree_a < subtree_b

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in dump(self)
    960
    961 def dump(self):
--> 962 buffer = self.handle_glycan()
    963 if self.nobuffer:
    964 value = buffer.getvalue()

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in handle_glycan(self)
   1168 visited = set()
   1169 if self.structure.root.node_type is Monosaccharide.node_type:
-> 1170 res_str = self.handle_monosaccharide(self.structure.root)
   1171 self.buffer.write(res_str + "\n")
   1172 else:

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in handle_monosaccharide(self, monosaccharide)
   1144 link_collection.extend([cl for p, cl in monosaccharide.children(links=True)])
   1145
-> 1146 links = self.ordering_context.sort_links(link_collection)
   1147 self.link_queue.extend(links)
   1148 return residue_str

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in sort_links(self, links)
   1080
   1081 def sort_links(self, links):
-> 1082 return sorted(links, key=cmp_to_key(self.compare_link_ordering))
   1083
   1084 def sort_residues(self, residues):

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in compare_link_ordering(self, link_a, link_b)
   1075 print(child_a)
   1076 print(child_b)
-> 1077 ordered = self.compare_residue_ordering(child_a, child_b)
   1078
   1079 return ordered

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in compare_residue_ordering(self, res_a, res_b)
   1025 print(link)
   1026 if link.is_parent(res_a):
-> 1027 branch_label = self.get_branch_from_link_label(link)
   1028 n_branches_from_a = max(n_branches_from_a, self.branch_to_terminal_count[branch_label])
   1029

~/anaconda3/lib/python3.5/site-packages/glypy/io/glycoct.py in get_branch_from_link_label(self, link)
    984 def get_branch_from_link_label(self, link):
    985 print(link.label)
--> 986 return link.label[0]
    987
    988 def build_branch_to_terminal_count(self):

TypeError: 'NoneType' object is not subscriptable

mobiusklein commented 6 years ago

This error is caused by GlycoCT serialization code assuming that the Glycan object was indexed already. I've added a check for this and it will index the glycan automatically.
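
To make the failure mode concrete, here is a minimal standalone sketch of it in plain Python. It is not glypy's code: ToyLink, ToyTree, index and both serialize methods are hypothetical names made up for illustration. The assumption it models is that a link's label stays None until the tree is indexed, so a serializer that reads label[0] raises the same TypeError unless it indexes the tree first.

```python
class ToyLink:
    """Hypothetical edge between two residues; `label` is set only by indexing."""

    def __init__(self, parent, child):
        self.parent = parent
        self.child = child
        self.label = None  # stays None until ToyTree.index() runs


class ToyTree:
    """Hypothetical stand-in for a structure whose serializer needs labels."""

    def __init__(self, links):
        self.links = list(links)
        self.indexed = False

    def index(self):
        # Give every link a label such as "a1", "a2", ...
        for i, link in enumerate(self.links, start=1):
            link.label = "a%d" % i
        self.indexed = True

    def serialize_unchecked(self):
        # Assumes indexing already happened; label[0] fails when label is None.
        return ",".join(link.label[0] + str(link.child) for link in self.links)

    def serialize(self):
        # Guard: index on demand before reading any label.
        if not self.indexed:
            self.index()
        return self.serialize_unchecked()


tree = ToyTree([ToyLink(1, 2), ToyLink(2, 3)])

# Without the guard, serialization of an unindexed tree raises TypeError.
try:
    tree.serialize_unchecked()
    error_message = None
except TypeError as err:
    error_message = str(err)

# With the guard, the same tree serializes without any manual indexing.
serialized = tree.serialize()
print(error_message)
print(serialized)
```

The guard keeps the check inside the serializer, so callers such as the loop in the question do not have to remember to index each substructure themselves.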

There is also a method that does this subtree enumeration directly, Glycan.substructures, so you do not need fragment_to_substructure. It uses the same machinery as Glycan.fragments, but skips the extra work that fragments does to account for the bond-cleavage losses from particular types of dissociation in a mass spectrometer.
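
For intuition about what enumerating substructures by breaking links means, here is a rough pure-Python sketch on a toy tree. It is only an illustration under my own assumptions, not how glypy implements it: the function names, the (parent, child) edge representation and the max_cleavages parameter of this toy function are all hypothetical. It removes every combination of 1 to max_cleavages edges and collects the connected pieces that remain.

```python
from itertools import combinations


def toy_substructures(edges, max_cleavages=1):
    """Yield each distinct connected piece (as a frozenset of nodes) left
    after removing 1..max_cleavages edges from a tree of (parent, child) pairs."""
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    seen = set()
    for k in range(1, max_cleavages + 1):
        for cut in combinations(edges, k):
            kept = [edge for edge in edges if edge not in cut]
            for piece in _components(nodes, kept):
                if piece not in seen:
                    seen.add(piece)
                    yield piece


def _components(nodes, edges):
    """Connected components of an undirected graph, found by depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    unvisited = set(nodes)
    while unvisited:
        stack = [unvisited.pop()]
        piece = set(stack)
        while stack:
            for neighbour in adjacency[stack.pop()]:
                if neighbour not in piece:
                    piece.add(neighbour)
                    stack.append(neighbour)
        unvisited -= piece
        yield frozenset(piece)


# Toy tree: residue 1 is the root, 2 hangs off 1, and 3 and 4 both hang off 2.
toy_edges = [(1, 2), (2, 3), (2, 4)]
pieces = set(toy_substructures(toy_edges, max_cleavages=len(toy_edges)))
for piece in sorted(pieces, key=lambda p: (len(p), sorted(p))):
    print(sorted(piece))
```

The brute-force combinations loop is fine for a four-node toy but grows quickly with the number of links, which is one reason to prefer the library's own enumeration on real glycans.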