mobiusklein / glypy

Glycan Analysis and Glycoinformatics Library for Python
Apache License 2.0

Error when multiprocessing the Glycan object #16

Closed bobaoai closed 4 years ago

bobaoai commented 5 years ago

Hi Joshua,

It looks like glypy has changed significantly recently. The pip version is stable, but when I tried the new version I found that I cannot use it with multiprocessing. This problem first appeared a month ago, and I am not sure whether it has been resolved. Please let me know if this is not informative enough and you need more specific bug information.

def extract_motif(name, a_glycan, return_dict, branch=5):
    """
    :param a_glycan: Glycan obj
    :param branch:
    :return:
    """
    extracted_motif_dic = {}

    for i in a_glycan.fragments(max_cleavages=branch):
        _frag_gly = fragment_to_substructure(i, a_glycan)

        if not str(len(_frag_gly)) in extracted_motif_dic.keys():
            extracted_motif_dic[str(len(_frag_gly))] = [_frag_gly]
        else:
            extracted_motif_dic[str(len(_frag_gly))].append(_frag_gly)
    extracted_motif_dic[str(len(a_glycan))] = [a_glycan]

    return_dict[name] = extracted_motif_dic

# glycan_dict is a dict: {glycan_name: Glycan}

manager = multiprocessing.Manager()
motif_dic = manager.dict()
print('start parallel parsing', len(glycan_dict), 'glycans')
pool = multiprocessing.Pool(processes=__init__.num_processors)
pool_list = []
for idex, i in enumerate(glycan_dict):
    if len(glycan_dict[i]) > gly_len:
        print(i, 'larger than max')
        continue
    # using get motif with count wrapper
    # Also check exists wrapper
    pool_list.append(pool.apply_async(extract_motif, args=(i, glycan_dict[i], motif_dic)))
result_list = [xx.get() for xx in pool_list]
pool.close()
pool.join()

mobiusklein commented 5 years ago

I need to see the error message you're getting to be able to guess where the problem is arising. It's possible something is no longer pickle-able, though the unit test suite should catch that.
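One way to narrow this down before involving a pool is to round-trip each object through pickle in the parent process. The sketch below is only an illustration: `find_unpicklable` and the `sample` dict are hypothetical names, and the stand-in data would be replaced by something like `glycan_dict` from the report above.

```python
import pickle


def find_unpicklable(named_objects):
    """Return {name: error message} for objects that fail a pickle round trip."""
    failures = {}
    for name, obj in named_objects.items():
        try:
            pickle.loads(pickle.dumps(obj))
        except Exception as err:  # the exception type varies by object
            failures[name] = "{}: {}".format(type(err).__name__, err)
    return failures


# Stand-in data (hypothetical); a lambda cannot be pickled by reference.
sample = {"ok": [1, 2, 3], "bad": lambda x: x}
failures = find_unpicklable(sample)
print(sorted(failures))  # ['bad']
```

This only catches failures that raise. An object can also survive the round trip and come back incomplete, so comparing the restored object against the original is a stricter check.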

bobaoai commented 4 years ago

Hi Josh,

Congratulations! I see that glypy has been published.

I found that this problem popped up again when I updated glypy to the latest version from pip (glypy=0.12.3).

I was trying to get the substructures from a list of glycans. I didn't get any error with version 0.12.1. After updating to 0.12.3, I found that one and only one glycan raises this issue, which is weird. Please let me know if you have any thoughts!

"""
  File "/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/apple/PycharmProjects/GlyCompare/glycompare/extract_substructures.py", line 68, in extract_substructure_wrapper
    substructure_dic[a_name] = extract_substructure(a_glycan_str)
  File "/Users/apple/PycharmProjects/GlyCompare/glycompare/extract_substructures.py", line 42, in extract_substructure
    for i in a_glycan.fragments(max_cleavages=branch):
  File "/anaconda3/lib/python3.7/site-packages/glypy/structure/glycan.py", line 1301, in fragments
    source = self.clone()
  File "/anaconda3/lib/python3.7/site-packages/glypy/structure/glycan.py", line 925, in clone
    clone_root = graph_clone(self.root, visited=visited)
  File "/anaconda3/lib/python3.7/site-packages/glypy/structure/monosaccharide.py", line 179, in graph_clone
    terminal = link[ref]
  File "/anaconda3/lib/python3.7/site-packages/glypy/structure/link.py", line 167, in to
    if mol is (self.parent):
AttributeError: parent
"""

mobiusklein commented 4 years ago

Thank you, and thank you for reporting this.

I'd need to see the glycan in question, but if I had to guess, I'd bet this glycan has an ambiguous link (glypy.structure.link.AmbiguousLink), and I can't say I've pickled one of those. Running a test, I can see that pickling one does fail, though with a missing child rather than a missing parent. The difference doesn't matter, because it only depends on where the iteration starts. I'd appreciate seeing your structure too so I can write more complete tests.

mobiusklein commented 4 years ago

And sure enough, I forgot to return the state in AmbiguousLink.__getstate__, so it returned None. That meant __setstate__ was never called, so the object was never reinitialized on unpickling. It should be fixed on master, and I'll put out a bugfix release if you can confirm this fixes your issue.
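For anyone hitting something similar, here is a minimal sketch of the mechanism. `BrokenLink` and `FixedLink` are hypothetical stand-ins, not glypy's actual classes. When `__getstate__` returns None, pickle stores no state, so `__setstate__` is skipped on load and the restored object has none of its attributes.

```python
import pickle


class BrokenLink(object):
    """Hypothetical stand-in for a link class; not glypy's real code."""

    def __init__(self, parent, child):
        self.parent = parent
        self.child = child

    def __getstate__(self):
        state = {"parent": self.parent, "child": self.child}
        # Bug: the state is built but never returned, so this returns None.

    def __setstate__(self, state):
        self.parent = state["parent"]
        self.child = state["child"]


class FixedLink(BrokenLink):
    def __getstate__(self):
        # Fix: actually return the state so __setstate__ runs on unpickle.
        return {"parent": self.parent, "child": self.child}


def round_trip(obj):
    return pickle.loads(pickle.dumps(obj))


broken = round_trip(BrokenLink("A", "B"))
fixed = round_trip(FixedLink("A", "B"))

print(hasattr(broken, "parent"))  # False: no state was ever restored
print(fixed.parent, fixed.child)  # A B
```

Note that the broken version pickles and unpickles without raising. The AttributeError only shows up later, when something reads the missing attribute, which is why the failure surfaced inside the worker rather than at dispatch.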

bobaoai commented 4 years ago

Thank you for checking this issue! There are three glycans. Sure, I will check now whether the fix works.

"""
RES 1b:x-dglc-HEX-1:5 2b:x-dglc-HEX-1:5 3b:x-dman-HEX-1:5 4b:a-dman-HEX-1:5 5b:b-dglc-HEX-1:5 6s:n-acetyl 7b:a-dman-HEX-1:5 8b:b-dglc-HEX-1:5 9s:n-acetyl 10b:b-dglc-HEX-1:5 11b:x-dgal-HEX-1:5 12b:x-dgro-dgal-NON-2:6|1:a|2:keto|3:d 13s:n-acetyl 14s:n-acetyl 15s:n-acetyl 16s:n-acetyl 17b:a-lgal-HEX-1:5|6:d
LIN 1:1o(-1+1)2d 2:2o(-1+1)3d 3:3o(3+1)4d 4:4o(2|4+1)5d 5:5d(2+1)6n 6:3o(6+1)7d 7:7o(2+1)8d 8:8d(2+1)9n 9:7o(6+1)10d 10:10o(-1+1)11d 11:11o(-1+2)12d 12:12d(5+1)13n 13:10d(2+1)14n 14:2d(2+1)15n 15:1d(2+1)16n 16:1o(6+1)17d
"""

"""
RES 1b:b-dglc-HEX-1:5 2s:n-acetyl 3b:b-dglc-HEX-1:5 4s:n-acetyl 5b:b-dman-HEX-1:5 6b:a-dman-HEX-1:5 7b:b-dglc-HEX-1:5 8s:n-acetyl 9b:b-dgal-HEX-1:5 10b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 11s:n-acetyl 12b:b-dglc-HEX-1:5 13s:n-acetyl 14b:b-dgal-HEX-1:5 15b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 16s:n-acetyl 17b:a-dman-HEX-1:5 18b:b-dglc-HEX-1:5 19s:n-acetyl 20b:b-dgal-HEX-1:5 21b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 22s:n-acetyl 23b:a-lgal-HEX-1:5|6:d
LIN 1:1d(2+1)2n 2:1o(4+1)3d 3:3d(2+1)4n 4:3o(4+1)5d 5:5o(3|6+1)6d 6:6o(2+1)7d 7:7d(2+1)8n 8:7o(4+1)9d 9:9o(3|6+2)10d 10:10d(5+1)11n 11:6o(4|6+1)12d 12:12d(2+1)13n 13:12o(4+1)14d 14:14o(3|6+2)15d 15:15d(5+1)16n 16:5o(3|6+1)17d 17:17o(2+1)18d 18:18d(2+1)19n 19:18o(4+1)20d 20:20o(3|6+2)21d 21:21d(5+1)22n 22:1o(6+1)23d
"""

"""
RES 1b:x-dglc-HEX-1:5 2s:n-acetyl 3b:b-dglc-HEX-1:5 4s:n-acetyl 5b:b-dman-HEX-1:5 6b:a-dman-HEX-1:5 7b:b-dglc-HEX-1:5 8s:n-acetyl 9b:b-dgal-HEX-1:5 10b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 11s:n-acetyl 12b:b-dglc-HEX-1:5 13s:n-acetyl 14b:b-dgal-HEX-1:5 15b:a-dman-HEX-1:5 16b:b-dglc-HEX-1:5 17s:n-acetyl 18b:b-dgal-HEX-1:5 19b:b-dglc-HEX-1:5 20s:n-acetyl 21b:b-dgal-HEX-1:5 22b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 23s:n-acetyl 24b:b-dglc-HEX-1:5 25s:n-acetyl 26b:b-dgal-HEX-1:5 27b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d 28s:n-acetyl 29b:a-lgal-HEX-1:5|6:d
LIN 1:1d(2+1)2n 2:1o(4+1)3d 3:3d(2+1)4n 4:3o(4+1)5d 5:5o(3+1)6d 6:6o(2+1)7d 7:7d(2+1)8n 8:7o(4+1)9d 9:9o(3|6+2)10d 10:10d(5+1)11n 11:6o(4+1)12d 12:12d(2+1)13n 13:12o(4+1)14d 14:5o(6+1)15d 15:15o(2+1)16d 16:16d(2+1)17n 17:16o(4+1)18d 18:18o(3+1)19d 19:19d(2+1)20n 20:19o(4+1)21d 21:21o(3|6+2)22d 22:22d(5+1)23n 23:15o(6+1)24d 24:24d(2+1)25n 25:24o(4+1)26d 26:26o(3|6+2)27d 27:27d(5+1)28n 28:1o(6+1)29d
"""

bobaoai commented 4 years ago

I only replaced my local link.py file with the updated link.py. It works now. Thanks!

mobiusklein commented 4 years ago

I've uploaded v0.12.4 to PyPI, which contains the fix. Thank you.