project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

support for multithreading in python? #91

Closed lpravda closed 1 month ago

lpravda commented 3 years ago

Hi @wojdyr,

I wonder whether there are any plans to support multithreading in Python. Currently, when I try to use Python's multiprocessing module, it crashes with a message similar to:

TypeError: can't pickle gemmi.ResidueSpan objects

I tried the approach described here to add custom pickling to the class shown here, but only got one step further before it crashed with:

TypeError: gemmi.ResidueSpan: No constructor defined!

So I'm not sure whether this is something that needs to be supported by the library, or whether I'm doing something wrong. Any help would be greatly appreciated!

edit: I've just noticed #13, but have not seen any comments there...

wojdyr commented 3 years ago

Yes, there was also #85.

In general, it's not planned. Supporting pickling and unpickling of everything would be a significant amount of work, followed by a lot of extra code to maintain. But if it's needed only for a small class, as in #85, then yes, it can be added.
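
To illustrate what adding pickling to one small class involves, here is a minimal sketch using the standard-library copyreg module. The class SmallSpan and the function reduce_small_span are hypothetical stand-ins, not gemmi code; the stand-in refuses default pickling the way an extension type without pickle support would. The reducer has to name a callable that can rebuild the object from plain values, which is presumably why a type with no constructor exposed to Python fails at this step.

```python
import copyreg
import pickle


class SmallSpan:
    """Hypothetical stand-in for a small class without pickle support."""

    def __init__(self, chain, start, end):
        self.chain = chain
        self.start = start
        self.end = end

    def __reduce_ex__(self, protocol):
        # Simulate a type that cannot be pickled by default.
        raise TypeError("can't pickle SmallSpan objects")


def reduce_small_span(span):
    # Describe the object as (callable, args): pickle stores these plain
    # values and calls SmallSpan(*args) when unpickling.
    return (SmallSpan, (span.chain, span.start, span.end))


# Register the reducer; pickle consults this table before __reduce_ex__.
copyreg.pickle(SmallSpan, reduce_small_span)

restored = pickle.loads(pickle.dumps(SmallSpan("A", 1, 10)))
print(restored.chain, restored.start, restored.end)
```

Every class that should cross a process boundary needs its own reducer of this kind, which is where the maintenance cost comes from.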

I suppose that you tried multithreading because something was too slow? What was it?

lpravda commented 3 years ago

It wasn't slow in the sense that it would take forever. It's just that when I process hundreds of entries the same way, my natural go-to is multiprocessing, and that is where I ran into this issue. As I said, this is not limiting at the moment. Perhaps it would be good to clarify somewhere in the documentation that this is not possible, so that people don't keep asking for this feature :).

Thank you!

wojdyr commented 3 years ago

It's possible if you don't pass gemmi objects between processes. Here is an example:

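import multiprocessing as mp
import sys
import gemmi

def f(path):
    # Runs in a worker process: gemmi objects are created and used only here.
    st = gemmi.read_structure(path)
    weight = st[0].calculate_mass()
    # Return plain Python values, which can be pickled.
    return (st.name, weight)

def main():
    top_dir = sys.argv[1]
    with mp.Pool(processes=4) as pool:
        # Only file paths are sent to the workers, not gemmi objects.
        it = pool.imap_unordered(f, gemmi.CoorFileWalk(top_dir))
        for (name, weight) in it:
            print(name, weight)

# The guard keeps worker processes from re-running main() when the module
# is re-imported under the "spawn" start method.
if __name__ == '__main__':
    main()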
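
The example works because multiprocessing pickles whatever is sent to a worker and whatever the worker returns. A quick way to check whether a value can cross a process boundary is to try a pickle round trip. The sketch below uses only the standard library; UnpicklableSpan and is_picklable are hypothetical names for illustration, and the path and mass value are made up.

```python
import pickle


class UnpicklableSpan:
    """Hypothetical stand-in for an extension type without pickle support."""

    def __reduce_ex__(self, protocol):
        raise TypeError("can't pickle UnpicklableSpan objects")


def is_picklable(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# A path string (worker input) and a (name, weight) tuple (worker output)
# are plain Python values, so they can be passed between processes.
print(is_picklable("/data/1abc.pdb"))
print(is_picklable(("1ABC", 12345.6)))

# An object that refuses pickling cannot be passed to or from a worker.
print(is_picklable(UnpicklableSpan()))
```

If a worker needs to return more than a number, converting the result to strings, numbers, tuples, or dicts inside the worker keeps it on the picklable side.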

wojdyr commented 1 month ago

For the record, pickling was discussed more recently in #258 and has been implemented for some classes.