thomas0809 / MolScribe

Robust Molecular Structure Recognition with Image-to-Graph Generation
MIT License
153 stars 31 forks source link

Better performance when not using multiprocessing #14

Closed tobbber closed 7 months ago

tobbber commented 10 months ago

First off, thanks for the project. I had some issues when using Molscribe with MLServer due to Molscribe using multiprocessing in function molscribe.chemistry.convert_graph_to_smiles(...). (This seems to cause issues because Mlserver also runs Molscribe within a multiprocessing process)

However when i forked Molscribe and disabled multiprocessing, i noticed that molscribe.predict_images ran faster than with multiprocessing enabled. For small image batches, this difference was especially significant. I was able to reproduce this in a colab notebook.

Even though these results might change for very large batch sizes (>128 images), it think it would make sense to make the use of multiprocessing optional and configurable.

This could be done by:

Feel free to give your feedback to this idea. I would also be happy to provide a pull request.

thomas0809 commented 10 months ago

Sorry for the late reply. It sounds like a great idea. I will take a look at your PR. Thanks for your effort!

Regarding the multiprocessing issue, I guess the reason is the incompatibility with jupyter notebook or Colab. I believe there could be a fix for that, in case you are interested. Anyway having the option to disable multiprocessing is great.

tobbber commented 10 months ago

Thanks for taking a look at the PR. Yes i think having the option to disable multiprocessing would be nice, at least for the standard interface Molscribe.predict_images. However, you probably have better judgment on whether multiprocessing should be disabled by default or how to enable/disable it, so feel free to suggest changes.