Some suggestions (multithreading capabilities and GPU integration)

mortazavilab / PyWGCNA

PyWGCNA is a Python package designed to do Weighted Gene Correlation Network analysis (WGCNA)

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad415/7218311

MIT License

215 stars, 51 forks

Some suggestions (multithreading capabilities and GPU integration) #118

Open HemlockPoison opened 1 month ago

HemlockPoison commented 1 month ago

Thank you for developing this project. I did run into a significant issue with the documentation, though. Online documentation exists, but it could be more user-friendly: the quick start tutorial is helpful, yet the web documentation lacks detailed information about the parameters for each function and explanations of what each function does.

The project is very impressive all the same. Adding multithreading capabilities and GPU integration could significantly improve its performance, especially on large datasets. I hope these features will be considered in the future.

nargesr commented 1 month ago

Hi @HemlockPoison

Thank you for your feedback. I may not fully understand your suggestion, but I wanted to point out that each function in the API includes detailed documentation, so I'm not sure what you mean by "the web documentation lacks detailed information about the parameters for each function and explanations of what each function does."

Additionally, the functions used to find the modules are largely identical to those in the R version, including their names, so you can refer to the original paper for more information on the methods.

As the sole developer, I am doing my best to maintain the package. I'll keep your suggestions, such as adding multithreading capabilities and GPU integration, in mind for the future. I also welcome anyone interested in contributing to these enhancements.

tuanpham96 commented 2 days ago

@nargesr I'm new to WGCNA and trying this out. We have a fairly large dataset (~100k cells x 17k genes), and I think it would benefit from some parallelism. Do you know which step in findModules would benefit the most? Would it be calculating the adjacency with corrcoef?
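
For readers unfamiliar with the step in question, here is a minimal sketch of a WGCNA-style adjacency computation built on `np.corrcoef`. It is not PyWGCNA's code: the function name `adjacency_from_expression`, the `signed` flag, and the `power` value are assumptions chosen for illustration, and the formulas are the standard unsigned and signed WGCNA definitions. `np.corrcoef` spends most of its time in a matrix product, which NumPy hands to its BLAS backend. Many BLAS builds are already multithreaded, so this step may use several cores without code changes.

```python
import numpy as np


def adjacency_from_expression(expr, power=6, signed=False):
    """Soft-thresholded adjacency from a samples x genes matrix.

    Hypothetical helper for illustration only; PyWGCNA's own
    implementation may differ. `power` is an arbitrary example value.
    """
    # rowvar=False treats each column (gene) as a variable.
    # Genes with zero variance would yield NaN correlations.
    cor = np.corrcoef(expr, rowvar=False)
    if signed:
        # Signed network: map [-1, 1] to [0, 1] before raising to the power.
        return ((1.0 + cor) / 2.0) ** power
    # Unsigned network: the sign of the correlation is discarded.
    return np.abs(cor) ** power


# Small synthetic example: 20 samples, 5 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 5))
adj = adjacency_from_expression(expr, power=6)
print(adj.shape)
```

Timing this function on a subset of the real data is a cheap way to check whether the correlation step is actually the bottleneck before parallelizing anything.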

nargesr commented 1 day ago

Hi @tuanpham96,

Thank you so much for expressing your interest in PyWGCNA. I would suggest you look at the R version of WGCNA to fully understand the method.

Three functions in the findModules() step would benefit from parallelism.

I believe TOMsimilarity() is the most time-consuming part.

Please let me know if you have any questions.

Best, Narges
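
To illustrate why the TOM step is costly, here is a minimal sketch of the standard unsigned topological overlap formula written as a single matrix product. This is an assumption-laden illustration, not PyWGCNA's `TOMsimilarity()`: the function name `tom_similarity` is hypothetical, and the package's implementation and options may differ. The dominant cost is the genes x genes matrix product, which NumPy delegates to BLAS and which is therefore often multithreaded already, depending on the build.

```python
import numpy as np


def tom_similarity(adj):
    """Unsigned topological overlap from a symmetric adjacency matrix.

    Hypothetical helper for illustration; entries of `adj` are assumed
    to lie in [0, 1]. The input is copied, not modified.
    """
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)            # ignore self-connections
    shared = a @ a                      # l_ij = sum over u of a_iu * a_uj
    k = a.sum(axis=1)                   # connectivity of each gene
    k_min = np.minimum.outer(k, k)      # min(k_i, k_j) for every pair
    tom = (shared + a) / (k_min + 1.0 - a)
    np.fill_diagonal(tom, 1.0)          # each gene fully overlaps itself
    return tom


# Tiny example with three genes.
adj = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])
tom = tom_similarity(adj)
print(tom.shape)
```

For scale, with about 17k genes each dense float64 genes x genes matrix takes roughly 2.3 GB, and this sketch holds several of them at once, so memory may matter as much as thread count.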