liu-congcong / MetaDecoder

An algorithm for clustering metagenomic sequences.
GNU General Public License v3.0

Threads issue during cluster step #9

Closed yeon009 closed 1 year ago

yeon009 commented 1 year ago

Hi, I am trying to run MetaDecoder on some datasets in a CPU-only environment. I successfully finished the metadecoder coverage and seed steps using the --threads option, but I found that there is no --threads parameter for the metadecoder cluster step. I really need to control the thread count because my lab shares one working server.

So I replaced every os.cpu_count() call in metadecoder_cluster.py with the number 5, and it seemed to work fine on 5 cores right up until the DPGMM process. As soon as it enters the DPGMM process, it suddenly occupies 20 cores. Note that the server has 48 cores in total, of which 15 are available to me.

Is it possible to adjust the number of threads during the metadecoder cluster step? Please let me know if any method is available.

Thanks

liu-congcong commented 1 year ago

Hi yeon009, DPGMM is implemented on top of NumPy and SciPy, and their linalg submodules rely on OpenBLAS or other related libraries, which start their own threads. Therefore, you can try either of the following two solutions.

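  1. Set the OpenBLAS thread limit before launching: export OPENBLAS_NUM_THREADS=15 && metadecoder cluster xxx
  2. Change Line 334 from: dpgmm.main() to: with threadpool_limits(limits = 15): dpgmm.main() (threadpool_limits is provided by the threadpoolctl package; add from threadpoolctl import threadpool_limits if the file does not already import it)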
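
For anyone who launches the cluster step from a Python wrapper rather than a shell, the first solution can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name env_with_blas_limit is hypothetical, and setting OMP_NUM_THREADS and MKL_NUM_THREADS alongside OPENBLAS_NUM_THREADS is a guess at covering the "other related libraries" case, since which BLAS backend is in use depends on how NumPy was installed. A small Python child process stands in for the real metadecoder cluster command, whose arguments are not given here.

```python
import os
import subprocess
import sys


def env_with_blas_limit(num_threads, base_env=None):
    """Return a copy of the environment with BLAS thread limits set.

    Hypothetical helper. The variables must be set before the child
    process loads NumPy/SciPy, because the BLAS libraries read them
    when they are loaded.
    """
    env = dict(os.environ if base_env is None else base_env)
    # OPENBLAS_NUM_THREADS comes from the thread above; the other two
    # are assumptions covering OpenMP- and MKL-based builds.
    for name in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        env[name] = str(num_threads)
    return env


env = env_with_blas_limit(15)

# Stand-in for the real command: the child only reports the limit it sees.
probe = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = probe.stdout.strip()
```

The same env dictionary could be passed to subprocess.run together with the actual metadecoder cluster command line.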

Hope this can help you.

Best,

Cong-Cong

yeon009 commented 1 year ago

Hi! I used the second method, with threadpool_limits(limits = 15): dpgmm.main(), and it worked very well. Thank you very much for your help.
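
For later readers, the threadpool_limits approach that worked here can be sketched as a small standalone script. It is a rough illustration, not the MetaDecoder code: run_dpgmm_stand_in is a hypothetical placeholder for dpgmm.main(), max_blas_threads is a hypothetical helper, and the script assumes numpy and threadpoolctl are installed.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits


def run_dpgmm_stand_in(size=300):
    """Hypothetical stand-in for dpgmm.main(): BLAS/LAPACK-backed work."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((size, size))
    # Symmetric positive definite matrix, so the determinant is positive.
    covariance = x @ x.T / size + np.eye(size)
    sign, logdet = np.linalg.slogdet(covariance)
    return sign, logdet


def max_blas_threads():
    """Largest thread count among the loaded BLAS pools (0 if none found)."""
    pools = [p for p in threadpool_info() if p.get("user_api") == "blas"]
    return max((p["num_threads"] for p in pools), default=0)


LIMIT = 15  # the value used in this thread; adjust to your share of the server

# Inside the block, the BLAS thread pools are capped at LIMIT; the
# previous setting is restored on exit.
with threadpool_limits(limits=LIMIT):
    threads_inside = max_blas_threads()
    sign, logdet = run_dpgmm_stand_in()
```

Checking threadpool_info() inside the block is one way to confirm that the cap is actually in effect before starting a long run.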