zeroknowledgediscovery / quasinet

Machine Learning For Quasi-species
https://zeroknowledgediscovery.github.io/quasinet
GNU Lesser General Public License v2.1

Parallelize q-distance computation #20

Open JinLi711 opened 3 years ago

JinLi711 commented 3 years ago

Parallelize the q-distance computation across multiple cores, keeping the parallelization overhead small. One option is to interleave the rows of the q-distance matrix across cores. For example, with two cores, rows 0, 2, 4, etc. go to CPU 0 and rows 1, 3, 5, etc. go to CPU 1.
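A minimal sketch of the interleaved row assignment described above. The helper names `interleaved_rows` and `upper_triangle_work` are hypothetical and not part of quasinet. The second helper assumes the matrix is symmetric and only pairs with `j > i` are evaluated, which may not match how the q-distance matrix is actually filled. Under that assumption, interleaving spreads the long and short rows more evenly than contiguous blocks would.

```python
def interleaved_rows(n_rows, n_workers):
    """Assign row i to worker i % n_workers (round-robin)."""
    return [list(range(w, n_rows, n_workers)) for w in range(n_workers)]


def upper_triangle_work(rows, n_rows):
    """Count the pairs (i, j) with j > i that fall in the given rows.

    Assumption: only the upper triangle is computed, so row i costs
    n_rows - 1 - i distance evaluations.
    """
    return sum(n_rows - 1 - i for i in rows)


assignment = interleaved_rows(n_rows=6, n_workers=2)
print(assignment)  # [[0, 2, 4], [1, 3, 5]]

# Interleaved split: 9 vs. 6 pair evaluations per worker.
print([upper_triangle_work(rows, 6) for rows in assignment])  # [9, 6]

# Contiguous split for comparison: 12 vs. 3 pair evaluations.
print([upper_triangle_work(rows, 6) for rows in ([0, 1, 2], [3, 4, 5])])  # [12, 3]
```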

JinLi711 commented 3 years ago

However, using multiple processes is not a good idea: each new process needs its own copy of the data, and that copying adds a large overhead. Ideally we would use multiple threads, but that is not simple in Python because of the global interpreter lock (GIL). We can work around this by writing the code in native C or by using Numba, but either would require a lot of code rewriting.
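For illustration, here is a rough sketch of the thread-based layout, using only the standard library. Threads share the input and the output matrix, so nothing is copied between workers. `placeholder_distance` is a hypothetical stand-in (a simple mismatch fraction), not the real q-distance, and all function names are assumptions. Because this placeholder is pure Python, the GIL would still serialize the work; a real speedup would only be expected if the distance kernel releases the GIL, for example in C or Numba code.

```python
from concurrent.futures import ThreadPoolExecutor


def placeholder_distance(a, b):
    # Hypothetical stand-in for the q-distance: fraction of mismatched positions.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fill_rows(seqs, out, rows):
    # Each worker writes only to its own rows, so no locking is needed.
    for i in rows:
        for j in range(len(seqs)):
            out[i][j] = placeholder_distance(seqs[i], seqs[j])


def distance_matrix_threaded(seqs, n_workers=2):
    n = len(seqs)
    out = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Worker w gets rows w, w + n_workers, w + 2 * n_workers, ...
        futures = [
            pool.submit(fill_rows, seqs, out, range(w, n, n_workers))
            for w in range(n_workers)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return out


matrix = distance_matrix_threaded(["AAAA", "AAAT", "TTTT"], n_workers=2)
```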