ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License

Problem when defining high number of biomarkers #12

Closed: simonepenna closed this issue 3 years ago

simonepenna commented 3 years ago

Hello!

I'm trying to run pySuStaIn on my data (1129 observations and 38 biomarkers), but, possibly because of the high number of biomarkers, the algorithm does not get past the message "Finding ML solution to 1 cluster problem", even after 10 hours. By adding print statements to the code, I found that the expensive part is in _find_ml() in AbstractSustain.py, specifically these lines:

partial_iter = partial(self._find_ml_iteration, sustainData)
pool_output_list = self.pool.map(partial_iter, range(self.N_startpoints))
if ~isinstance(pool_output_list, list):
    pool_output_list = list(pool_output_list)

I think the map is very slow: execution hangs on "list(pool_output_list)". Do you have any idea how to resolve this problem? I also tried generating simulated data (with 1129 observations and 38 biomarkers), but the run did not progress either.
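One possible reason the hang shows up at list(pool_output_list) rather than at the map call: if the pool's map returns a lazy iterator, as Python's built-in map does, no work is done until the result is consumed. The sketch below is only an illustration of that behaviour using the built-in map. The function slow_iteration and the toy data are hypothetical stand-ins, not pySuStaIn code, and whether self.pool.map is lazy in a given configuration is an assumption.

```python
import time
from functools import partial


def slow_iteration(data, seed):
    """Hypothetical stand-in for one expensive start-point optimisation."""
    time.sleep(0.05)  # pretend this is the heavy computation
    return seed * len(data)


data = [1, 2, 3]  # toy data, not real biomarker values
partial_iter = partial(slow_iteration, data)

# Built-in map is lazy: this line returns almost immediately.
t0 = time.perf_counter()
lazy_output = map(partial_iter, range(4))
t_map = time.perf_counter() - t0

# The actual work happens here, when the iterator is consumed.
t0 = time.perf_counter()
results = list(lazy_output)
t_list = time.perf_counter() - t0

print(f"map call:  {t_map:.4f} s")
print(f"list call: {t_list:.4f} s")
```

If this is what is happening, the time is being spent inside the per-start-point iterations themselves, so timing a single call to the iteration function on a small subset of biomarkers may be a more informative first check than timing the map.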

Thank you in advance.