Closed hosseinhejazian closed 5 years ago
Did you start the clusters before running the notebook? e.g. 'ipcluster start -n 4'; then rc.ids should list the engine ids.
I just ran modelEvaluation.ipynb sequentially with the following parameters: (N=20, tries=3, model_names = ['linear', 'squared_exp'], Qs=[1,2,3]). It took 5 min on a MacBook Air. I would also first run the code with a smaller N and fewer parameters, skipping cell 4 (i.e. without ipyparallel), to see whether the problem really is in the parallelization. If so, I can't help, since I am not an expert on ipyparallel (maybe it helps to read through: https://ipyparallel.readthedocs.io/en/latest/index.html).
But if you have any issues understanding the GPLVM code, feel free to ask.
I tried to reproduce the results of your code, in which the runs are distributed among all CPUs with 32 clusters, but the running time seems to go to infinity. For example, in modelEvaluation.ipynb, after running the "Specify Parameter" cell I ran the next cell, and after 8 hours it was still running without any result, even for a smaller version of the same dataset with 9 inputs. From the printed results in this script, I saw that you ran this code for 28 inputs in about 1 h 23 min. Can you help me figure out what my problem is in running this code distributed? FYI, I run this script in a Jupyter notebook on Google Cloud Platform. My guess is that the problem lies in this line:
results = [r.get() for r in async_res]
And one more thing: this problem of infinite running time first appeared at this line:
rc[:].push(dict(N=N, D=D, Y=Y, stan_model=stan_model, tries=tries), block=True)
in which I made block=False, and then the problem moved to the
results = [r.get() for r in async_res]
line.
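For debugging this kind of hang, it may help that ipyparallel's AsyncResult.get accepts a timeout (its API mirrors multiprocessing's AsyncResult), so a stuck call raises TimeoutError instead of blocking forever. A minimal sketch of the pattern, shown with the stdlib analogue so it runs without a cluster:

```python
# Sketch of the get(timeout=...) pattern using the stdlib ThreadPool,
# whose AsyncResult behaves like ipyparallel's in this respect.
from multiprocessing.pool import ThreadPool
import time

def slow(x):
    time.sleep(0.2)  # stand-in for a long-running model fit
    return x * x

pool = ThreadPool(2)
async_res = [pool.apply_async(slow, (i,)) for i in range(4)]

# A bounded wait surfaces a hang as a TimeoutError instead of
# blocking indefinitely like a bare r.get().
results = [r.get(timeout=5) for r in async_res]
print(results)  # [0, 1, 4, 9]

pool.close()
pool.join()
```

With ipyparallel the corresponding change would be results = [r.get(timeout=...) for r in async_res], which at least turns a silent hang into an exception you can inspect.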