rsnirwan / GPLVMsInFinance

Applications of Gaussian Process Latent Variable Models in Finance

Parallel computing using ipcluster #1

Closed by hosseinhejazian 5 years ago

hosseinhejazian commented 5 years ago

I tried to reproduce the results of your code, with the runs distributed among all CPUs with 32 clusters, but the run never seems to finish. For example, in modelEvaluation.ipynb, after running the "Specify Parameter" cell, I ran the next cell, and after 8 hours it was still running without any result, even on a smaller version of the same dataset with 9 inputs. From the output saved in the notebook, I can see that you ran this code for 28 inputs in about 1 h 23 min. Can you help me figure out what is going wrong when I run this code in distributed mode? FYI, I am running the notebook on Jupyter hosted on Google Cloud Platform. My guess is that the problem lies in this line: results = [r.get() for r in async_res]

One more thing: the hang first appeared at this line: rc[:].push(dict(N=N, D=D, Y=Y, stan_model=stan_model, tries=tries), block=True). When I changed it to block=False, the hang moved to the results = [r.get() for r in async_res] line.
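One way to tell a hung call from a slow one is to poll the pending results against a deadline instead of blocking on r.get() indefinitely. The following is a minimal sketch, not code from the notebook: collect_with_deadline, deadline_s and poll_s are hypothetical names, and the only assumption about each result object is that it offers ready() and get(), as ipyparallel's AsyncResult does.

```python
import time


def collect_with_deadline(async_results, deadline_s=60.0, poll_s=1.0):
    """Gather results from AsyncResult-like objects, giving up after deadline_s.

    Each object is assumed to provide ready() and get(), as ipyparallel's
    AsyncResult does. Returns (results, pending_indices); entries that never
    became ready are left as None in results.
    """
    start = time.monotonic()
    results = [None] * len(async_results)
    pending = set(range(len(async_results)))
    while pending:
        for i in sorted(pending):  # sorted() copies, so discarding below is safe
            if async_results[i].ready():
                # get() re-raises an exception from the worker, so errors surface here
                results[i] = async_results[i].get()
                pending.discard(i)
        if not pending or time.monotonic() - start >= deadline_s:
            break
        time.sleep(poll_s)
    return results, sorted(pending)
```

If the returned list of pending indices is non-empty after a generous deadline, the tasks were probably never picked up by an engine, which points at the cluster setup rather than at the model code.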

rsnirwan commented 5 years ago

Did you start the cluster before running the notebook? E.g. 'ipcluster start -n 4'. After that, rc.ids should output the engine numbers.
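The check on rc.ids can be made explicit before any work is pushed to the engines. This is a rough sketch under assumptions: check_engines and connect_and_check are hypothetical helper names, and the client is assumed to be an ipyparallel Client whose ids attribute lists the registered engines.

```python
def check_engines(client, expected):
    """Return the engine ids, or raise if fewer than `expected` are registered."""
    ids = list(client.ids)
    if len(ids) < expected:
        raise RuntimeError(
            f"only {len(ids)} of {expected} engines registered; "
            f"start them with e.g. 'ipcluster start -n {expected}' and retry"
        )
    return ids


def connect_and_check(expected):
    """Connect to the running cluster (requires ipyparallel) and verify its engines."""
    import ipyparallel as ipp  # imported lazily so check_engines works without it

    return check_engines(ipp.Client(), expected)
```

Calling connect_and_check(4) in the notebook before the push step would fail fast with a readable message if the engines are not up, instead of blocking.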

I just ran modelEvaluation.ipynb sequentially with the following parameters: (N=20, tries=3, model_names = ['linear', 'squared_exp'], Qs=[1,2,3]). It took 5 min on a MacBook Air. I would first run the code with a smaller N and fewer parameters, skipping cell 4 (i.e. without ipyparallel), to see whether the problem is really in the parallelization. If it is, I can't help, since I am not an expert on ipyparallel (it may help to read through https://ipyparallel.readthedocs.io/en/latest/index.html).

But if you have any issues understanding the GPLVM code, feel free to ask.
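The sequential check suggested above could be sketched roughly as follows. This is not the notebook's code: run_sequential is a hypothetical helper, and fit stands in for whatever function fits one model configuration; only the parameter names model_names, Qs and tries come from the thread.

```python
import itertools
import time


def run_sequential(fit, model_names, Qs, tries):
    """Call fit(model_name, Q, attempt) for every combination, timing each call.

    `fit` is a placeholder for the per-configuration fitting routine.
    Returns a dict mapping (model_name, Q, attempt) to elapsed seconds.
    """
    timings = {}
    for name, Q in itertools.product(model_names, Qs):
        for attempt in range(tries):
            t0 = time.perf_counter()
            fit(name, Q, attempt)
            timings[(name, Q, attempt)] = time.perf_counter() - t0
    return timings
```

With model_names = ['linear', 'squared_exp'], Qs = [1, 2, 3] and tries = 3 this makes 18 calls. If they finish in minutes without ipyparallel, the slowdown is more likely in the parallel setup than in the model fitting.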