molecularsets / moses

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
https://arxiv.org/abs/1811.12823
MIT License

Request for exact steps to run each of the 5 models and required packages with versions #66

Closed webservicereco closed 4 years ago

webservicereco commented 4 years ago

Following up on my issue from four days ago (Dec 14, 2019) — still no luck. The instructions from "Benchmarking your models" onward are not clear to me. What exactly needs to be done, step by step, to run each of the 5 models? For example, what exactly should `--gen_path` be for the aae, charRNN, organ, latentgan, and vae models under Training, Generation, and Evaluation? What extra steps are needed for latentgan, e.g. regarding ddc_pub v3, molvecgen, etc.? And how exactly should `sh scripts/run_all_models.sh` be executed? I am using Python 3.7.3 and Jupyter Notebook 6.0.2, and have installed rdkit 2019.03.4.0, molsets 0.2, and many other packages. Is there an exact requirements list of the packages needed to run MOSES?
I would appreciate your clarifications. Thanks,

danpol commented 4 years ago

Hi, @webservicereco! Thank you for raising this question. @SeemonJ, could you please comment on this?

SeemonJ commented 4 years ago

Hi @webservicereco, first I want to apologize for the delay in getting back to you; I felt I could not yet provide you with a satisfying answer.

For the sake of transparency, I will preface this by saying that the problem has not yet been fully solved. I have been working on it together with the developer of ddc_pub, since his latest update, which moved the package to TensorFlow 2.0, created compatibility issues with running the latentgan. That update was made because support for TensorFlow 1.12 (the backend ddc_pub was built on) was removed from pip.

We have created a "legacy branch" titled moses that still works with the latentgan; it should mirror the reference package that I use in my own latentgan environment.

I have further created a shell script (currently available on a fork: https://github.com/SeemonJ/moses) that installs a fresh environment with all the packages that should be needed to run the model (rdkit, tensorflow 1.12, python 3.6 [required by that TensorFlow version], molvecgen, ddc_pub), and with this installation the code does compile and run.
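The gist of such an environment setup looks roughly like the sketch below. This is not the contents of install_moses.sh — the channel names, git URLs, and branch name are my assumptions from the description above; use the script in the fork for the exact commands.

```shell
# Sketch of a conda-based environment for the latentgan (assumptions,
# not copied from install_moses.sh -- see the fork for the real script).
conda create -n latentgan python=3.6 -y      # 3.6 required by TF 1.12
conda activate latentgan
conda install -c rdkit rdkit -y              # RDKit from the rdkit channel
pip install tensorflow-gpu==1.12.0           # backend ddc_pub was built on
pip install git+https://github.com/EBjerrum/molvecgen.git        # assumed URL
pip install git+https://github.com/pcko1/ddc_pub.git@moses       # assumed URL; legacy 'moses' branch
pip install molsets                          # MOSES itself
```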

However, in the process I discovered that this environment hits a CUDA runtime error in combination with the drivers on my development machine (Nvidia 430.50 driver). The reference graphics card the latentgan was originally developed on used driver 390.x and CUDA 9.0 (which supposedly conflicts with the PyTorch 1.1 used by MOSES, according to Nvidia, yet no runtime error was ever experienced during development despite this). The same reference environment was confirmed today to work with Nvidia 418.87.01 drivers.

I want to make it clear that at this point, the only explanation I can find for my current error is an incompatibility between the CUDA 9.2 toolkit (supposedly required by the card) and my development drivers, and it is entirely possible that this environment will run fine with your own drivers.

I will continue experimenting to see whether another cudatoolkit configuration can run everything smoothly.

In the meantime, could you try the install_moses.sh script from the link above and see if it works for you? I need more data points from different graphics card drivers to arrive at a general solution.

My sincerest apologies, and have a Merry Christmas,

SeemonJ commented 4 years ago

P.S. If you want to test whether the setup works with the latentgan in particular, launch `run.py` with the argument `--model latentgan`. This should save you some time.
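For reference, the step-by-step pipeline for a single model looks roughly like this. A sketch only: the flag names follow the MOSES README's train/sample/eval convention, and all file paths and the sample count here are illustrative — check `python scripts/train.py --help` etc. for the authoritative flags.

```shell
MODEL=latentgan   # or: aae, char_rnn, organ, vae

# 1. Training -- saves model weights, vocab, and config
#    (illustrative paths; flag names per the MOSES README)
python scripts/train.py $MODEL \
    --model_save checkpoints/${MODEL}_model.pt \
    --vocab_save checkpoints/${MODEL}_vocab.pt \
    --config_save checkpoints/${MODEL}_config.pt

# 2. Generation -- writes generated SMILES to the file given here...
python scripts/sample.py $MODEL \
    --model_load checkpoints/${MODEL}_model.pt \
    --vocab_load checkpoints/${MODEL}_vocab.pt \
    --config_load checkpoints/${MODEL}_config.pt \
    --n_samples 30000 \
    --gen_save generated/${MODEL}.csv

# 3. Evaluation -- ...and --gen_path then points at that same file
python scripts/eval.py --gen_path generated/${MODEL}.csv

# Or drive all three stages for one model in a single call:
python scripts/run.py --model $MODEL
```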

magnus-the-collectioner commented 4 years ago

Hi @SeemonJ,

Installation works for me (Nvidia P100, CUDA 10, driver downgraded to 410). I should mention that the LatentGAN also works, but it uses only 1 GB of the 16 GB of available GPU memory and oscillates around 7-10% GPU utilization. Any advice on that topic?

SeemonJ commented 4 years ago

Hi @NihilSherrKhaine, The low GPU memory usage is probably due to the small batch size, which has not been thoroughly explored for the LatentGAN (the default setting is there to reproduce the benchmark results). It is entirely possible that another batch size gives a better output result; that the memory would be better utilized with a bigger batch is certain.
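If you want to experiment with a larger batch, it should be overridable from the command line like in the other MOSES trainers. I am assuming here that the flag is named `--n_batch` and the value 512 is just an example; verify with `python scripts/train.py latentgan --help`.

```shell
# Assumed flag name (--n_batch) and an illustrative value -- verify
# against: python scripts/train.py latentgan --help
python scripts/train.py latentgan --n_batch 512 \
    --model_save checkpoints/latentgan_model.pt
```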

As for the GPU processing usage, this has more to do with an architectural clash with MOSES: the LatentGAN should be viewed as a transfer-learning process, where the heteroencoder learns the dataset and the GAN then focuses on optimizing toward a certain region of the encoded space (in the paper, from ChEMBL to compounds active against a single target). The encoded latent vectors are considerably heavier than their SMILES counterparts, and thus the training set for the GAN part is intended to be smaller than the original dataset.
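To get a feel for how much heavier, here is a back-of-the-envelope comparison. The 512-dimensional float32 latent size is an assumption taken from the LatentGAN paper, not read from this repository's config, and the molecule is just an example.

```python
# Rough memory footprint of a SMILES string vs. its encoded latent vector.
# Assumption: 512-dim float32 latents (per the LatentGAN paper, not this repo).

smiles = "CC1=CC(=O)C=CC1=O"                # example molecule as SMILES
smiles_bytes = len(smiles.encode("ascii"))  # 1 byte per character -> 17

latent_dim = 512                            # assumed heteroencoder latent size
float32_bytes = 4
latent_bytes = latent_dim * float32_bytes   # 2048 bytes

# The latent representation is roughly two orders of magnitude larger.
print(f"{smiles_bytes} B (SMILES) vs {latent_bytes} B (latent), "
      f"~{latent_bytes // smiles_bytes}x heavier")
```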

However, for MOSES we are not moving from one dataset to a smaller one, but feeding the same dataset to both models. The GPU usage is likely low because most of the runtime is spent on CPU processing between the forward passes (the GPU part of the model is relatively small).

When time allows, I could investigate whether there is room for an architectural improvement to optimize the usage, but in this implementation I focused on making the process similar to running the other models, so that it could be launched via the "run.py" script with no extra hassle while still producing results that match the behavior of the original reference LatentGAN.

danpol commented 4 years ago

Hi, @webservicereco! If you have any follow-up questions, feel free to reopen the issue. I hope @SeemonJ's answer helped.