liamlio / MolGAN

AI for a cure, a combination of Latent-GAN and VAE-JTNN to create 100% valid drug-like molecules

Training for different molecule #2

Open QQuueennttiinn opened 1 year ago

QQuueennttiinn commented 1 year ago

Hi, I'm on a research project and I'm trying to adapt your model to generate molecules that look like a specific dataset of molecules that I have. The difference is that my dataset has 183 molecules and 479 features. I still have a problem with the training part of the VAE: how can I adapt the pretrain.py code in my case? How should I choose the hidden size, the depth, and the batch size? (The latent size being the number of features, 479 for me.)

liamlio commented 1 year ago

Well, I haven't looked at this repo in 3 years. Let me see what I can remember and if I can answer your question.

Also, I should clarify that there's no promise this model will actually work, as I hit an issue with the PyTorch dependency only working on Linux machines. So there's an equal chance that this model will not work.

You might also want to consider whether you could adapt your dataset to a newer model like diffusion using the JTNN dataset.

Alright, so looking into the Jupyter notebook JTNN+LatenGAN_train.ipynb, or line 37 in this file: https://github.com/liamlio/MolGAN/blob/master/trainer/TrainRunner.py

I think you just need to change this line: latent_space_mols = latent_space_mols.reshape(latent_space_mols.shape[0], 56) from 56 to 186. I haven't double-checked the SMILES dataset, but based on the code this is likely the place where you set the number of molecules.
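
For reference, here is a minimal sketch of what that change looks like, assuming latent_space_mols comes back from the encoder as a flat NumPy array. The exact variable names and surrounding code in TrainRunner.py may differ, and the file name below is hypothetical:

```python
import numpy as np

# Width of one row after reshaping; 56 in the original TrainRunner.py code.
# Replace it with the value that matches your own encoder output.
LATENT_DIM = 56

# Hypothetical file name for the encoded molecules.
latent_space_mols = np.load("encoded_smiles.latent.npy")

# Keep the first dimension (one row per molecule) and fix the second.
latent_space_mols = latent_space_mols.reshape(latent_space_mols.shape[0], LATENT_DIM)
print(latent_space_mols.shape)
```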

As for the 479 features, just set the data_shape to 479 in the discriminator model: models/Discriminator.py
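
If it helps, here is a minimal sketch of the kind of change meant here, assuming the discriminator is a plain feed-forward network whose constructor takes data_shape. I haven't re-checked models/Discriminator.py, so treat the layer widths as placeholders:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Feed-forward discriminator over latent/feature vectors.

    data_shape is the length of one input vector; set it to 479 if each
    molecule is described by 479 features. Hidden widths are placeholders.
    """

    def __init__(self, data_shape=479):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(data_shape, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # single real/fake score
        )

    def forward(self, x):
        return self.model(x)
```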

QQuueennttiinn commented 1 year ago

I managed to train JTNN+LatenGAN_train.ipynb, so I managed to generate my latent data, but now I have to decode it to get my SMILES. It seems that I also have to train and adapt this decoding part to my dataset, because I get the error below. So I have to run MolGAN-master\jicml18_jtnn\molvae\pretrain.py, but I don't know how to set it correctly:

python MolGAN-master\jicml18_jtnn\molvae\pretrain.py --train MolGAN-master\jicml18_jtnn\data\zinc\trainDataStage.txt --vocab MolGAN-master\jicml18_jtnn\data\zinc\vocab.txt --hidden 450 --depth 3 --latent 56 --batch 50 --save_dir pre_model/

The error:

PS C:\Users\Quentin\Downloads\MolGAN-master> python MolGAN-master\decoder.py
C:\Users\Quentin\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  ...
  File "C:\Users\Quentin\Downloads\MolGAN-master\MolGAN-master\decoder.py", line 44, in load_model
    model1.load_state_dict(torch.load(opts.model_path, map_location=torch.device('cpu')))
  File "C:\Users\Quentin\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
size mismatch for decoder.W.weight: copying a param with shape torch.Size([450, 478]) from checkpoint, the shape in current model is torch.Size([450, 689]).
...
size mismatch for G_var.bias: copying a param with shape torch.Size([28]) from checkpoint, the shape in current model is torch.Size([239]).

liamlio commented 1 year ago

I don't recognize the pretrain.py file you're using so I'm not sure what the issue is. From what I can tell you're trying to pass the wrong shape to the decoder. So, you can either re-train the decoder so it matches the shapes you want or just train the encoder to output the shape [450, 689].
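
If you want to see exactly which parameters disagree, a generic PyTorch snippet (not something from this repo) that compares the checkpoint against the freshly constructed model before calling load_state_dict looks like this:

```python
import torch

def diff_shapes(model, checkpoint_path):
    """Print every parameter whose shape differs between model and checkpoint."""
    checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
    model_state = model.state_dict()
    for name, ckpt_tensor in checkpoint.items():
        if name in model_state and model_state[name].shape != ckpt_tensor.shape:
            print(f"{name}: checkpoint {tuple(ckpt_tensor.shape)} "
                  f"vs model {tuple(model_state[name].shape)}")

# e.g. diff_shapes(model1, opts.model_path) just before load_state_dict in decoder.py
```

Every mismatched name it prints is a dimension you would need to make consistent between the pretrained checkpoint and the model you are building.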

This is the best I can offer without seeing the file.