nyu-dl / dl4chem-mgm

BSD 3-Clause "New" or "Revised" License

Could not generate new molecules #7

Closed by zby123 2 years ago

zby123 commented 2 years ago

I tried to run generation on ChEMBL using the provided pretrained model and generation script, but the outputs smiles_1_0 to smiles_300_0 are all identical. I also tried changing the mask fraction to avoid this, but when I checked the code I could not find anywhere that uses the parameters --node_target_frac and --edge_target_frac. Could you please explain how these parameters work?

omarnmahmood commented 2 years ago

Hi, could you post the command you ran to carry out generation? --node_target_frac and --edge_target_frac let you specify the fraction of all nodes and edges, respectively, for which the model computes the loss during training (the node/edge features in question can be masked out, replaced with a random feature value, or left as-is for reconstruction). During generation, this is the fraction of nodes/edges that will be replaced at each iteration of Gibbs sampling. Further details are available in the paper. These arguments are defined in train_script_parser.py; they are part of the argparse parser used for training and also of the argparse parser used for generation.
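
To make the role of the two fractions concrete, here is a minimal sketch of one corruption step, not the repository's actual code. The function names `select_targets` and `corrupt`, the `<mask>` token, and the default probabilities for masking versus random replacement are all assumptions made for illustration; only the idea of choosing a fraction of nodes or edges and then masking, randomising, or keeping each one comes from the explanation above.

```python
import random

def select_targets(num_items, target_frac, rng):
    """Pick which nodes (or edges) are targets in one iteration.

    Hypothetical helper: target_frac plays the role of
    --node_target_frac or --edge_target_frac.
    """
    if target_frac <= 0 or num_items == 0:
        return []
    # Assumption: always select at least one item, never more than exist.
    num_targets = min(num_items, max(1, round(num_items * target_frac)))
    return sorted(rng.sample(range(num_items), num_targets))

def corrupt(features, targets, mask_value, vocab, rng,
            mask_prob=0.8, random_prob=0.1):
    """Mask, randomise, or keep each target feature.

    mask_prob and random_prob are illustrative defaults, not values
    taken from the repository.
    """
    corrupted = list(features)
    for i in targets:
        u = rng.random()
        if u < mask_prob:
            corrupted[i] = mask_value          # masked out
        elif u < mask_prob + random_prob:
            corrupted[i] = rng.choice(vocab)   # random feature value
        # otherwise the feature is left as-is for reconstruction
    return corrupted

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run differs
    atoms = ["C", "C", "N", "O", "C", "C", "C", "C", "C", "C"]
    targets = select_targets(len(atoms), 0.3, rng)
    print(targets, corrupt(atoms, targets, "<mask>", ["C", "N", "O"], rng))
```

In a Gibbs-style generation loop, the model would then predict new values for the target positions and the step would repeat with a freshly chosen target set.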

zby123 commented 2 years ago

I simply ran the ChEMBL generation script from the README file. Since I do not need the similarity numbers, I deleted lines 171 to 181 of src/model/graph_generator.py. That is the only modification to the code. I also have no CUDA 10.0 environment, so I use torch 1.14.

omarnmahmood commented 2 years ago

This may have been related to the way the random seed is used. I have changed the default value of the seed so that a fixed seed is not used. Could you pull the latest code, rerun the same command as earlier and check if the issue persists?
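
As an illustration of why a fixed default seed can make every output identical, here is a minimal sketch using only the standard library. The `--seed` flag name and the helper functions are hypothetical and are not taken from the repository's parser; the point is only that a default of `None` leaves the generator unseeded, while an explicit integer reproduces the same samples every time.

```python
import argparse
import random

def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag: a default of None means "do not fix the seed".
    parser.add_argument("--seed", type=int, default=None)
    return parser

def sample_run(seed, n=20):
    """Draw n values; stands in for one generation run."""
    # random.Random(None) seeds itself from an OS entropy source.
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(n)]

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(sample_run(args.seed))
```

With a PyTorch model the same pattern would apply to `torch.manual_seed`, called only when a seed is explicitly given.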

zby123 commented 2 years ago

Now it works. Thanks for your help.