Closed: JamesALumley closed this issue 3 years ago
Hi,
When you set --num_decode 10000, the code creates one huge batch of batch size 10000. It is hard for the model to decode 10000 molecules in a single batch, so you need to split the 10000 decoding attempts into several smaller batches.
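For anyone hitting this, here is a minimal sketch of that batching, written against the model.translate() call shown in the traceback below. The `model`, `batch`, and `args` names are taken from decode.py as it appears in the traceback; CHUNK is a hypothetical chunk size, not part of the repo, to be tuned to your GPU memory:

    # Split args.num_decode attempts into fixed-size chunks so each
    # model.translate() call only decodes CHUNK molecules at once.
    CHUNK = 500  # hypothetical; pick the largest size that fits on your GPU

    new_mols = []
    remaining = args.num_decode
    while remaining > 0:
        n = min(CHUNK, remaining)
        # Same call as decode.py line 69, assuming translate() returns a
        # list of decoded SMILES, but with at most CHUNK decodes per batch.
        new_mols.extend(model.translate(batch[1], n, args.enum_root, args.greedy))
        remaining -= n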
Memory usage is high when generating large numbers of output SMILES from a single input. Perhaps the intent is to generate only a few output SMILES per input? But when generating a large number of SMILES from a single input, the code fails with the error below (it relies on a memory-bound data structure). In this case the model was trained on ~20K molecules -> 100K pairs, so in principle it should have enough diversity to generate a large number of changes to the input SMILES:
Traceback (most recent call last):
  File "../hgraph2graph/decode.py", line 69, in <module>
    new_mols = model.translate(batch[1], args.num_decode, args.enum_root, args.greedy)
  File "/hpc/scratch/nvme1/HeirVAE/hgraph2graph/hgraph/hgnn.py", line 96, in translate
    return self.decoder.decode( (root_vecs, z_tree_vecs, z_graph_vecs), greedy=greedy)
  File "/hpc/scratch/nvme1/HeirVAE/hgraph2graph/hgraph/decoder.py", line 322, in decode
    hinter = HTuple( mess = self.rnn_cell.get_init_state(tree_tensors[1]) )
  File "/hpc/scratch/nvme1/HeirVAE/hgraph2graph/hgraph/rnn.py", line 76, in get_init_state
    c = torch.zeros(len(fmess), self.hidden_size, device=fmess.device)
RuntimeError: CUDA out of memory. Tried to allocate 2.01 GiB (GPU 0; 10.92 GiB total capacity; 8.94 GiB already allocated; 1.29 GiB free; 9.21 GiB reserved in total by PyTorch)