wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License

IndexError: tuple index out of range #38

Open amir-tagh opened 2 years ago

amir-tagh commented 2 years ago

Hello,

I am following the example for "Molecule generation pretraining procedure". The first step, python get_vocab.py --ncpu 16 < data/chembl/all.txt > vocab.txt, completes with no error, but I am getting "IndexError: tuple index out of range" for the second step: python preprocess.py --train data/chembl/all.txt --vocab data/chembl/all.txt --ncpu 16 --mode single

Can you please let me know what the problem could be?

Best, Amir

orubaba commented 2 years ago

https://github.com/wengong-jin/hgraph2graph/issues/34 should answer your question. I had the same problem; I followed that thread and voilà, it worked. Note that your command passes --vocab data/chembl/all.txt rather than the vocab.txt produced in the first step. Run this first:

python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single

After it completes:

mkdir train_processed
mv tensor* train_processed/

amir-tagh commented 2 years ago

Thanks for your response.

I have a set of SMILES I am working on. Extracting the substructures completes successfully, but the second step gives the following error. Do you have any idea what the problem could be?

Thanks for your help.

python preprocess.py --train Inforna_correct_for_ML.txt --vocab inforna_vocab.txt --ncpu 16 --mode single

File "preprocess.py", line 109, in le = (len(all_data) + num_splits - 1) // num_splits ZeroDivisionError: integer division or modulo by zero

orubaba commented 2 years ago

I suggest you adjust the num_splits formula: num_splits = len(all_data) // 1000. If len(all_data) < 1000, then num_splits = 0 because of the floor division. My advice is to use a denominator that gives num_splits >= 1; maybe 100, 10, or 5.
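A minimal sketch of that guard (not the repo's exact code; the names follow the traceback above):

num_splits = max(1, len(all_data) // 1000)  # floor division alone yields 0 for fewer than 1000 entries
le = (len(all_data) + num_splits - 1) // num_splits  # ceiling division: entries per split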

amir-tagh commented 2 years ago

Thanks a lot for your help. Now I am at the third step, "Train graph generation model", and I am getting the following error. I googled the error but couldn't find a solution.

Thanks,

here is the pytorch version I am using, if it helps:

Name Version Build Channel

pytorch 1.11.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch

Traceback (most recent call last):
  File "train_generator.py", line 96, in <module>
    meters = meters + np.array([kl_div, loss.item(), wacc * 100, iacc * 100, tacc * 100, sacc * 100])
  File "/home/amir/anaconda3/envs/sampledock/lib/python3.7/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
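For context, np.array() tries to pull the data to host memory, but wacc, iacc, tacc, and sacc are still CUDA tensors at this point. A minimal sketch of the usual fix (the same pattern as the patch in muammar's comment further down), assuming they are 0-dim tensors:

meters = meters + np.array([kl_div, loss.item(),
                            wacc.item() * 100, iacc.item() * 100,  # .item() copies a 0-dim
                            tacc.item() * 100, sacc.item() * 100])  # tensor to a Python float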

orubaba commented 2 years ago

The error is due to the lack of an NVIDIA GPU with CUDA enabled on your machine.

amir-tagh commented 2 years ago

But I do have an NVIDIA GPU:

+-------------------------------+----------------------+----------------------+
| NVIDIA-SMI 470.129.06    Driver Version: 470.129.06    CUDA Version: 11.4  |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        Off  | 00000000:65:00.0  On |                  N/A |
| 44%   30C    P5     9W / 75W  |   1842MiB /  5050MiB |      68%     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

orubaba commented 2 years ago

Perhaps the driver is not properly installed. Something must be wrong somewhere!

orubaba commented 2 years ago

maybe this can help: https://github.com/wengong-jin/hgraph2graph/pull/35/commits/2e56392b747d3af1b3ed51086a01e857e03deb97

amir-tagh commented 2 years ago

Thanks, I finally figured out what was wrong and now it is working.

Now I have a problem with finetune_generator.py

I used chemprop_train on my dataset and got the following in the save_dir: args.json, fold_0, verbose.log, test_scores.csv, quiet.log.

After running finetune_generator.py I get the following error. Can you please let me know how I can trace the problem?

Thanks for your help.


Traceback (most recent call last):
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 124, in <module>
    score_func = Chemprop(args.chemprop_model)
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 37, in __init__
    scaler, features_scaler = load_scalers(fname)
ValueError: too many values to unpack (expected 2)

muammar commented 2 years ago

Traceback (most recent call last):
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 124, in <module>
    score_func = Chemprop(args.chemprop_model)
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 37, in __init__
    scaler, features_scaler = load_scalers(fname)
ValueError: too many values to unpack (expected 2)

Did you solve it? I am trying to figure it out; if I solve it, I will push the changes to my own version of this package: https://github.com/muammar/hgraph2graph

muammar commented 2 years ago

Ok, I solved it. First, your fine-tune set should not have any headers. It should look like this:

CC
CCO
CNOO

Then, you need to apply the following patch:

diff --git a/finetune_generator.py b/finetune_generator.py
index d406d38..995cad3 100755
--- a/finetune_generator.py
+++ b/finetune_generator.py
@@ -35,9 +35,9 @@ class Chemprop(object):
             for fname in files:
                 if fname.endswith(".pt"):
                     fname = os.path.join(root, fname)
-                    scaler, features_scaler = load_scalers(fname)
-                    self.scalers.append(scaler)
-                    self.features_scalers.append(features_scaler)
+                    # scaler, features_scaler = load_scalers(fname)
+                    # self.scalers.append(scaler)
+                    # self.features_scalers.append(features_scaler)
                     model = load_checkpoint(fname)
                     self.checkpoints.append(model)

@@ -164,10 +164,10 @@ if __name__ == "__main__":
                     [
                         kl_div,
                         loss.item(),
-                        wacc * 100,
-                        iacc * 100,
-                        tacc * 100,
-                        sacc * 100,
+                        wacc.item() * 100,
+                        iacc.item() * 100,
+                        tacc.item() * 100,
+                        sacc.item() * 100,
                     ]
                 )

See: https://github.com/muammar/hgraph2graph/commit/a714e2920cbb3e43e350df8e3a20961497a8d4d7
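If you would rather keep the scalers than comment them out, an alternative sketch (assuming a newer chemprop whose load_scalers returns more than the two values this class expects) is to unpack positionally:

scalers = load_scalers(fname)  # newer chemprop versions may return extra scalers
scaler, features_scaler = scalers[0], scalers[1]  # keep only the two this class uses
self.scalers.append(scaler)
self.features_scalers.append(features_scaler)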

amir-tagh commented 2 years ago

Hi muammar,

Thanks for the solution. I am using train_translator.py for lead optimization and I am getting the following error. Have you seen this error before? Do you know how to solve it?

Thanks,

Traceback (most recent call last):
  File "/apps/hgraph2graph/20210428/hgraph2graph/train_translator.py", line 86, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch)
  File "/apps/hgraph2graph/20210428/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 2 required positional arguments: 'y_orders' and 'beta'
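The mismatch suggests the script and the model class come from different revisions: forward() requires y_orders and beta, but the call site does not supply them. One plausible alignment (hypothetical; verify against the forward() definition in your checkout):

# Hypothetical sketch: y_orders is assumed to already be part of the batch
# tuple produced by the paired dataset, and beta to be the KL weight from
# the training args; both assumptions must be checked against forward().
loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)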