amir-tagh opened this issue 2 years ago
https://github.com/wengong-jin/hgraph2graph/issues/34 should answer your question. I had the same problem; I followed that thread and voila, it worked. Run this first:
`python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single`
After it completes, run `mkdir train_processed`, then `mv tensor* train_processed/`.
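Put together, the steps above look like the shell sketch below. The `tensor-*.pkl` file names are hypothetical placeholders for whatever `preprocess.py` actually writes (the real names just have to match the `tensor*` glob), and the preprocess call itself needs the repo and ChEMBL data, so it is shown commented out:

```shell
# Step 1 (needs the hgraph2graph repo and data, shown for reference only):
# python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single

# Simulate the output files preprocess.py drops in the working directory
# (hypothetical names; anything matching the tensor* glob works):
touch tensor-0.pkl tensor-1.pkl

# Steps 2 and 3: collect the tensor files where the trainer expects them.
mkdir -p train_processed
mv tensor* train_processed/
ls train_processed
```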
Thanks for your response.
I have a set of SMILES I am working on. Extracting the substructures succeeds, but the second step gives the following error. Do you have any idea what the problem could be?
Thanks for your help.
`python preprocess.py --train Inforna_correct_for_ML.txt --vocab inforna_vocab.txt --ncpu 16 --mode single`
File "preprocess.py", line 109, in
I suggest you adjust the `num_splits` formula: `num_splits = len(all_data) // 1000`. If your `len(data)` is < 1000, `num_splits` is 0 because of the floor division. So my advice is to use a denominator that gives you `num_splits` >= 1, e.g. 100, 10, or 5.
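To illustrate the floor-division pitfall described above (the variable names mirror the snippet quoted from `preprocess.py`, but the `max()` guard is just one possible fix, not the repo's own code):

```python
# Floor division drops the remainder, so a small dataset yields zero splits.
all_data = ["CCO"] * 750           # e.g. a training set of only 750 SMILES

num_splits = len(all_data) // 1000
print(num_splits)                  # 0 -> the downstream loop writes no tensor files

# One possible guard: never produce fewer than one split.
num_splits = max(1, len(all_data) // 1000)
print(num_splits)                  # 1
```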
Thanks a lot for your help. Now I am at the third step, "Train graph generation model", and I am getting the following error. I googled it but couldn't find a solution.
Thanks,
Here is the PyTorch version I am using, in case it helps:
`pytorch 1.11.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch`
Traceback (most recent call last):
File "train_generator.py", line 96, in
The error is due to the lack of an NVIDIA GPU with CUDA enabled on your machine.
But I do have an NVIDIA GPU:
NVIDIA-SMI 470.129.06 | Driver Version: 470.129.06 | CUDA Version: 11.4
GPU 0: Quadro P2200 | Persistence-M: Off | Bus-Id: 00000000:65:00.0 | Disp.A: On | Volatile Uncorr. ECC: N/A
Fan: 44% | Temp: 30C | Perf: P5 | Pwr: 9W / 75W | Memory-Usage: 1842MiB / 5050MiB | GPU-Util: 68% | Compute M.: Default | MIG M.: N/A
Perhaps the driver is not properly installed; something must be wrong somewhere.
Thanks, I finally figured out what was wrong and now it is working.
Now I have a problem with finetune_generator.py
I have used `chemprop_train` on my dataset and got the following in the `save_dir`: `args.json`, `fold_0`, `verbose.log`, `test_scores.csv`, `quiet.log`.
After running `finetune_generator.py` I get the following error. Can you please let me know how I can trace the problem?
Thanks for your help.
Traceback (most recent call last):
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 124, in &lt;module&gt;
    score_func = Chemprop(args.chemprop_model)
  File "/apps/hgraph2graph/20210428/hgraph2graph/finetune_generator.py", line 37, in __init__
    scaler, features_scaler = load_scalers(fname)
ValueError: too many values to unpack (expected 2)
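The unpacking error above is what Python raises when `load_scalers` returns more values than the two the old call site expects (newer chemprop releases return additional scaler objects). The stub below is hypothetical, purely to reproduce the exception:

```python
def load_scalers_stub(fname):
    # Hypothetical stand-in for chemprop's load_scalers: newer releases
    # return more than the two objects this old call site unpacks.
    return ("scaler", "features_scaler", "atom_scaler", "bond_scaler")

try:
    scaler, features_scaler = load_scalers_stub("model.pt")
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```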
Did you solve it? I am trying to figure it out; if I manage to solve it, I will push the changes to my own fork of this package: https://github.com/muammar/hgraph2graph
Ok, I solved it... First, your fine-tune set does not need to have any headers. It should look like this:
CC
CCO
CNOO
Then, you need to apply the following patch:
diff --git a/finetune_generator.py b/finetune_generator.py
index d406d38..995cad3 100755
--- a/finetune_generator.py
+++ b/finetune_generator.py
@@ -35,9 +35,9 @@ class Chemprop(object):
         for fname in files:
             if fname.endswith(".pt"):
                 fname = os.path.join(root, fname)
-                scaler, features_scaler = load_scalers(fname)
-                self.scalers.append(scaler)
-                self.features_scalers.append(features_scaler)
+                # scaler, features_scaler = load_scalers(fname)
+                # self.scalers.append(scaler)
+                # self.features_scalers.append(features_scaler)
                 model = load_checkpoint(fname)
                 self.checkpoints.append(model)
@@ -164,10 +164,10 @@ if __name__ == "__main__":
             [
                 kl_div,
                 loss.item(),
-                wacc * 100,
-                iacc * 100,
-                tacc * 100,
-                sacc * 100,
+                wacc.item() * 100,
+                iacc.item() * 100,
+                tacc.item() * 100,
+                sacc.item() * 100,
             ]
         )
See: https://github.com/muammar/hgraph2graph/commit/a714e2920cbb3e43e350df8e3a20961497a8d4d7
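For context on the second hunk of that patch: multiplying a 0-dim tensor by 100 still yields a tensor object, while `.item()` first extracts the plain Python number. A minimal stand-in class (hypothetical, not torch) shows the difference:

```python
class ScalarTensor:
    """Hypothetical stand-in for a 0-dim torch tensor (illustration only)."""
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Arithmetic on a tensor returns another tensor, not a plain number.
        return ScalarTensor(self.value * other)

    def item(self):
        # Like torch.Tensor.item(): extract the underlying Python number.
        return self.value

wacc = ScalarTensor(0.5)
print(type(wacc * 100).__name__)   # ScalarTensor -> awkward to log or format
print(wacc.item() * 100)           # 50.0, a plain float
```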
Hi muammar,
Thanks for the solution. I am using `train_translator.py` for lead optimization and I am getting the following error. Have you seen this error before? Do you know how to solve it?
Thanks,
Traceback (most recent call last):
File "/apps/hgraph2graph/20210428/hgraph2graph/train_translator.py", line 86, in
Hello,
I am following the example for "Molecule generation pretraining procedure". The first step, `python get_vocab.py --ncpu 16 < data/chembl/all.txt > vocab.txt`, completes with no error, but I am getting "IndexError: tuple index out of range" in the second step: `python preprocess.py --train data/chembl/all.txt --vocab data/chembl/all.txt --ncpu 16 --mode single`
Can you please let me know what the problem could be?
Best, Amir