kaist-amsg / LocalRetro

Retrosynthesis prediction for organic molecules with LocalRetro
81 stars 24 forks source link

DGL module error #15

Closed biotech7 closed 1 year ago

biotech7 commented 1 year ago

Hi LocalRetro Team! when training, DGL module exception thrown as follows:

Loading previously saved dgl graphs...
Traceback (most recent call last):
  File "Train.py", line 95, in <module>
    main(args)
  File "Train.py", line 63, in main
    run_a_train_epoch(args, epoch, model, train_loader, loss_criterion, optimizer)
  File "Train.py", line 19, in run_a_train_epoch
    atom_logits, bond_logits, _ = predict(args, model, bg)
  File "/root/autodl-tmp/LocalRetro/scripts/utils.py", line 163, in predict
    return model(bg, node_feats, edge_feats)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/autodl-tmp/LocalRetro/scripts/models.py", line 48, in forward
    bond_feats = self.linearB(pair_atom_feats(g, node_feats))
  File "/root/autodl-tmp/LocalRetro/scripts/model_utils.py", line 13, in pair_atom_feats
    atom_pair_list = torch.transpose(sg.adjacency_matrix().coalesce().indices(), 0, 1)
  File "/root/miniconda3/lib/python3.8/site-packages/dgl/sparse/sparse_matrix.py", line 461, in coalesce
    return SparseMatrix(self.c_sparse_matrix.coalesce())
RuntimeError: expected scalar type Long but found Int

to reproduce: python: 3.8 OS: ubuntu 20.4 cuda: 11.3(on RTX 4090) pytorch: 1.12.0 +cu113 DGL: 1.1.1+cu113(https://data.dgl.ai/wheels/cu113/repo.html) rdkit: 2023.3 numpy: 1.22.4 DGLLife: 0.3.2 i can hardly figure out what's wrong? can you help me?

biotech7 commented 1 year ago

It's most likely an incompatibility between _losscriterion and a newer version of DGL.

shuan4638 commented 1 year ago

Hi,

Thanks for bringing this up. To solve this problem, you need to understand what the function atom_pair_list = torch.transpose(sg.adjacency_matrix().coalesce().indices(), 0, 1) does. This function basically pair all the connecting atoms according to the sg.adjacency_matrix().

Now, the error is likely occuring due to the version of DGL not supporting the current data type. According to the error message, you may want to try changing the sg.adjacency_matrix() from int to long tensor.

Since I don't have the GPU supporing cuda11.3, it is not possible for me to debug. Please let me know if you solve this issue!

biotech7 commented 1 year ago

@shuan4638 By tracing into the _adjacency_matrix() method, it was found that the edge type (abbreviated as etype) marked as int_ . But i have no idea how and where to modify it in a proper manner due to its rigorous algorithm. Anyway. it's not the best choice to modify modules' codes. Is there any optional strategy for this bug rectification?

would you mind introducing the full-matched CUDA/Pytorch/DGL version to reproduce the LocalRetro project? I have tried many CUDA/Pytorch/DGL matched version for this project as a consequence of all with failure(DGL module exceptions).

BTW, when predicting products from given reactants, does this retrosynthesis model produce/provide only one synthesis route or multiple ones?

Finally, thanks for your great works!

shuan4638 commented 1 year ago

My dgl version is 0.6.1, a much earlier version than yours.Since most of the users might use later version of dgl, I will update the code running for dgl version > 1.0.0.

Will get back to you after fixing.

Thanks!

biotech7 commented 1 year ago

Thanks for your dedication to this project! A wide range of compatibility of DGL with various CUDA/Pytorch is preferred!

shuan4638 commented 1 year ago

I have just updated the code, let me know if it works or not.

biotech7 commented 1 year ago

Hi @shuan4638 I retrained a new model based on _USPTO50k and made a success ( CUDA 11.8 with matched Pytorch & DGL, python 3.8) with a single RTX 4090 on Ubuntu 20.4. localRetro_2

There is a wierd exception when training model: pd.readcsv() could not find atom_templates.csv. but there is no bugs from get_configure() method in utils.py(note:_ On Windows, there is no exception).

As to point of retrosynthesis test, there may be a amendation for cpu-only pytorch loading: in Retrosynthesis.py

args['device'] = torch.device('cuda:0') -----> 

args['device'] = torch.device('cpu')
if torch.cuda.is_available():
  args['device'] = torch.device('cuda:0')

in utils.py

torch.load(args['model_path'])------>
torch.load(args['model_path'], map_location=args['device'])

The template-based model works smoothly with high speed and accuracy(thanks to WeaveAtomFeaturizer as well as attention empowered MPNNGNN). It's a great work!

If reagents/reaction conditions were added to training and prediction process in the near future, it would be even more perfect.

shuan4638 commented 1 year ago

Thanks for the comments! I have fixed them except for the weird exception you mentioned, I am not clear which exception you are talking about. Can you describe it in more detail?

The prediction of reagent/reaction conditions is a very challenging problem because there are many factors that can affect the use of reagents and reaction conditions, such as laboratory preference, not just depending on the reaction yield. For example, you can see the same type of reactions accompanied by various different reagents. Plus, we found many of the reagent information in the USPTO dataset is missing or wrong, which is not a proper resource for training an ML model.

You might want to read this paper if you want to know more.

biotech7 commented 1 year ago

This paper is valuable!

As a laboratory synthetic researcher, I'm sure that most of reactions are reagents/conditions-specified. Reagents and reaction conditions play a key role in reaction processes. so coming the requests: how to train a model on datasets accompanied with reagents and conditions which could be reversed(Alternatively, searched) as structured/indexed labels in prediction process? Do you have any idea?

There may be not a bug but weird. I granted R/W privilege to this project folder and made sure all csv files were available.But when running _python Train.py -d USPTO50K, exception thrown as follows:

Traceback (most recent call last):
  File "Train.py", line 95, in <module>
    main(args)
  File "Train.py", line 61, in main
    train_loader, val_loader, test_loader = load_dataloader(args)
  File "/root/autodl-tmp/LocalRetro/scripts/utils.py", line 55, in load_dataloader
    dataset = USPTODataset(args, 
  File "/root/autodl-tmp/LocalRetro/scripts/dataset.py", line 22, in __init__
    df = pd.read_csv('%s/labeled_data.csv' % args['data_dir'])
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
    self.handles = get_handle(
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/common.py", line 859, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '../data/USPTO_50k/labeled_data.csv'

I changed the corresponding csv files to absolut path, errors stiil remained. Finally I copied all necessary csv files into scripts folder and altered the corresponding file path properly. no exception thrown.

shuan4638 commented 1 year ago

Thanks for the comments! I would say one simple way would be including the reagent/reaction condition also in the reaction template if they are specified. So when a reaction template is predicted, the reaction condition is also predicted at the same time.

Feel free to have more discussion here!

Shuan