tiantz17 / PocketAnchor

Learning Structure-based Pocket Representations for Protein-Ligand Interaction Prediction
Apache License 2.0

TypeError: string indices must be integers #5

Open mylRalph opened 1 year ago

mylRalph commented 1 year ago

Hello, glad to see your excellent work! I encountered a TypeError: string indices must be integers when I tried to reproduce the protein-ligand binding affinity prediction results with the command python runPrediction.py --task Affinity --dataset CASF --setting newprotein --info newprotein --num_workers 0. Here is the log from the run:

[2023-08-30 21:04:39,381] [INFO] Local folder created: ./Affinity/results/PocketAnchorPrediction_task_Affinity_dataset_CASF_model_PocketAnchor_info_newprotein_6a7b1a61cd9e_cuda0_20230830_210439/
[2023-08-30 21:04:39,382] [INFO] Start prediction
[2023-08-30 21:04:39,382] [INFO] Loading data...
[2023-08-30 21:05:04,676] [INFO] Loading model...
Traceback (most recent call last):
  File "runPrediction.py", line 320, in <module>
    main()
  File "runPrediction.py", line 316, in main
    PocektAnchor.predict()
  File "runPrediction.py", line 162, in predict
    dict_collect, results = self.evaluate()
  File "runPrediction.py", line 231, in evaluate
    pred_dict = self.Model(*data_tuple)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/PocketAnchor/Affinity/src/PocketAnchor.py", line 408, in forward
    batch_site = torch.cat([torch.LongTensor([i]*anchor['num_site']) for i, anchor in enumerate(protGraphBatch.kwargs)]).to(device)
  File "/workspace/PocketAnchor/Affinity/src/PocketAnchor.py", line 408, in <listcomp>
    batch_site = torch.cat([torch.LongTensor([i]*anchor['num_site']) for i, anchor in enumerate(protGraphBatch.kwargs)]).to(device)
TypeError: string indices must be integers

The same error occurs when I run the command python runPrediction.py --task Affinity --dataset CASF --setting original --info original --num_workers 0. However, the protein-ligand binding site prediction code runs successfully with the command python runPrediction.py --task PocketDetection --dataset COACH420 --num_workers 0.

I wonder how I can fix the problem and I would really appreciate it if I could get your help! Looking forward to your reply.

tiantz17 commented 1 year ago

Hi, thanks for your interest!

I think this TypeError was caused by a version mismatch of torch_geometric.

In the old version of torch_geometric (used here), the torch_geometric.data.Batch.from_data_list() method does not merge dictionary-valued attributes such as kwargs. Therefore, protGraphBatch.kwargs here is a list of dictionaries, one per graph, that is, [{'num_site': 72, 'num_anchor': 10}, {'num_site': 89, 'num_anchor': 12}, ...]

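In the new version of torch_geometric, the per-graph dictionaries in a batch are merged into a single dictionary whose values are lists, that is, {'num_site': [72, 89, ...], 'num_anchor': [10, 12, ...]}. Iterating over that dictionary yields its string keys, so anchor['num_site'] ends up indexing a string, which raises the TypeError.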
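To make the difference concrete, here is a minimal pure-Python sketch that reproduces the error without torch or torch_geometric. The two literals copy the layouts described above; the function name site_counts_old_style is hypothetical and only mimics the indexing pattern from line 408.

```python
# Old layout (as described above): one dict per graph in the batch.
old_kwargs = [{'num_site': 72, 'num_anchor': 10},
              {'num_site': 89, 'num_anchor': 12}]

# New layout (as described above): one dict whose values are per-graph lists.
new_kwargs = {'num_site': [72, 89], 'num_anchor': [10, 12]}


def site_counts_old_style(kwargs):
    """Hypothetical helper: same access pattern as line 408 (list of dicts)."""
    return [anchor['num_site'] for i, anchor in enumerate(kwargs)]


# Works on the old layout.
print(site_counts_old_style(old_kwargs))

# Fails on the new layout: iterating a dict yields its string keys,
# so anchor is the str 'num_site' and anchor['num_site'] indexes a str.
error_message = None
try:
    site_counts_old_style(new_kwargs)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

The exact wording of the message may vary slightly between Python versions, but it starts with "string indices must be integers" as in the traceback.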

You can fix this by changing lines 408 to 411 of PocketAnchor/Affinity/src/PocketAnchor.py to:

        batch_site = torch.cat([torch.LongTensor([i]*anchor) for i, anchor in enumerate(protGraphBatch.kwargs['num_site'])]).to(device)
        batch_anch = torch.cat([torch.LongTensor([i]*anchor) for i, anchor in enumerate(protGraphBatch.kwargs['num_anchor'])]).to(device)
        batch_vert = torch.cat([torch.LongTensor([i]*group) for i, group in enumerate(compGraphBatch.kwargs['num_site'])]).to(device)
        batch_grou = torch.cat([torch.LongTensor([i]*group) for i, group in enumerate(compGraphBatch.kwargs['num_anchor'])]).to(device)
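If the code needs to run under both versions, one option is to normalize kwargs before building the batch indices. The following is only a sketch under that assumption: counts_for and batch_index are hypothetical helpers, not part of the repository, and the small counts are made up for illustration. The resulting list could then be wrapped in torch.LongTensor(...) and moved to the device as in the lines above.

```python
from itertools import chain


def counts_for(kwargs, key):
    """Return per-graph counts for `key` from either kwargs layout.

    Hypothetical helper: accepts a dict of lists (new layout)
    or a list of dicts (old layout).
    """
    if isinstance(kwargs, dict):
        return list(kwargs[key])
    return [item[key] for item in kwargs]


def batch_index(counts):
    """Repeat graph index i counts[i] times and flatten into one list."""
    return list(chain.from_iterable([i] * n for i, n in enumerate(counts)))


# Illustrative (made-up) counts, kept small so the output is easy to read.
old_layout = [{'num_site': 2, 'num_anchor': 1}, {'num_site': 3, 'num_anchor': 2}]
new_layout = {'num_site': [2, 3], 'num_anchor': [1, 2]}

site_old = batch_index(counts_for(old_layout, 'num_site'))
site_new = batch_index(counts_for(new_layout, 'num_site'))
print(site_old)  # [0, 0, 1, 1, 1]
print(site_new)  # [0, 0, 1, 1, 1]
```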

Hope this can solve the problem.

Best, Tingzhong

mylRalph commented 1 year ago

Thanks for your reply! I have already solved the problem with your help!

mylRalph commented 1 year ago

Hello, by the way, I am considering using PocketAnchor as an encoder for protein pockets in PDB format, e.g. 12asA_site_1.pdb, so that each pocket is mapped to a representation that can then be used in tasks such as pocket matching. Could you please give me some guidance on how to achieve this?

Besides, it seems that the code for processing ligands has not been provided, so I can't use PocketAnchor to run the ligand binding affinity task on other datasets.

I would really appreciate it if I could get your help!