imerelli opened 11 months ago
Hello, does this problem persist only for this particular pdb file, or for every other pdb file as well?
Also, are you trying to load a different tensor file to use for the prediction? I am a bit confused about where you are getting the tensor.pt file from. Apologies for not understanding your question clearly.
The problem persists for all the pdb files I tried. The fact is that I have no idea what I should put in the line checkpoint = torch.load(MODEL_CHECKPOINT_PATH), or, if I have to create the tensors.pt file, how to do that. I'm using a server with an A100 GPU.
Hello, you do not need to change that line. MODEL_CHECKPOINT_PATH is a variable containing the path to the saved model weights, in this instance the Saved_Model directory. If you check the Saved_Model directory, it contains a file called model.ckpt, which holds the saved model weights. There is no need to create any tensor.pt file.
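For illustration, a minimal sketch of what that variable and that line amount to, assuming the Saved_Model layout described above (the exact path in your clone may differ):

import torch

MODEL_CHECKPOINT_PATH = "Saved_Model/model.ckpt"  # illustrative path to the shipped model weights
checkpoint = torch.load(MODEL_CHECKPOINT_PATH)    # loads the saved weights; no tensor.pt file is involved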
Please let me know if you have further questions.
Thanks and regards.
Hello, using the original script, the line checkpoint = torch.load(MODEL_CHECKPOINT_PATH) gives this error:
$ python src/prediction.py examples/8b0s.pdb C_144_A A predictions
/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py:262: UserWarning: Attribute 'MODEL' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['MODEL'])`.
rank_zero_warn(
Traceback (most recent call last):
File "/opt/tools/deg/PreMut/src/prediction.py", line 154, in <module>
prediction.predict()
File "/opt/tools/deg/PreMut/src/prediction.py", line 114, in predict
checkpoint = torch.load(MODEL_CHECKPOINT_PATH)
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
result = unpickler.load()
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/pickle.py", line 1254, in load_binpersid
self.append(self.persistent_load(pid))
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 1001, in load_tensor
wrap_storage=restore_location(storage, location),
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
device = validate_cuda_device(location)
File "/opt/tools/deg/miniforge3/envs/PreMut/lib/python3.10/site-packages/torch/serialization.py", line 143, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
This is why I tried to modify it. I get this error with all the pdb files I tried, including the example from the documentation. The GPU is there:
$ nvidia-smi
Sat Dec 9 09:38:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... Off | 00000000:3B:00.0 Off | 0 |
| N/A 49C P0 68W / 300W | 0MiB / 80994MiB | 3% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Hello, I have done a new commit which should address the CUDA device issue. Could you do a git pull or a new git clone and try again? Please let me know if it works.
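For context, the usual remedy for this kind of error is to pass map_location to torch.load, so that tensors saved on a GPU index that does not exist locally (here cuda:1) are remapped onto an available device. A minimal sketch of that idea, with an assumed checkpoint path and not necessarily the exact change made in the commit:

import torch

MODEL_CHECKPOINT_PATH = "Saved_Model/model.ckpt"  # assumed path, adjust to your clone

# Remap storages saved on another GPU onto whatever device exists on this machine.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
checkpoint = torch.load(MODEL_CHECKPOINT_PATH, map_location=device)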
Hi,
Have you guys figured this out? I tried PreMut today and got the CUDA issue. How can I fix it?
Thanks so much!
Hello, could you do a git pull and try again? I have fixed the problem.
Let me try.
It works, I got the predicted .pdb result. But I have a naive question: how can I get a structure image from this .pdb file?
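As a side note, rendering an image from a .pdb file is done outside PreMut with a molecular viewer such as PyMOL, UCSF ChimeraX, or VMD. For example, assuming PyMOL is installed and using an illustrative output file name:

$ pymol -cq predictions/predicted.pdb -d "png predicted.png, ray=1"

This is just one option; any structure viewer can open the predicted .pdb directly.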
Hi, I have problems with the definition of the tensors.pt file and the corresponding map.
I see that in the documentation I should do something like this
But it is not working.
Can you help me?