samsledje / ConPLex

Adapting protein language models and contrastive learning for highly-accurate drug-target interaction prediction.
http://conplex.csail.mit.edu
MIT License

Config issue for data_cache_dir in default_config.yaml #22

Closed tobigithub closed 1 year ago

tobigithub commented 1 year ago

🐛 Bug Report

Issue: FileNotFoundError: [Errno 2] No such file or directory: '/data/cb/samsl/Adapting_PLM_DTI/dataset/DAVIS/train.csv'
Setup: Linux in a completely new VM

Traceback (most recent call last):
  File "/usr/local/bin/conplex-dti", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/conplex_dti/__main__.py", line 41, in main
    args.main_func(args)
  File "/usr/local/lib/python3.10/dist-packages/conplex_dti/cli/train.py", line 322, in main
    datamodule.prepare_data()
  File "/usr/local/lib/python3.10/dist-packages/conplex_dti/dataset/datamodules.py", line 228, in prepare_data
    df_train = pd.read_csv(self._data_dir / self._train_path, **self._csv_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/common.py", line 856, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/data/cb/samsl/Adapting_PLM_DTI/dataset/DAVIS/train.csv'
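
The traceback shows `prepare_data` joining the configured data directory with `train.csv` and passing the result to `pd.read_csv`, which fails because the directory does not exist on this machine. A minimal pre-flight check along these lines could surface the problem earlier. This is only a sketch: `missing_data_files` is a hypothetical helper and is not part of `conplex_dti`, and only the file name `train.csv` is confirmed by the traceback.

```python
from pathlib import Path


def missing_data_files(data_dir, filenames=("train.csv",)):
    """Return the expected files that are absent from data_dir.

    Hypothetical helper for illustration only; not part of conplex_dti.
    """
    data_dir = Path(data_dir)
    return [name for name in filenames if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Directory taken from the error message above.
    data_dir = Path("/data/cb/samsl/Adapting_PLM_DTI/dataset/DAVIS")
    missing = missing_data_files(data_dir)
    if missing:
        print(f"Missing in {data_dir}: {', '.join(missing)}")
        print("Check data_cache_dir in the config file.")
```

Running a check like this before training points at the config setting directly instead of at a pandas stack trace.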

Solution: change the old data_cache_dir in default_config.yaml from:

# Logging and Paths
wandb_proj: NoSigmoidTest # Weights and Biases project to log results to.
wandb_save: True # Whether or not to log to Weights and Biases.
log_file: ./logs/scratch_testing.log # Location of log file
model_save_dir: ./best_models # Location to save best models
data_cache_dir: /data/cb/samsl/Adapting_PLM_DTI/dataset # Location of downloaded data (use `conplex_dti download`)

to the working solution:

# Logging and Paths
wandb_proj: NoSigmoidTest # Weights and Biases project to log results to.
wandb_save: True # Whether or not to log to Weights and Biases.
log_file: ./logs/scratch_testing.log # Location of log file
model_save_dir: ./best_models # Location to save best models
data_cache_dir: ./datasets # Location of downloaded data (use `conplex_dti download`)
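
For anyone who wants to apply the same change from a script instead of editing the file by hand, here is a rough sketch. It assumes the config keeps `data_cache_dir` on a single line as shown above. `set_data_cache_dir` is a hypothetical helper, not something ConPLex provides, and it uses a plain regular expression rather than a YAML parser so that the trailing comment is preserved.

```python
import re


def set_data_cache_dir(config_text, new_dir):
    """Replace the value of data_cache_dir, keeping any trailing comment.

    Hypothetical helper for illustration only; not part of conplex_dti.
    """
    pattern = re.compile(r"^(data_cache_dir:[ \t]*)(\S+)(.*)$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in new_dir.
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_dir}{m.group(3)}", config_text
    )
    if count == 0:
        raise KeyError("data_cache_dir not found in config")
    return updated


if __name__ == "__main__":
    old = (
        "model_save_dir: ./best_models # Location to save best models\n"
        "data_cache_dir: /data/cb/samsl/Adapting_PLM_DTI/dataset # Location of downloaded data\n"
    )
    print(set_data_cache_dir(old, "./datasets"))
```

The helper only transforms text; reading and writing the actual config file is left to the caller.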

Proof of working solution:

2023-06-29 03:58:26,959 [INFO] Using CUDA device cuda:0
2023-06-29 03:58:26,959 [INFO] Using CUDA device cuda:0
2023-06-29 03:58:26,960 [DEBUG] Setting random state 0
2023-06-29 03:58:26,960 [DEBUG] Setting random state 0
2023-06-29 03:58:26,960 [INFO] Preparing DataModule
2023-06-29 03:58:26,960 [INFO] Preparing DataModule
2023-06-29 03:58:28.135319: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of the model checkpoint at Rostlab/prot_bert were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-06-29 03:58:37,909 [INFO] Writing Morgan features to /content/datasets/DAVIS/Morgan_features.h5
2023-06-29 03:58:37,909 [INFO] Writing Morgan features to /content/datasets/DAVIS/Morgan_features.h5
Morgan: 100% 68/68 [00:00<00:00, 595.17it/s]
2023-06-29 03:58:38,036 [INFO] Writing ProtBert features to /content/datasets/DAVIS/ProtBert_features.h5
2023-06-29 03:58:38,036 [INFO] Writing ProtBert features to /content/datasets/DAVIS/ProtBert_features.h5
ProtBert: 100% 379/379 [02:02<00:00,  3.11it/s]
2023-06-29 04:00:41,540 [INFO] Preloading Morgan features from /content/datasets/DAVIS/Morgan_features.h5
2023-06-29 04:00:41,540 [INFO] Preloading Morgan features from /content/datasets/DAVIS/Morgan_features.h5
Morgan: 100% 68/68 [00:00<00:00, 3453.31it/s]
2023-06-29 04:00:41,563 [INFO] Preloading ProtBert features from /content/datasets/DAVIS/ProtBert_features.h5
2023-06-29 04:00:41,563 [INFO] Preloading ProtBert features from /content/datasets/DAVIS/ProtBert_features.h5
ProtBert: 100% 379/379 [00:00<00:00, 3523.09it/s]
2023-06-29 04:00:42,489 [INFO] Getting DataLoaders
2023-06-29 04:00:42,489 [INFO] Getting DataLoaders
2023-06-29 04:00:42,490 [INFO] Loading contrastive data (DUDE)
2023-06-29 04:00:42,490 [INFO] Loading contrastive data (DUDE)
Some weights of the model checkpoint at Rostlab/prot_bert were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-06-29 04:01:16,814 [INFO] Preloading Morgan features from /content/datasets/DUDe/Morgan_features.h5
2023-06-29 04:01:16,814 [INFO] Preloading Morgan features from /content/datasets/DUDe/Morgan_features.h5
2023-06-29 04:01:16,814 [INFO] Writing Morgan features to /content/datasets/DUDe/Morgan_features.h5
2023-06-29 04:01:16,814 [INFO] Writing Morgan features to /content/datasets/DUDe/Morgan_features.h5

samsledje commented 1 year ago

Hi Tobias -- the default config is meant to be a template that users update to match the settings on their own system. Still, you make a great point that it would be helpful for the default to use relative paths. If you have the opportunity, could you submit a pull request with these changes to the dev branch? Thanks!
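
One caveat with a relative default is that it resolves against the directory the command is run from. The log above writes to `/content/datasets`, which suggests (an inference, not something stated in the thread) that training was launched from `/content`. A small sketch of that resolution rule, using a hypothetical `resolve_cache_dir` helper that is not part of `conplex_dti`:

```python
from pathlib import Path


def resolve_cache_dir(configured, base_dir=None):
    """Resolve a configured data directory against base_dir (default: cwd).

    Hypothetical helper for illustration only; not part of conplex_dti.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = Path(configured).expanduser()
    # Absolute paths are returned unchanged; relative ones depend on base.
    return path if path.is_absolute() else (base / path).resolve()


if __name__ == "__main__":
    print(resolve_cache_dir("./datasets"))
```

Launching the same command from a different directory therefore points `./datasets` at a different location.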

tobigithub commented 1 year ago

Thanks. I forgot to mention that I just wanted to run the demo, basically from a clean install with:

pip install conplex-dti

So changing the dataset directory made that possible. Nothing complex.

data_cache_dir: ./datasets

samsledje commented 1 year ago

Fixed in 8a6b2264aad0495e055f73f561d3c2012bbbb1e5