jamesgleave / Deep-Docking-NonAutomated


There are not enough hits #21

Open BJWiley233 opened 2 years ago

BJWiley233 commented 2 years ago

I am getting this error, just as mentioned in the other ticket; however, everything is being read correctly and the file names are fine. This is my result:

2022-11-24 20:13:43.339231: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Parsing args...
['Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py', '-os', '10', '-bs', '256', '-num_units', '100', '-dropout', '0.2', '-learn_rate', '0.0001', '-bin_array', '2', '-wt', '2', '-cf', '-982.1397872332886', '-rec', '0.9', '-n_it', '1', '-t_mol', '883.891533', '--data_path', 'test/DeepDockABL1', '--save_path', 'test/DeepDockABL1', '-n_mol', '100']
100 0.2 0.0001 2 2.0 -982.1397872332886 256 10 test/DeepDockABL1
Training size not specified, using entire dataset...
Finished parsing args...

Getting data from iteration 1
Data acquired...
Train shape: (7688, 1) Valid shape: (3800, 1) Test shape: (3800, 1)
Data Augmentation iteration 1 data shape: (7688, 1)
Training labels shape:  (7688, 1)
There are not enough hits... exiting.
fgentile89 commented 2 years ago

This error is displayed if there are fewer than 10 positive molecules in the validation and/or test set. I think it may be due to using such small sample sizes for screening a library of >883M molecules. Can you check these values:

(y_valid_first.r_i_docking_score < cf).values.sum()

and

(y_test_first.r_i_docking_score < cf).values.sum()

Most likely, you have ~38 hits in the validation set but fewer than 10 in the test set, assuming you are labelling the top 1% of molecules as positives in the validation set.
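To make that check concrete, here is a hedged helper sketch (the DataFrame names y_valid_first/y_test_first and the cutoff cf are assumed from the snippets above, not taken from the repo code):

import pandas as pd

def count_hits(scores: pd.DataFrame, cf: float) -> int:
    # A "hit"/positive is a molecule whose docking score is better
    # (more negative) than the cutoff cf.
    return int((scores.r_i_docking_score < cf).values.sum())

# progressive_docking.py exits with "There are not enough hits" when either
# count falls below 10, e.g.:
# count_hits(y_valid_first, cf), count_hits(y_test_first, cf)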

BJWiley233 commented 2 years ago

Yeah, I am debugging now. What exactly does "positive" molecules mean? All my labels have negative energy values from AutoDockGPU. One thing I noticed in the get_data function is that this line reads as if there is a header in the morgan files:

morgan = pd.read_csv(morgan_path, usecols=[0], header=0, names=['ZINC_ID'])

However, when running the setup scripts the morgan files have no header, so you might want to check this (a possible fix is sketched after the file preview below).

$ head -n2 morgan/*.csv
==> morgan/test_morgan_1024_updated.csv <==
548482427,33,36,39,80,98,128,138,162,214,218,233,249,268,293,294,310,330,356,357,366,367,378,385,406,428,444,456,460,511,521,531,538,561,567,573,650,656,658,659,667,675,695,698,726,730,751,757,758,760,792,807,812,823,849,857,875,893,926,935,944,950,985,1004,1019
1822322919,4,11,64,74,75,90,92,128,148,175,188,193,197,205,209,231,238,242,251,255,268,272,285,288,290,301,356,361,378,389,428,441,455,456,480,498,505,539,618,623,639,647,650,656,673,689,726,738,792,807,836,849,856,875,881,890,893,897,926,935,950,974,980,1019

==> morgan/train_morgan_1024_updated.csv <==
55646157,4,14,33,46,59,93,216,356,361,367,369,456,480,487,497,543,553,554,561,565,650,659,673,675,690,698,726,752,781,807,816,849,881,893,926,935,942,985
1566226336,1,33,36,46,59,65,80,114,120,128,150,197,216,231,250,283,356,393,429,561,575,609,623,641,650,659,679,689,693,723,726,737,807,808,816,849,867,884,893,904,926,1009,1019

==> morgan/valid_morgan_1024_updated.csv <==
1753308234,1,15,33,41,58,70,80,97,117,128,151,228,241,247,283,288,294,301,338,351,413,472,526,537,540,618,636,650,674,701,739,794,800,807,886,887,888,893,895,926,935,985,994,996,1009,1019
1025119100,4,56,64,70,80,112,128,162,179,193,213,242,255,301,319,340,343,356,360,378,428,436,440,448,456,480,496,497,504,528,580,650,656,658,674,675,726,790,807,842,849,862,890,893,918,926,935,974,976,1004,1009,1019
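A minimal sketch of that fix, assuming it is simply a matter of not treating the first row as a header (morgan_path is set here to one of the example files shown in the head output, only for illustration):

import pandas as pd

# The setup scripts write the morgan CSVs without a header row (see the head
# output above), so read them with header=None instead of header=0.
morgan_path = 'morgan/train_morgan_1024_updated.csv'
morgan = pd.read_csv(morgan_path, usecols=[0], header=None, names=['ZINC_ID'])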
BJWiley233 commented 2 years ago

Ahh, I bet it's because I was using AutoDock Vina before, which gave extremely low (and inaccurate) energy values, so there is a really low cf value left over from the last time I set up the model scripts.

BJWiley233 commented 2 years ago

Yup that was it! Now just running into datatype issues. I think I can figure them out.

x data from: /storage1/fs1/bolton/Active/projects/BWILEYtest/test/DeepDockABL1/iteration_1/morgan/test_morgan_1024_updated.csv
Done...
Index(['r_i_docking_score'], dtype='object')
r_i_docking_score
Traceback (most recent call last):
  File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 346, in <module>
    X_test, y_test = get_morgan_and_scores(f, y_test)
  File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 156, in get_morgan_and_scores
    train_data = pd.merge(ID_labels, train_pd, how='inner',on=['ZINC_ID'])
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 106, in merge
    op = _MergeOperation(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 703, in __init__
    self._maybe_coerce_merge_keys()
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 1256, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
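For reference, a hedged sketch of the kind of fix applied here (DataFrame names taken from the traceback; the actual change isn't shown in the thread): cast ZINC_ID to one common dtype on both sides before merging.

import pandas as pd

def merge_on_zinc_id(ID_labels: pd.DataFrame, train_pd: pd.DataFrame) -> pd.DataFrame:
    # Give the merge key a single dtype so pandas does not refuse to merge
    # int64 IDs against object (string) IDs.
    ID_labels = ID_labels.assign(ZINC_ID=ID_labels['ZINC_ID'].astype(str))
    train_pd = train_pd.assign(ZINC_ID=train_pd['ZINC_ID'].astype(str))
    return pd.merge(ID_labels, train_pd, how='inner', on=['ZINC_ID'])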
BJWiley233 commented 2 years ago

fixed but ugh...

Traceback (most recent call last):
  File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 468, in <module>
    progressive_docking.fit(Oversampled_X_train,
  File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/ML/DDModel.py", line 138, in fit
    self.history = self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, shuffle=shuffle,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training.py", line 708, in fit
    return func.fit(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 658, in fit
    return fit_loop(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/backend.py", line 3475, in __call__
    fetched = self._callable_fn(*array_vals,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1470, in __call__
    ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cuDNN launch failure : input shape ([256,1,1,100])
     [[{{node batch_normalization/cond/FusedBatchNormV3}}]]
     [[metrics/acc/Identity/_195]]
  (1) Internal: cuDNN launch failure : input shape ([256,1,1,100])
     [[{{node batch_normalization/cond/FusedBatchNormV3}}]]
0 successful operations.
0 derived errors ignored.
fgentile89 commented 2 years ago

Ok, thanks for noticing the header issue, I corrected it in the repo. Regarding the last issue, what version of tensorflow are you using, and can you try to see if the problem is reproducible with a smaller batch size (-bs flag, change from 256 to 64)?

BJWiley233 commented 2 years ago

Using this image nvcr.io/nvidia/tensorflow:22.08-tf1-py3 from NVIDIA, so it's probably the most up-to-date version.

Forgot to post this error:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:377] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Looking here, it was maybe just a memory issue. I raised it from 32 GB to 128 GB and it worked. Going to see if 64 GB will work also.
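For what it's worth, a common TF 1.x workaround for "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" is to let TensorFlow allocate GPU memory on demand instead of reserving the whole card up front; a minimal sketch (assumes the TF 1.15 image mentioned above, and is not part of the repo's scripts):

import tensorflow as tf
from tensorflow.keras import backend as K

# Grow GPU memory incrementally rather than grabbing it all at start-up,
# which often avoids cuDNN handle-creation failures on shared nodes.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.compat.v1.Session(config=config))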

fgentile89 commented 2 years ago

This seems like a lot of memory for a batch size of 256. Have you checked how much memory the process uses on the GPU?

BJWiley233 commented 2 years ago

I can go check now. It's hard to do since it's on a cluster and LSF reporting is not always the best. Can I do that from inside a blade I am on, maybe? It's working now with just 16 GB given to the blade. I think the blades are just a little wonky right now.

Just saw the tf version is 1.15.5. I don't think that is the newest. Going to try to update it. I think this is the newest 2.0 image: nvcr.io/nvidia/tensorflow:22.08-tf2-py3

BJWiley233 commented 1 year ago

Quick question. I get the "There are not enough hits" error after running simple_job_models.py if -n_it is one less than -titr, i.e. n_it=5 and titr=6. Is this supposed to happen? If I increase -titr to 7, then simple_job_1.sh gives more hits because the cf variable increases from -13.280741910024236 to -13.17074191002422, which yields 12 hits.

simple_job_models.py -n_it 5 -titr 6 -mdd mdd_directory -time 00-04:00 -file_path fo -nhp 4 

gives

simple_job_1.sh ...
...
Training labels shape:  (57288, 1)

# output debugging
y_valid_first.r_i_docking_score.min()= -14.3
cf= -13.280741910024236
# sorted
            r_i_docking_score
ZINC_ID
1834732158             -14.30
904471353              -13.65
1815850557             -13.64
1831236886             -13.57
1807156813             -13.34
448139959              -13.33
793490098              -13.31
105142162              -13.28
302933119              -13.27
1183914022             -13.25
570756767              -13.23
488239025              -13.22
1785720908             -13.17
1591004849             -13.14
1825766879             -13.08
(y_valid_first.r_i_docking_score < cf).values.sum()= 7

There are not enough hits... exiting.
complete
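A quick sanity check of those numbers (the scores are copied from the sorted output above): counting values strictly below each cutoff gives 7 hits with the tighter cf and 12 with the looser one, matching the debug output.

scores = [-14.30, -13.65, -13.64, -13.57, -13.34, -13.33, -13.31, -13.28,
          -13.27, -13.25, -13.23, -13.22, -13.17, -13.14, -13.08]

for cf in (-13.280741910024236, -13.17074191002422):
    hits = sum(s < cf for s in scores)
    print(f'cf={cf}: {hits} hits')  # 7, then 12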
prajwal07 commented 1 year ago

Hi Developers, I am running the Deep-Docking-NonAutomated protocol. At phase 3, I got the following error message. I am not able to understand what went wrong. All necessary files are in place, and I got model_7 as the best model. Could you please let me know your thoughts on it? Thanks in advance and best regards, -Prajwal

(base) iteration_1 $ python -u ../../phase_2-3/Prediction_morgan_1024.py -fn smiles_all_01.txt -protein PROJECT-TEST -it 1 -mdd ../Ligands/ZINC20_fp_chunk_1 -file_path ../../PROJECT-TEST
Using TensorFlow backend.
sampling: Number of models to predict: 0
sampling: Starting Predictions...
sampling: We are predicting from the file smiles_all_01.txt located in ../Ligands/ZINC20_fp_chunk_1
sampling: We are currently running line ZINC000978196592_1,33,38,74,80,85,90,111,126,139,195,197,208,218,219,242,269,301,334,356,378,400,422,428,456,462,489,511,520,523,609,620,650,656,680,685,739,751,784,802,807,834,848,849,859,881,893,899,926,935,943,950,1019

 sampling: (1) Predicting... Time elapsed: 14.007264375686646 seconds.

Traceback (most recent call last):
  File "../../phase_2-3/Prediction_morgan_1024.py", line 123, in <module>
    returned = prediction_morgan(fn, models, tr)
  File "../../phase_2-3/Prediction_morgan_1024.py", line 67, in prediction_morgan
    for j in range(len(pred[0])):
IndexError: list index out of range
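For context, the log line "Number of models to predict: 0" above suggests no trained models were found for this prediction step, which would reproduce this IndexError; a minimal illustration (assumed behaviour, not the repo's exact code):

models = []        # "Number of models to predict: 0"
pred = []          # would hold one prediction array per loaded model
for j in range(len(pred[0])):   # IndexError: list index out of range
    pass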