BJWiley233 opened this issue 2 years ago
This error is displayed if there are fewer than 10 positive molecules in the validation and/or test set. I think it may be due to using such small samples for screening a library of >883M molecules. Can you check these values:
(y_valid_first.r_i_docking_score < cf).values.sum()
and
(y_test_first.r_i_docking_score < cf).values.sum()
Most likely, you have ~38 hits in the validation set but fewer than 10 in the test set, assuming you are labelling the top 1% of molecules as positives in the validation set.
Yeah, I am debugging now. What does "positive" molecules mean exactly? All my labels have negative energy values from AutoDockGPU. One thing I noticed in the get_data function is that this line reads as if there is a header in the morgan files:
morgan = pd.read_csv(morgan_path, usecols=[0], header=0, names=['ZINC_ID'])
However, when running the setup scripts, the morgan files have no header, so you might want to check this.
$ head -n2 morgan/*.csv
==> morgan/test_morgan_1024_updated.csv <==
548482427,33,36,39,80,98,128,138,162,214,218,233,249,268,293,294,310,330,356,357,366,367,378,385,406,428,444,456,460,511,521,531,538,561,567,573,650,656,658,659,667,675,695,698,726,730,751,757,758,760,792,807,812,823,849,857,875,893,926,935,944,950,985,1004,1019
1822322919,4,11,64,74,75,90,92,128,148,175,188,193,197,205,209,231,238,242,251,255,268,272,285,288,290,301,356,361,378,389,428,441,455,456,480,498,505,539,618,623,639,647,650,656,673,689,726,738,792,807,836,849,856,875,881,890,893,897,926,935,950,974,980,1019
==> morgan/train_morgan_1024_updated.csv <==
55646157,4,14,33,46,59,93,216,356,361,367,369,456,480,487,497,543,553,554,561,565,650,659,673,675,690,698,726,752,781,807,816,849,881,893,926,935,942,985
1566226336,1,33,36,46,59,65,80,114,120,128,150,197,216,231,250,283,356,393,429,561,575,609,623,641,650,659,679,689,693,723,726,737,807,808,816,849,867,884,893,904,926,1009,1019
==> morgan/valid_morgan_1024_updated.csv <==
1753308234,1,15,33,41,58,70,80,97,117,128,151,228,241,247,283,288,294,301,338,351,413,472,526,537,540,618,636,650,674,701,739,794,800,807,886,887,888,893,895,926,935,985,994,996,1009,1019
1025119100,4,56,64,70,80,112,128,162,179,193,213,242,255,301,319,340,343,356,360,378,428,436,440,448,456,480,496,497,504,528,580,650,656,658,674,675,726,790,807,842,849,862,890,893,918,926,935,974,976,1004,1009,1019
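For what it's worth, here is a minimal sketch of how that read_csv call could be changed to match the headerless files shown above (just a suggestion; the actual fix in the repo may differ):

import pandas as pd

# With header=0 plus names=, pandas treats the first molecule's line as a header
# row and drops it; header=None keeps every line as data.
morgan_path = "morgan/test_morgan_1024_updated.csv"  # example path from the head output
morgan = pd.read_csv(morgan_path, usecols=[0], header=None, names=["ZINC_ID"])
print(morgan.head())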
Ahh, I bet it's because I was using AutoDock Vina before, which was giving extremely low (inaccurate) energy values, so there is a really low cf value left over from the last time I set up the model scripts.
Yup that was it! Now just running into datatype issues. I think I can figure them out.
x data from: /storage1/fs1/bolton/Active/projects/BWILEYtest/test/DeepDockABL1/iteration_1/morgan/test_morgan_1024_updated.csv
Done...
Index(['r_i_docking_score'], dtype='object')
r_i_docking_score
Traceback (most recent call last):
File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 346, in <module>
X_test, y_test = get_morgan_and_scores(f, y_test)
File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 156, in get_morgan_and_scores
train_data = pd.merge(ID_labels, train_pd, how='inner',on=['ZINC_ID'])
File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 106, in merge
op = _MergeOperation(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 703, in __init__
self._maybe_coerce_merge_keys()
File "/usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py", line 1256, in _maybe_coerce_merge_keys
raise ValueError(msg)
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
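For anyone hitting the same dtype mismatch, here is a minimal sketch of one way around it, assuming ZINC_ID is int64 in one frame and an object/string column in the other (the tiny frames below are made up for illustration; the real frames are ID_labels and train_pd in get_morgan_and_scores):

import pandas as pd

# Made-up stand-ins for ID_labels and train_pd; only the dtypes matter here.
ID_labels = pd.DataFrame({"ZINC_ID": ["548482427", "1822322919"],
                          "r_i_docking_score": [-8.1, -7.4]})
train_pd = pd.DataFrame({"ZINC_ID": [548482427, 1822322919],
                         "fp": ["33,36,39", "4,11,64"]})

# Cast both merge keys to the same dtype (string here) before merging so pandas
# does not raise "You are trying to merge on int64 and object columns".
ID_labels["ZINC_ID"] = ID_labels["ZINC_ID"].astype(str)
train_pd["ZINC_ID"] = train_pd["ZINC_ID"].astype(str)

train_data = pd.merge(ID_labels, train_pd, how="inner", on=["ZINC_ID"])
print(train_data)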
fixed but ugh...
Traceback (most recent call last):
File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/progressive_docking.py", line 468, in <module>
progressive_docking.fit(Oversampled_X_train,
File "/storage1/fs1/bolton/Active/projects/BWILEYtest/Deep-Docking-NonAutomated/phase_2-3/ML/DDModel.py", line 138, in fit
self.history = self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, shuffle=shuffle,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training.py", line 708, in fit
return func.fit(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 658, in fit
return fit_loop(
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/keras/backend.py", line 3475, in __call__
fetched = self._callable_fn(*array_vals,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1470, in __call__
ret = tf_session.TF_SessionRunCallable(self._session._session,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([256,1,1,100])
[[{{node batch_normalization/cond/FusedBatchNormV3}}]]
[[metrics/acc/Identity/_195]]
(1) Internal: cuDNN launch failure : input shape ([256,1,1,100])
[[{{node batch_normalization/cond/FusedBatchNormV3}}]]
0 successful operations.
0 derived errors ignored.
Ok, thanks for noticing the header issue, I corrected it in the repo. Regarding the last issue, what version of tensorflow are you using, and can you check whether the problem is reproducible with a smaller batch size (-bs flag, change from 256 to 64)?
Using this image from NVIDIA: nvcr.io/nvidia/tensorflow:22.08-tf1-py3, so it's probably the most up-to-date version.
Forgot to post this error:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:377] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Looking here, it was maybe just a memory issue. I raised it from 32 GB to 128 GB and it worked. Going to see if 64 GB will also work.
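If the failure turns out to involve GPU memory rather than host memory, a commonly suggested workaround on TF 1.x is to let the session allocate GPU memory on demand; a minimal sketch under that assumption (not the repo's code):

import tensorflow as tf
from tensorflow.keras import backend as K

# TF 1.x sketch: grow GPU memory as needed instead of grabbing it all up front,
# which often avoids "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR".
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))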
This seems like a lot of memory for a batch size of 256. Have you checked how much memory the process uses on the GPU?
I can go see now. It's hard to do since it's on a cluster and LSF reporting is not always the best. Can I do that from inside a blade I am on, maybe? It's working now with just 16 GB given to the blade. I think the blades are just a little wonky right now.
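One way to check GPU memory use from inside the blade, assuming nvidia-smi is available there (a sketch, not part of the repo):

import subprocess

# Plain nvidia-smi prints overall memory use plus a per-process table at the bottom.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# A machine-readable variant for just the memory numbers per GPU.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True,
).stdout)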
Just saw the tf version is 1.15.5. I don't think that is the newest. Going to try to update it. I think this is the newest 2.0 image: nvcr.io/nvidia/tensorflow:22.08-tf2-py3
Quick question. I get the "There are not enough hits" error after running simple_job_models.py if -n_it is one less than -titr, i.e. n_it=5 and titr=6. Is this supposed to happen? If I increase -titr to 7 then simple_job_1.sh gives more hits, because the cf variable increases from -13.280741910024236 to -13.17074191002422, which makes 12 hits.
simple_job_models.py -n_it 5 -titr 6 -mdd mdd_directory -time 00-04:00 -file_path fo -nhp 4
gives
simple_job_1.sh ...
...
Training labels shape: (57288, 1)
# output debugging
y_valid_first.r_i_docking_score.min()= -14.3
cf= -13.280741910024236
# sorted
r_i_docking_score
ZINC_ID
1834732158 -14.30
904471353 -13.65
1815850557 -13.64
1831236886 -13.57
1807156813 -13.34
448139959 -13.33
793490098 -13.31
105142162 -13.28
302933119 -13.27
1183914022 -13.25
570756767 -13.23
488239025 -13.22
1785720908 -13.17
1591004849 -13.14
1825766879 -13.08
(y_valid_first.r_i_docking_score < cf).values.sum()= 7
There are not enough hits... exiting.
complete
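For context, here is a minimal sketch of the kind of check that produces this message, using the sorted validation scores and the two cf values quoted above; the 10-positive threshold is the one mentioned at the top of the thread (an illustration, not the exact code in simple_job_models.py):

import pandas as pd

# Sorted validation scores copied from the output above (illustration only).
scores = [-14.30, -13.65, -13.64, -13.57, -13.34, -13.33, -13.31, -13.28,
          -13.27, -13.25, -13.23, -13.22, -13.17, -13.14, -13.08]
y_valid_first = pd.DataFrame({"r_i_docking_score": scores})

# Molecules scoring below the cutoff cf count as hits (positives).
for cf in (-13.280741910024236, -13.17074191002422):
    n_hits = (y_valid_first.r_i_docking_score < cf).values.sum()
    print(cf, n_hits)  # 7 hits with the first cutoff, 12 with the second
    if n_hits < 10:
        print("There are not enough hits... exiting.")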
Hi Developers, I am running the Deep-Docking-NonAutomated protocol. At phase 3, I got the following error message. I could not understand what was wrong. All the necessary files are in place, and I got model_7 as the best model. Could you please let me know your thoughts on it? Thanks in advance and best regards, -Prajwal
#############
(base) iteration_1 $ python -u ../../phase_2-3/Prediction_morgan_1024.py -fn smiles_all_01.txt -protein PROJECT-TEST -it 1 -mdd ../Ligands/ZINC20_fp_chunk_1 -file_path ../../PROJECT-TEST
Using TensorFlow backend.
sampling: Number of models to predict: 0
sampling: Starting Predictions...
sampling: We are predicting from the file smiles_all_01.txt located in ../Ligands/ZINC20_fp_chunk_1
sampling: We are currently running line ZINC000978196592_1,33,38,74,80,85,90,111,126,139,195,197,208,218,219,242,269,301,334,356,378,400,422,428,456,462,489,511,520,523,609,620,650,656,680,685,739,751,784,802,807,834,848,849,859,881,893,899,926,935,943,950,1019
sampling: (1) Predicting... Time elapsed: 14.007264375686646 seconds.
Traceback (most recent call last):
File "../../phase_2-3/Prediction_morgan_1024.py", line 123, in
I am getting this error just as mentioned in the other ticket; however, everything is being read correctly and the file names are fine. This is my result: