simonfqy / PADME

This is the repository containing the source code for my Master's thesis research, about predicting drug-target interaction using deep learning.
MIT License

Can't get all the predictions #8

Closed Running-z closed 6 years ago

Running-z commented 6 years ago

I trained a model and wanted to try prediction. Using your latest code, I modified the drive_nci60.sh file and ran prediction on a total of 11557 data points, but the final prediction file only contains 9245 rows. So, although you mentioned before that prediction does not split the data, the expected result has not been achieved. This is the total number of predicted results: (screenshot attached)

This is the modified drive_nci60.sh file:

CUDA_VISIBLE_DEVICES=1
spec='python driver.py --dataset davis --prot_desc_path davis_data/prot_desc.csv \
--model graphconvreg \
--model_dir ./model_dir2/model_dir_davis_w \
--predict_only --csv_out ./NCI60_data/preds_all_tc_graphconv.csv '
eval $spec
simonfqy commented 6 years ago

Hi, this is because the --dataset davis parameter invokes the load_davis() function, and in load_davis() frac_train is left at its default value of 0.8, so only 80% of the data ends up in the prediction file. Please refer to this line in the load_nci60() function to see how to set the split ratios: https://github.com/simonfqy/PADME/blob/5e97ba97f1389ea975b196a31b3464ca2cd00512/dcCustom/molnet/load_function/nci60_dataset.py#L93

My implementation of load_nci60() is very specific: it relies on ToxCast, which is a very complicated dataset. The problem with using load_davis() directly is that the transformer used inside is not the transformer fitted on the original dataset (the davis dataset in this case), but one fitted on your own dataset for prediction.

So, as I said before, I suggest you create a "template dataset" so the program knows which drug-target pairs to predict. Its format should be identical to the restructured.csv files in the dataset folders like davis_data/, the only difference being that the drug-target interaction values should all be 0 (or any other number, but 0 is more convenient). Making the interactions all zero will help you spot problems, should any arise.

I have decided to add a function called load_customized() to give you a better piece of code for predicting DTI with a trained model, but generating the "template dataset" is a task of your own. Please wait a while for me to update it.
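For illustration only, a minimal sketch of how such a template could be generated from an existing restructured.csv; the label column name 'interaction_value' and the output path are assumptions, not the actual schema, so adjust them to whatever your restructured.csv uses:

import pandas as pd

# Hypothetical sketch: derive a "template dataset" from an existing restructured.csv
# by zeroing out the label column. 'interaction_value' is an assumed column name --
# replace it with the actual label column used in your restructured.csv.
df = pd.read_csv("davis_data/restructured.csv")
df["interaction_value"] = 0.0   # all-zero interactions make transformer problems easy to spot
df.to_csv("my_template_data/restructured.csv", index=False)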

Running-z commented 6 years ago

@simonfqy Ok, I will continue to try the method you suggested. Thank you for your patience.

Running-z commented 6 years ago

@simonfqy I ran my drive4_d_warm.sh file to train on my data, then trained several times with different batch_size and learning_rate values to get different models and predicted my data with each of them separately, but I got an unexpected error:

Traceback (most recent call last):
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [64]
     [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@BatchNormalization_18/BatchNormalization_18_beta"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](BatchNormalization_18/BatchNormalization_18_beta, save/RestoreV2_2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "driver.py", line 699, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 278, in run_analysis
    prediction_file=csv_out)
  File "/project/git2/PADME/dcCustom/molnet/run_benchmark_models.py", line 194, in model_regression
    model.predict(train_dataset, transformers=transformers, csv_out=prediction_file, tasks=tasks)
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 642, in predict
    self.restore()
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 1066, in restore
    saver.restore(self.session, checkpoint)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1686, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [64]
     [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@BatchNormalization_18/BatchNormalization_18_beta"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](BatchNormalization_18/BatchNormalization_18_beta, save/RestoreV2_2)]]

Caused by op 'save/Assign_2', defined at:
  File "driver.py", line 699, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 278, in run_analysis
    prediction_file=csv_out)
  File "/project/git2/PADME/dcCustom/molnet/run_benchmark_models.py", line 194, in model_regression
    model.predict(train_dataset, transformers=transformers, csv_out=prediction_file, tasks=tasks)
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 642, in predict
    self.restore()
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 1065, in restore
    saver = tf.train.Saver(var_list=var_list)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1239, in __init__
    self.build()
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 765, in _build_internal
    restore_sequentially, reshape)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 440, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 160, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 59, in assign
    use_locking=use_locking, name=name)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [64]
     [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@BatchNormalization_18/BatchNormalization_18_beta"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](BatchNormalization_18/BatchNormalization_18_beta, save/RestoreV2_2)]]

I used the same prediction data in every run, which made me very upset. I was not able to solve this problem today.

simonfqy commented 6 years ago

@Running-z The parameters --predict_only and --restore_model load a model that has already been trained. Now that you want to train from scratch, you should remove the --predict_only parameter. The problem with your current setup is that you are loading the already-trained model while using a different set of hyperparameters, which defines a different model, so the error is expected. I will write the load_customized() function a bit later, hopefully soon.
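To illustrate the class of error (a minimal standalone TF1 sketch, not PADME code): saving a checkpoint for a variable of one shape and then rebuilding the graph with a different shape triggers the same "Assign requires shapes of both tensors to match" failure on restore.

import os
import tensorflow as tf

os.makedirs("/tmp/shape_demo", exist_ok=True)

# Build and save a graph whose variable has 64 units (e.g. a batch-norm beta).
tf.reset_default_graph()
beta = tf.get_variable("beta", shape=[64], initializer=tf.zeros_initializer())
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "/tmp/shape_demo/model.ckpt")

# Rebuild the "same" graph, but now with 128 units, as a changed hyperparameter would do.
tf.reset_default_graph()
beta = tf.get_variable("beta", shape=[128], initializer=tf.zeros_initializer())
saver = tf.train.Saver()
with tf.Session() as sess:
    # Fails with InvalidArgumentError: "Assign requires shapes of both tensors to match.
    # lhs shape= [128] rhs shape= [64]" -- the same error as in the traceback above.
    saver.restore(sess, "/tmp/shape_demo/model.ckpt")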

simonfqy commented 6 years ago

I suggest you read the documentation in driver.py about parameters like --predict_only more carefully; hopefully it will prevent similar mistakes in the future.

Running-z commented 6 years ago

@simonfqy Ok, the source code I used before did not have the --restore_model parameter. Today I followed the method you described and still got the same error. My goal is to predict my data with the trained model, not to train. I modified nb_epoch and learning_rate separately in the preset_hyper_parameters.py file for each training run, and saved the three trained models. I want to use these three models for prediction separately, so don't I need --predict_only? According to your description of the --restore_model parameter, if --predict_only is True then --restore_model is also True, but I still get the same error.

Running-z commented 6 years ago

Another question: since your model code comes from DeepChem, I saw that you save the model using DeepChem's save method, but there is no model.pickle file in the training results, so it seems you did not use the save method. So I wonder: when the trained model is applied to new data, are the weights actually loaded, or could the result just be random? I trained a model yesterday and predicted new data, and the error between the predicted values and the real values is very large, just as if it were random.

simonfqy commented 6 years ago

@Running-z If you're using the same set of hyperparameters (except nb_epoch, which does not matter) for training and prediction, you should be able to get the correct result. Are you sure you used the changed hyperparameters for training? Because if you did, I don't understand why this would happen.

Regarding your second comment: although I don't store model.pickle (I removed that line because the pickle file is not useful to me), the trained model stored in --model_dir contains everything about the model, including the number of neurons in each layer, all the connections between neurons, and the weights of those edges. I plotted the "new" data points using the trained model and they make sense (I will include the figures in my paper). You said that the error against the real values is very large, so I think you are doing something wrong. One possibility is that in your "template dataset" the interaction values are non-zero, and the normalization transformer computes the mean and standard deviation of those values, when it should actually be the same transformer that was fitted on the training dataset (i.e., using the training set's mean and standard deviation).

You can provide me with more information if you want. I will try to implement and upload the load_customized() function within 24 hours.

Running-z commented 6 years ago

@simonfqy Yes, I first modified the batch_size, nb_epoch, and learning_rate values of the graphconvreg model in preset_hyper_parameters.py and ran the first training. Once that training started, I changed the batch_size, nb_epoch, and learning_rate values of the graphconvreg model in preset_hyper_parameters.py again and trained again. When both models were trained, I changed the model_dir path and ran prediction, but I got the above error.

Running-z commented 6 years ago

My interaction values are non-zero, and my training data looks like this: (screenshot attached)

My prediction data looks like this: (screenshot attached)

In the image below, true_pX is the true value and pre_pX is the predicted value. As you can see, the error is really unacceptable:

(screenshot attached)

My training data and prediction data are both loaded as davis data, so the processing should be exactly the same. I don't understand what you mean about the normalization transformer's mean and standard deviation, or what else I did wrong.

simonfqy commented 6 years ago

@Running-z I have updated the load_nci60() function as a temporary quick-and-dirty solution. You can refer to drive_nci60.sh and drive4_nci60.sh to see how to use it. You should also read the documentation at https://github.com/simonfqy/PADME/blob/d2d307fe17e1229add45f0c82bd50ed12bbfae35/dcCustom/molnet/load_function/nci60_dataset.py#L25 because if you don't read the documentation and just charge ahead, you will hit bugs. In particular, you should rename your template prediction file as prescribed in the documentation.

You need to really look at the code to understand where the transformer comes into play. To put it simply, the NormalizationTransformer computes the mean and standard deviation of the raw data and transforms the raw values into normalized z-scores, which the load_****() functions then store in the DiskDataset object. So in both training and prediction, the model is trained on z-scores and the DNN outputs z-scores; the NormalizationTransformer is then used again to transform the z-scores back to the original scale.
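As a conceptual illustration of that round trip (plain NumPy, not the dcCustom API; the numbers are made up):

import numpy as np

# Statistics are computed on the *training* labels only.
y_train = np.array([5.2, 7.1, 6.3, 9.0])   # raw interaction values used for training
mean, std = y_train.mean(), y_train.std()

z_train = (y_train - mean) / std           # the model is trained on these z-scores

z_pred = np.array([0.4, -1.1])             # what the DNN outputs at prediction time
y_pred = z_pred * std + mean               # undo the transform with the TRAINING mean/std

If the template dataset's own (non-zero) labels were used to fit the transformer instead, the untransformed predictions would be shifted and scaled incorrectly, which would match the large errors described above.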

I assume you didn't make any of these mistakes, but for caution, I list some of them here:

  1. When you're training a model and then predicting with it, make sure the --model_dir parameter is the same both times.

  2. Make sure that you store the different models in different directories, as specified in --model_dir.

  3. The hyperparameters you used could simply perform very poorly; that might be a reason for the bad results.

  4. Every time you change the data but keep using the same load_xxxx() function, you must make sure the directory holding the old DiskDataset object is renamed or deleted, otherwise the old dataset will be reloaded. For example, if you previously used the davis dataset for training and have a folder davis_data/GraphConv/ storing the featurized data, and you now train on your own data while still calling load_davis(), you must remove or rename the davis_data/GraphConv/ directory so that the load function processes the new dataset. The same applies to prediction. You must be VERY careful with this; see the sketch after this list.

  5. If anything goes wrong with the load_nci60() function, tweak it as you wish. Note that the load_davis() call it makes depends on the davis DiskDataset you already created, so load_nci60() depends on the dataset you used for training the model. https://github.com/simonfqy/PADME/blob/d2d307fe17e1229add45f0c82bd50ed12bbfae35/dcCustom/molnet/load_function/nci60_dataset.py#L71
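Regarding point 4, a minimal sketch of what "remove or rename the cached directory" could look like; the davis_data/GraphConv/ path is just the example folder mentioned above, so adjust it to your own layout:

import os
import shutil

cached = "davis_data/GraphConv"
if os.path.isdir(cached):
    # Rename (or use shutil.rmtree(cached) to delete) so the load function
    # re-featurizes the new data instead of silently reloading the old DiskDataset.
    shutil.move(cached, cached + "_old")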

Will update this comment later, if I can think of other points to make.

Running-z commented 6 years ago

@simonfqy Ok, thank you for your patience. I have taken note of the issues you mentioned, but the predictions are still not accurate. I used your default parameters and also modified the hyperparameters according to my own ideas, but the results are almost the same: the predictions are mostly negative. I feel this is worse than random prediction, so it is not simply that the hyperparameters are poor. At the same time, I think the parameters of the model should also be customizable, such as the GraphConvModel's Graph_conv_layers. In addition, your load_nci60() function is still useless, because I am still training the model; I hope you can use your new function to load data prediction after training.

simonfqy commented 6 years ago

@Running-z The GraphConvModel is already quite customizable; you can extend it yourself if you want more flexibility. You can try using load_nci60() with the tf_regression model, which takes much less time to train, and you should already have a trained model for it. What do you mean when you wrote "I hope you can use your new function to load data prediction after training"?

Running-z commented 6 years ago

@simonfqy Can GraphConvModel be customized? I don't see a way to freely define the model's parameters, such as the Graph_conv_layers parameter. About the last question: sorry, I misspoke. I meant that the load_nci60() function has not been used yet, because I am still training the model with the hyperparameter search method you provided. I hope that after training the model, I can use your load_nci60() function to load the data and predict correctly.

simonfqy commented 6 years ago

I don't think it can be customized. Perhaps I could look at it over the weekend.

Running-z commented 6 years ago

@simonfqy Okay thank you

Running-z commented 6 years ago

@simonfqy Hello, I remember you said before that during training, because the NormalizationTransformer is used, the mean and standard deviation of the data are calculated. So I think the reason my prediction results are completely different from the real values is probably that the predictions are not transformed back using the NormalizationTransformer. Maybe the NormalizationTransformer in your prediction code is not being applied. What do you think?

simonfqy commented 6 years ago

That's not right. I didn't change the code related to this functionality from DeepChem. You should really look into the code carefully to figure out the problem. My response is that the results are transformed back in this line: https://github.com/simonfqy/PADME/blob/39dff90592f5142233ece5a95ebd95f1ef6e5649/dcCustom/models/tensorgraph/tensor_graph.py#L543

simonfqy commented 6 years ago

Closing again.