simonfqy / PADME

This is the repository containing the source code for my Master's thesis research on predicting drug-target interactions using deep learning.
MIT License

Using the hyper_param_search method to train the model gives an error #9

Closed Running-z closed 5 years ago

Running-z commented 5 years ago

I trained on my data using the hyper_param_search method, but I got the following error:

Traceback (most recent call last):
  File "driver.py", line 696, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "driver.py", line 468, in run_analysis
    pickle.dump(hyper_parameters, f)
TypeError: write() argument must be str, not bytes

My training script looks like this:

CUDA_VISIBLE_DEVICES=2
spec='python driver.py --dataset davis --hyper_param_search \
--max_iter 42 --prot_desc_path davis_data/prot_desc.csv --plot True \
--model_dir ./model_dir/  --split index --tensorboard True \
--arithmetic_mean --aggregate toxcast  \
--log_file GPhypersearch_t2.log \
--intermediate_file ./interm_files/intermediate_cv_warm_3.csv '
eval $spec
simonfqy commented 5 years ago

Hi, this error is not from my code; it is a problem within DeepChem itself. I noticed it, but since I store the hyperparameters found elsewhere (in the log_file), I never tried to fix it. It shouldn't matter much to you either, since the failing line is the last line of that workhorse function; you can try to fix it yourself or consult the current version of DeepChem. The results are stored in results_file, which is defined here:

https://github.com/simonfqy/PADME/blob/d2d307fe17e1229add45f0c82bd50ed12bbfae35/driver.py#L357

When I was running the search, I simply looked through the log_file manually for the set of hyperparameters yielding the smallest score, which can be tedious if you have many iterations. You're welcome to contribute if you come up with a fix for the problem you reported.
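
For reference, that TypeError usually means the output file was opened in text mode; pickle.dump writes bytes, so the file has to be opened in binary mode. Below is a minimal sketch of that kind of fix (illustrative only, not the actual open call in driver.py, which I haven't adjusted):

import pickle

hyper_parameters = {"batch_size": 64, "learning_rate": 0.001}  # illustrative contents

# Opening the file in text mode ("w") reproduces the reported error:
#   TypeError: write() argument must be str, not bytes
# Opening it in binary mode ("wb") lets pickle write its byte stream.
with open("hyper_parameters.pkl", "wb") as f:
    pickle.dump(hyper_parameters, f)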

To get an idea of how to run a hyperparameter search, look at the .sh scripts with the _sch suffix, for example ./drive4_d_sch.sh. You may not need the --no_concord parameter in them: it only disables calculation of the Concordance Index (CI) on the training set to save time. That mattered in the original implementation, where computing CI was very slow (quadratic time complexity), but it is no longer a problem because I now use a C implementation of CI, which is much faster.
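
To illustrate why the original computation was slow, here is a plain-Python version of the quadratic-time Concordance Index (just an illustration of the metric, not the C implementation PADME actually calls):

def concordance_index(y_true, y_pred):
    # Fraction of comparable pairs (pairs with different true values) whose
    # predictions are ordered the same way as the true values; prediction
    # ties count as 0.5. The double loop makes this O(n^2), which is why
    # it was worth skipping on large training sets.
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            if (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) > 0:
                concordant += 1.0
            elif y_pred[i] == y_pred[j]:
                concordant += 0.5
    return concordant / comparable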

There is also a problem with your .sh script. You don't need to pass True: flag parameters like --tensorboard and --plot default to False when omitted and are set to True simply by including them in the command. You can read driver.py to see their documentation.
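
This is the usual argparse store_true pattern, roughly as in the sketch below (the flag names match driver.py, but the definitions here are only illustrative):

import argparse

parser = argparse.ArgumentParser()
# store_true flags default to False and become True when present on the
# command line; they take no value, so passing "True" after them is not needed.
parser.add_argument('--plot', action='store_true', help='enable plotting')
parser.add_argument('--tensorboard', action='store_true', help='enable TensorBoard logging')

args, unparsed = parser.parse_known_args(['--plot', '--tensorboard'])
print(args.plot, args.tensorboard)  # True True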

I devised a score, to be minimized, that guides the hyperparameter search. The set of hyperparameters whose iteration achieves the smallest score across its training epochs (compared with iterations trained on other hyperparameter sets) is deemed the best. The score is computed on the validation set: with --no_r2 enabled it is mean(RMSE) - mean(CI); with --no_r2 disabled it is mean(RMSE) - mean(CI) - 0.5 * mean(R_square). Because the datasets (except the Metz dataset) are typically concentrated at inactive values, R square computed on the validation set is not very informative, so I recommend enabling --no_r2, as was done in ./drive4_d_sch.sh.
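
In code form, the selection score is simply the following (this restates the formula above; it is not a copy of the driver.py implementation):

import numpy as np

def hyperparam_score(rmse, ci, r_square=None, no_r2=True):
    # Validation-set score minimized during the hyperparameter search.
    #   with --no_r2:    mean(RMSE) - mean(CI)
    #   without --no_r2: mean(RMSE) - mean(CI) - 0.5 * mean(R_square)
    # Lower is better; each argument holds the metric values being averaged
    # (e.g. one value per task).
    score = np.mean(rmse) - np.mean(ci)
    if not no_r2 and r_square is not None:
        score -= 0.5 * np.mean(r_square)
    return score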

The --verbose_search parameter exists to cope with system crashes. Each iteration of the hyperparameter search uses a single set of hyperparameters; with --verbose_search enabled, the current best score of that iteration is written to the log file whenever a new best is found as training progresses. Without it, if your system crashes you get no results logged for the iteration that was running at the time of the crash.

One last note: when running a program you're not sure works correctly, first construct a small toy example to see whether it hits any errors, like sweeping the road for mines. In this case, you can set nb_epoch to a very low number to save time, and only run the full search once you've cleared up the problems.

Hope this helps you.

Running-z commented 5 years ago

@simonfqy Ok, I will try your method, thank you very much.

simonfqy commented 5 years ago

Closed as of now.