Closed: @Running-z closed this issue 6 years ago.
Again, try adding a phosphorylation field right before A, exactly following my `prot_desc.csv` format. Ask me if you still have any problems after that.
Also note that the `--dataset davis` parameter should be preserved. It invokes the `load_davis()` function defined in `/molnet/load_function/davis_dataset.py`, which in turn uses the `davis_data/restructured.csv` file to generate the data for further processing. You can (and perhaps should) customize load functions like `load_davis()`.
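To make the loader's job concrete, here is a minimal sketch of what a custom load function has to produce: parallel lists of compounds, proteins, and labels read from an interaction CSV. The column names (`smiles`, `proteinName`, `interaction_value`) and the sample rows are hypothetical placeholders, not the real schema of `restructured.csv`.

```python
import csv
import io

# Hypothetical miniature of an interaction CSV; the real restructured.csv
# may use different column names, so treat these as placeholders.
SAMPLE = """smiles,proteinName,interaction_value
CCO,ABL1,7.3
c1ccccc1,ABL1,5.0
"""

def load_custom(csv_source):
    """Read an interaction CSV and return parallel lists of compounds,
    proteins, and labels, roughly the shape a load function must emit."""
    rows = list(csv.DictReader(csv_source))
    compounds = [r["smiles"] for r in rows]
    proteins = [r["proteinName"] for r in rows]
    labels = [float(r["interaction_value"]) for r in rows]
    return compounds, proteins, labels

compounds, proteins, labels = load_custom(io.StringIO(SAMPLE))
```

A real replacement for `load_davis()` would additionally featurize the compounds and wrap the result in the project's dataset objects, but the CSV-to-lists step above is the part you customize for a new data format.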
@simonfqy My data has been processed into the format of the davis data, but I still can't train; I get the following error:
```
Traceback (most recent call last):
  File "driver.py", line 699, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 278, in run_analysis
    prediction_file=csv_out)
  File "/project/git2/PADME/dcCustom/molnet/run_benchmark_models.py", line 217, in model_regression
    no_r2=no_r2)
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 178, in fit
    max_checkpoints_to_keep, checkpoint_interval, restore, submodel)
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 378, in fit_generator
    for feed_dict in self._create_feed_dicts(feed_dict_generator, True):
  File "/project/git2/PADME/dcCustom/models/tensorgraph/tensor_graph.py", line 1107, in _create_feed_dicts
    for d in generator:
  File "/project/git2/PADME/dcCustom/models/tensorgraph/fcnet.py", line 331, in default_generator
    pad_batches=pad_batches):
  File "/project/git2/PADME/dcCustom/data/datasets.py", line 758, in iterate
    next_shard = pool.apply_async(dataset.get_shard, (shard_perm[0],))
IndexError: index 0 is out of bounds for axis 0 with size 0
```
This is the content of the bash file I am running:

```shell
CUDA_VISIBLE_DEVICES=0
spec='python driver.py --dataset davis \
--model tf_regression --prot_desc_path davis_data/Mer_psc2_Phosphorylated=0.csv \
--model_dir ./model_dir/model_dir4_davis_w --filter_threshold 1 \
--arithmetic_mean --aggregate toxcast \
--intermediate_file ./interm_files/intermediate_cv_warm_3.csv '
eval $spec
```
@Running-z Have you already stored the trained model in `./model_dir/model_dir4_davis_w`? Also, when you're predicting, please remove the parameter `--filter_threshold 1`, because that parameter is only used for training and validation, not for prediction.
@simonfqy No, I am not predicting. I am training on my own data; it is training, not prediction.
Hi, I guess the reason is that you used the parameter `--filter_threshold 1`. This parameter causes the program to remove any entities (compounds or proteins) that appear only once in the whole dataset. Since you have only 1 protein, every compound appears only once in the dataset, so all of them are removed, leaving the dataset empty.
Please remove this parameter and try again.
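To illustrate why a single-protein dataset empties out, here is a tiny re-implementation of the filtering idea described above. It is a sketch of the behavior, not the actual PADME code, and the pair format is assumed.

```python
from collections import Counter

def filter_rare_entities(pairs, threshold=1):
    """Illustrative version of the --filter_threshold idea: drop every
    (compound, protein) pair whose compound or protein occurs no more
    than `threshold` times in the whole dataset."""
    compound_counts = Counter(c for c, _ in pairs)
    protein_counts = Counter(p for _, p in pairs)
    return [(c, p) for c, p in pairs
            if compound_counts[c] > threshold and protein_counts[p] > threshold]

# One protein: every compound occurs exactly once, so everything is dropped.
single = [("mol%d" % i, "EGFR") for i in range(5)]

# Compounds paired with two proteins each occur twice and survive the filter.
multi = [("molA", "EGFR"), ("molA", "ABL1"), ("molB", "EGFR"), ("molB", "ABL1")]
```

Running `filter_rare_entities(single)` returns an empty list, which matches the `IndexError: index 0 is out of bounds for axis 0 with size 0` seen when the program tries to read the first shard of an empty dataset.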
@simonfqy
I tried the method you suggested. I removed the `--filter_threshold 1` parameter, but then I can't use `--split random`; otherwise I get the following error:
```
Traceback (most recent call last):
  File "driver.py", line 699, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 172, in run_analysis
    filter_threshold=filter_threshold)
  File "/project/git2/PADME/dcCustom/molnet/load_function/davis_dataset.py", line 110, in load_davis
    fold_datasets = splitter.k_fold_split(dataset, K)
  File "/project/git2/PADME/dcCustom/splits/splitters.py", line 121, in k_fold_split
    frac_test=0)
  File "/project/git2/PADME/dcCustom/splits/splitters.py", line 844, in split
    assert self.threshold > 0
AssertionError
```
Then I cloned your latest code, ran the `drive4_d_warm.sh` file, and got the following error:
```
Traceback (most recent call last):
  File "driver.py", line 716, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 175, in run_analysis
    filter_threshold=filter_threshold, oversampled=oversampled)
  File "/project/git3/PADME/dcCustom/molnet/load_function/davis_dataset.py", line 109, in load_davis
    fold_datasets = splitter.k_fold_split(dataset, K)
  File "/project/git3/PADME/dcCustom/splits/splitters.py", line 122, in k_fold_split
    frac_test=0)
  File "/project/git3/PADME/dcCustom/splits/splitters.py", line 917, in split
    assert len(entry_to_write) == 1
AssertionError
```
The content of `drive4_d_warm.sh` is:

```shell
CUDA_VISIBLE_DEVICES=5
spec='python driver.py --dataset davis --cross_validation \
--model tf_regression --prot_desc_path davis_data/prot_desc.csv \
--model_dir ./model_dir/model_dir4_davis_w --filter_threshold 1 \
--arithmetic_mean --aggregate toxcast --split_warm \
--intermediate_file ./interm_files/intermediate_cv_warm_3.csv '
eval $spec
```
Then I removed `--filter_threshold 1` from `drive4_d_warm.sh` and got the following error:
```
Traceback (most recent call last):
  File "driver.py", line 716, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/home/zh/anaconda3/envs/deep2.0.0/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "driver.py", line 175, in run_analysis
    filter_threshold=filter_threshold, oversampled=oversampled)
  File "/project/git3/PADME/dcCustom/molnet/load_function/davis_dataset.py", line 109, in load_davis
    fold_datasets = splitter.k_fold_split(dataset, K)
  File "/project/git3/PADME/dcCustom/splits/splitters.py", line 122, in k_fold_split
    frac_test=0)
  File "/project/git3/PADME/dcCustom/splits/splitters.py", line 865, in split
    remain_this_mol_entries = mol_entries[molecule] - removed_entries
UnboundLocalError: local variable 'removed_entries' referenced before assignment
```
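As a side note on this last traceback: an `UnboundLocalError` like this usually means a local variable is assigned only inside a conditional branch that did not execute. The PADME fix may differ; this is just a generic minimal reproduction of the pattern and the usual remedy.

```python
def buggy(values):
    # 'removed' is assigned only inside the branch; if no value is
    # negative, the return line raises UnboundLocalError.
    for v in values:
        if v < 0:
            removed = set()
            removed.add(v)
    return removed

def fixed(values):
    removed = set()  # initialize before any conditional assignment
    for v in values:
        if v < 0:
            removed.add(v)
    return removed
```

Initializing the variable before the branch (as in `fixed`) guarantees it exists on every code path.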
My data format is the same as the davis data.
Excuse me for raising so many problems; I'm genuinely confused, and I hope you can guide me.
@Running-z Hi, I assume that you have only one protein in your interaction file, is that right? If so, `split_warm` cannot work: every compound appears only once in the whole dataset, so there is no way to ensure that every compound appears in at least two cross-validation folds (which is what `split_warm` means). If that is the case, simply remove the `split_warm` parameter in addition to the `filter_threshold 1` parameter.
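The precondition described above is easy to check up front. Here is a small sketch, assuming a dataset represented as (compound, protein) pairs; the function name and data shape are my own, not part of PADME.

```python
from collections import Counter

def warm_split_feasible(pairs):
    """A split_warm-style split needs every compound to appear at least
    twice in the dataset so it can land in two or more folds."""
    counts = Counter(compound for compound, _ in pairs)
    return all(n >= 2 for n in counts.values())

# One protein: each compound occurs once, so a warm split is impossible.
one_protein = [("mol%d" % i, "EGFR") for i in range(4)]

# Two proteins per compound: each compound occurs twice, so it is feasible.
two_proteins = [(m, p) for m in ("molA", "molB") for p in ("EGFR", "ABL1")]
```

Running such a check before training would turn the opaque splitter assertion failures into an immediate, readable diagnosis.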
@simonfqy
In fact, I have 8 kinds of proteins, but not all molecules interact with all 8 of them; the molecules of each target protein interact only with their corresponding protein, so there may be cases where a molecule appears only once. Following your suggestion, I removed the `--filter_threshold 1` and `--split_warm` parameters and changed the training to a single run, but I still got an unexpected error:
```
Traceback (most recent call last):
  File "driver.py", line 696, in <module>
    tf.app.run(main=run_analysis, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "driver.py", line 378, in run_analysis
    train_score = train_scores_list[h]
IndexError: list index out of range
```
At the same time, if I remove the `--cross_validation` parameter, I am able to complete the training.
@Running-z Glad to know that you were able to complete the training process. I don't know why there is a `list index out of range` error when `--cross_validation` is enabled.
Please refer to this line of code: https://github.com/simonfqy/PADME/blob/5e97ba97f1389ea975b196a31b3464ca2cd00512/driver.py#L288
Previously I made an error in that line: I used `range(1, fold_num)` instead of `range(fold_num)`. It was initially a hard-coded temporary change to run cross-validation folds at separate times, but I forgot to revert it after that task was done and accidentally uploaded it to GitHub. If your code uses `range(1, fold_num)`, that could be the source of this problem. Many apologies.
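The off-by-one is easy to see in isolation. This sketch mimics the failure mode: the buggy loop skips fold 0, so the score list ends up one entry short, and indexing it for every fold raises exactly the `IndexError: list index out of range` seen above. The score values are placeholders.

```python
fold_num = 5

# Buggy loop: skips fold 0, so only fold_num - 1 scores are collected.
buggy_scores = [0.9 for h in range(1, fold_num)]

# Accessing the score for the last fold then fails:
# buggy_scores[fold_num - 1] -> IndexError: list index out of range

# Corrected loop: covers all folds, so every index is valid.
fixed_scores = [0.9 for h in range(fold_num)]
```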
According to the shell script you shared, at least part of your cross-validation results were saved in `./interm_files/intermediate_cv_warm_3.csv`. They are saved automatically after each fold, so a system crash cannot destroy all your work before it is written to the final result files.
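The idea of saving after each fold can be sketched as an append-only CSV writer. This is only an illustration of the crash-resilience pattern; the column names and the real format of PADME's intermediate file are assumptions here.

```python
import csv
import os
import tempfile

def append_fold_result(path, fold, score):
    """Append one row per finished fold, so a crash loses at most the
    fold currently in progress. Column names are hypothetical."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["fold", "score"])
        writer.writerow([fold, score])

# Simulate three folds finishing one after another.
path = os.path.join(tempfile.mkdtemp(), "intermediate.csv")
for fold, score in enumerate([0.91, 0.88, 0.90]):
    append_fold_result(path, fold, score)
```

Appending (mode `"a"`) rather than rewriting the whole file means every completed fold is already on disk the moment it finishes.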
@simonfqy
Ok, I saw the code. The code I used is indeed `for h in range(1, fold_num)`. I will modify it and continue trying. Thank you!
Seems to be solved. Closed for now.
I want to train a `tf_regression` model with my own data. My data has 12628 molecules and 1 protein sequence. I changed the attribute fields in the data to be the same as the davis data you provided. Then I ran `drive4_d_warm.sh`, but I got the following error. Why is this happening? Is there any problem with my data? When extracting features, all of my data was removed. Why is this, and what should I do?