Closed rmaanyam closed 5 years ago
Hi, thanks for your comment.
The package supports running with a real data set.
I have added extra instructions for fitting a deep Cox model on real data: TFDeepSurv#42-runing-with-real-data.
Feel free to contact us if you run into any problem!
Cool, thank you, that would be very helpful. I will check it out and get back as needed.
I moved forward with the load_data module but encountered an optimization error with the sample data, per below. If you have any clue about this error, please advise. [Also, following all the steps in section 4.1, the simulated data produced a lower CI, for example: training steps 2401, loss = 6.88694, CI = 0.681214. I am not sure why; any clues would be appreciated.] Perhaps the error below is a version issue; I will check whether updating the TensorFlow version helps.
```python
model.train(num_epoch=2500, iteration=100, ... plot_train_loss=True, plot_train_ci=True)
```

```
time_stamp.../tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node ArithmeticOptimizer/AddOpsRewrite_add_543 is missing output properties at position :0 (num_outputs=0)
```
Thanks for your reply!
As you mentioned above, the error below occurred. The main reason may be that you used an early version of the tfdeepsurv package, so please download the latest version, reinstall it, and then check again.

```
TypeError: load_data() got an unexpected keyword argument 'excluded_col'
```
Okay, great! It seems that you have fixed the TypeError.
If you get a lower CI with simulated data, please check whether the arguments you pass to dsnn are the same as in #4.1. I will also try running with the simulated data again and check it.
The optimization error with the sample data may be due to a missing output_nodes = 1 statement. Since I don't know the code you used to run dsnn with the sample data, I can only guess at the reason from the error message.
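For reference, a minimal sketch of the statement I mean (the node counts follow the tutorial in #4.1; the `dsl.dsnn` call itself is commented out here since it depends on your data):

```python
# Sketch of the node-count setup from tutorial section #4.1.
input_nodes = 10   # number of covariate columns in X
output_nodes = 1   # DSNN emits a single risk score per sample
# model = dsl.dsnn(train_X, train_y, input_nodes, [6, 3], output_nodes, ...)
print(input_nodes, output_nodes)
```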
Oh, I have run the code in #4.1 with simulated data and got the same result as you!
I'm sorry! I must have set a different random seed instead of the default one, but that is not reflected in #4.1.
However, that is not the key point! In a general setting, if a lower metric occurs, we usually do hyper-parameter tuning when running DSNN on a new dataset.
Sorry, closed by mistake :-)
Thank you so much for your help/feedback; it is very helpful. Please look through the code/output below for reference and see if anything is missing...
```python
from tfdeepsurv import dsl
from tfdeepsurv.utils import load_data

train_X, train_y, test_X, test_y = load_data(
    'data_1.csv',
    excluded_col=['ID'],
    surv_col={'e': 'event', 't': 'time'},
    split_ratio=0.8
)
```

```
Number of rows: 5417
X cols: 10
Y cols: 2
X.column name: Index(['Var1', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'Var8', 'Var9', 'Var10'], dtype='object')
Y.column name: Index(['time', 'event'], dtype='object')
```

```python
input_nodes = 10
output_nodes = 1
model = dsl.dsnn(
    train_X, train_y,
    input_nodes, [6, 3], output_nodes,
    learning_rate=0.2,
    learning_rate_decay=1.0,
    activation='relu',
    L1_reg=0.0002,
    L2_reg=0.0003,
    optimizer='adam',
    dropout_keep_prob=1.0
)
print(model.get_ties())  # efron
model.train(num_epoch=1000, iteration=100, plot_train_loss=True, plot_train_ci=True)
```

```
2019-02-02 18:41:36.554331: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:237] Failed to run optimizer ArithmeticOptimizer, stage HoistCommonFactor. Error: Node ArithmeticOptimizer/AddOpsRewrite_add_543 is missing output properties at position :0 (num_outputs=0)
```
```
training steps 1: loss = 8.55276. CI = 0.490029.
training steps 101: loss = 8.55249. CI = 0.571747.
training steps 201: loss = 8.55219. CI = 0.579078.
........................
training steps 801: loss = 8.46366. CI = 0.615219.
training steps 901: loss = 8.46141. CI = 0.616758.
```
My observations:
Questions for you:
Do you think the optimizer error above has any impact on the lower CI values (even though the code ran with the default hyper-parameters)?
Now, the key point: I would need to run 'hypopt' to train the model on the above data set (or any new one), obtain good hyper-parameters, and use those values in the code block [model = dsl.dsnn(train_X, train_Y, ...)], right? For this task, can you kindly guide me through the necessary steps?
You may list the steps/details here (I am a newbie to hypopt :), or you can email me at 'rmanyam at student.gsu.edu', or list the steps in the notebooks folder of the repository, whatever works best. I hope you can look through this request when you get a chance. Thank you so much, I appreciate your help.
Reply to your observations:
Since a result was still produced even after the optimizer error, the tensorflow version may need to be checked. I will first deal with the optimizer error and then answer the first of your questions. (By the way, the tensorflow version on my PC is 1.4.0. What about yours?)
I will reinstall a newer version of tensorflow and retest all functions in this package.
About hyper-parameter tuning: good suggestion! For hyper-parameter tuning on simulated data, I will list the steps/details in [bysopt](https://github.com/liupei101/TFDeepSurv/blob/master/bysopt/README.md).
@rmaanyam My schedule is pretty tight during this month. The to-do list mentioned above may be done in an idle time period. If you want to speed things up, you could read the implementation in hpopt.py yourself and try something similar.
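In the meantime, here is a minimal, hypothetical sketch of the idea behind hyper-parameter search; bysopt/hpopt.py uses Bayesian optimization, but a plain random search over the same dsnn knobs illustrates the loop. The `train_and_score` callback and the `fake_score` stand-in are my own illustration, not part of tfdeepsurv; in practice the callback would build dsl.dsnn with the sampled config, train it, and return the validation CI.

```python
import random

# Hypothetical search space mirroring the dsl.dsnn arguments shown above.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "hidden_layers": [[6, 3], [8, 4], [16, 8, 4]],
    "L1_reg": [0.0, 1e-4, 2e-4],
    "L2_reg": [0.0, 1e-4, 3e-4],
    "dropout_keep_prob": [0.6, 0.8, 1.0],
}

def sample_config(rng):
    """Draw one random configuration from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def random_search(train_and_score, n_trials=20, seed=42):
    """Try n_trials random configs; keep the one with the best CI.

    train_and_score(config) is a user-supplied callback that builds
    dsl.dsnn from `config`, trains it, and returns the validation CI.
    """
    rng = random.Random(seed)
    best_ci, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        ci = train_and_score(config)
        if ci > best_ci:
            best_ci, best_config = ci, config
    return best_ci, best_config

# Toy scorer so the sketch runs end to end: it pretends that a moderate
# learning rate and nonzero L2 regularization give the best CI.
def fake_score(config):
    return 0.6 + 0.1 * (config["learning_rate"] <= 0.1) + 0.05 * (config["L2_reg"] > 0)

best_ci, best_config = random_search(fake_score, n_trials=30)
print(best_ci, best_config)
```

Bayesian optimization replaces the uniform `sample_config` step with a model that proposes promising configurations based on previous scores, which is why it usually needs far fewer trials.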
Sure, thank you. Let me 1) check the TensorFlow version on my PC too, upgrade as needed, and then double-check whether those steps improve performance; 2) look through bysopt at the link you mentioned and go from there; 3) also look through hpopt.py. No problem, whenever you get a chance (early next month or as soon as you can), any guidance in this regard is very helpful. Thank you so much.
The package TFDeepSurv has been updated.
Fixed or added:
- TFDeepSurv tutorial: Running with simulated data. The arguments of dsl.dsnn are overridden by a new set of values obtained by BHO (Bayesian Hyperparameters Optimization), and the subsequent results are also updated.
- A comprehensive test has been done on my PC under the environment described in the README.
Feel free to contact me if there is something wrong!
@rmaanyam By the way, would you mind starring or forking this repo to help more people find it? ^_^
Thanks!
Sounds great, thank you so much, liupei101! I will be checking/testing it soon for sure. I think it is a great update and should be very helpful to anyone who wants to use the hyper-parameter tuning/optimization piece to obtain better prediction accuracy.
Hi liupei101, good news: the update to the 'simulated data' piece has been tested successfully, and the results exactly match those of section 4.1, as you mentioned. I will check with a real data set soon and get back if there are any issues.
Hi liupei101, good news: I got it working with a real data set too. It may need a bit of hyper-parameter tuning now; I can go ahead and follow the guidelines in bysopt, as you mentioned.
By the way, a question about plots for you: how about putting the curves for both the training and validation data sets on one plot?
For instance, both training and validation C-index on one plot, and both training and validation losses on another, to make comparison easy, like you have for the survival function plot here, or, for example, like they have in DeepSurv here.
I am talking about sample output images like the ones below; I hope you get the idea. I think we need to add/update a plot function in the utils or vision.py module. I would appreciate your thoughts/help on this. Please let me know if you have any questions for me.
Hi liupei101, is there any way we can plot both train and validation CIs on one graph? Suggestions, please? Can you please take a look when you get a chance?
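Until such a function lands in vision.py, a small matplotlib helper can overlay the two curves. This is a generic sketch, not part of tfdeepsurv; it assumes you record the per-step train/validation CI values yourself during training, and the toy numbers below are shaped like the log earlier in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def plot_train_valid(steps, train_vals, valid_vals, metric="CI"):
    """Overlay training and validation curves of one metric on a single axes."""
    fig, ax = plt.subplots()
    ax.plot(steps, train_vals, label="train " + metric)
    ax.plot(steps, valid_vals, label="valid " + metric)
    ax.set_xlabel("training steps")
    ax.set_ylabel(metric)
    ax.legend()
    return fig, ax

# Toy values for illustration; in practice you would collect these
# inside (or around) model.train().
steps = [1, 101, 201, 801, 901]
train_ci = [0.490, 0.572, 0.579, 0.615, 0.617]
valid_ci = [0.485, 0.560, 0.570, 0.600, 0.605]
fig, ax = plot_train_valid(steps, train_ci, valid_ci, metric="CI")
fig.savefig("ci_curves.png")
```

The same helper works for the loss curves by passing the recorded train/valid losses and `metric="loss"`.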
Got it! Sorry, my schedule has been tight recently. I will implement the plot function you mentioned, maybe within two days.
Hi liupei101,
This package looks great. I am wondering if we could play with/test this package on a new sample data set. If so, can you please guide me through the initial steps to start with (I am a newbie)? I would also like to discuss a couple of questions; please advise on the best way to contact/communicate with you. Thank you so much, I appreciate your time/help.