Open q8888620002 opened 4 years ago
Hi there, thank you for your interest in our work. Unfortunately, the most reliable way to replicate the results is to re-run the entire pipeline, rather than using the parameters left in the code (please also see #1).
I'm afraid I can't see any specific error messages in the logs you've shown. Please note that the code was written against an earlier version of TensorFlow 1, and might require some modification if you're using TF 2.x.
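For instance (an untested sketch on my part, not a shim shipped with the repo), TF1-style graph code can often be run under TF 2.x by falling back to the `tf.compat.v1` API with v2 behavior disabled:

```python
import tensorflow as tf

# Fall back to the TF1-compatible API when running under TF 2.x.
if tf.__version__.startswith("2"):
    import tensorflow.compat.v1 as tf1
    tf1.disable_v2_behavior()  # graph mode + placeholders, as TF1 code expects
else:
    tf1 = tf

# TF1-style graph construction and Session.run then work unchanged.
x = tf1.placeholder(tf1.float32, shape=[None, 1], name="x")
y = 2.0 * x
with tf1.Session() as sess:
    out = sess.run(y, feed_dict={x: [[3.0]]})
```

Whether this is enough depends on which deprecated ops the code touches; some calls may still need renaming by hand.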
Hi @sjblim
Thanks for the response. It was indeed the TF version causing the problem; I am able to run the pipeline after the modification.
I still have some questions regarding the multisteps simulation settings:
According to Table 1, the model appears to be trained under the training policy (coefficients set to 10 for both radiotherapy and chemotherapy) and tested on three policies with different coefficients, which are considered counterfactual policies. However, based on `script_rnn_test.py`, it looks to me like the model is trained and tested on cancer simulation data with the same chemo and radio coefficients.
How does the multistep simulation work? How many time steps (i.e. how long a sequence) are used to predict the next horizons (tau)? It's a bit unclear to me from Table 1 in the paper alone.
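To make my question concrete, the kind of multi-step rollout I have in mind (a toy numpy sketch with a hypothetical one-step `step` model, not your code) is feeding each prediction back as input for the next horizon:

```python
import numpy as np

def step(history):
    """Hypothetical one-step model: predicts the next value as the mean of the history."""
    return float(np.mean(history))

def rollout(history, tau):
    """Autoregressive multi-step prediction over a tau-step horizon."""
    history = list(history)
    preds = []
    for _ in range(tau):
        y = step(history)
        preds.append(y)
        history.append(y)  # the prediction becomes part of the next input window
    return preds

print(rollout([1.0, 2.0, 3.0], tau=2))  # -> [2.0, 2.0]
```

Is this roughly what the multistep simulation does, or is the horizon handled differently?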
Thanks again!
Hi @sjblim
Sorry to bother you again. I am currently running the multistep simulation with "rnn_propensity_weighted", but I'm not entirely sure this is the setting used in the paper (since there are multiple model settings, e.g. `censor_rnn`, `treatment_rnn`, etc.). Could you elaborate on which one you used to generate the results in the paper?
Thanks again!
Hi
Thanks for sharing the code! It's a great paper. I am currently trying to replicate the results but have had a hard time running through the pipeline. I was able to finish part of the hyperparameter tuning, but I get stuck in the following loop when I uncomment the optimal params in the specifications in `script_rnn_fit.py`. I'm not really sure whether this is an issue with the GPU settings. (I am trying to replicate your results and compare them with my model. Is there any way I can achieve this other than re-running the whole pipeline?)

```
2020-05-14 19:03:13.877817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-05-14 19:03:13.877830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-05-14 19:03:13.878008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.878550: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.879045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7123 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:04.0, compute capability: 6.1)
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x7fa9bc1c2b50>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
WARNING:<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x7fa9bc1c2b50>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
WARNING:Combination found: skipping treatment_effects_0.2_4_100_128_0.01_0.5_60_tanh_sigmoid
INFO:Using specifications for censor_rnn_action_inputs_only: (0.2, 2, 100, 128, 0.01, 0.5)
2020-05-14 19:03:13.948128: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.948651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: Tesla P4 major: 6 minor: 1 memoryClockRate(GHz): 1.1135 pciBusID: 0000:00:04.0
2020-05-14 19:03:13.948783: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-05-14 19:03:13.948838: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-05-14 19:03:13.948870: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-05-14 19:03:13.948899: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-05-14 19:03:13.948926: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-05-14 19:03:13.948954: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-05-14 19:03:13.948981: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-05-14 19:03:13.949105: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.949684: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.950156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-05-14 19:03:13.950204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-14 19:03:13.950221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-05-14 19:03:13.950233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-05-14 19:03:13.950375: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.950946: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-14 19:03:13.951420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7123 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:00:04.0, compute capability: 6.1)
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x7fa9c00c9590>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
WARNING:<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x7fa9c00c9590>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
WARNING:Combination found: skipping treatment_effects_0.2_4_100_128_0.01_0.5_60_tanh_sigmoid
```
Thanks