Closed: elulue closed this issue 5 years ago.
By the way, when I use tfdbg, I see this warning: "WARNING:tensorflow:Failed to load partition graphs for device /job:localhost/replica:0/task:0/device:CPU:0 from disk. As a fallback, the client graphs will be used. This may cause mismatches in device names."
I've found the root cause: TensorBoard doesn't record compute time and similar run statistics by default; they have to be traced explicitly. I updated the training step as below, and it works for me:
# Collect a full trace (op compute time, memory, device placement) only every
# show_every_n_step steps, because FULL_TRACE adds overhead to the step.
if np.mod(global_step, show_every_n_step) == 1:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    train_loss, train_mse, _, train_merged_sum = self.sess.run(
        [self.loss, self.mse, self.optim, self.summary], train_data_feed,
        options=run_options, run_metadata=run_metadata)
    # Attach the trace to this step so TensorBoard can show it in the graph view.
    self.writer.add_run_metadata(run_metadata, 'step{}'.format(global_step))
else:
    # Regular training step without tracing.
    train_loss, train_mse, _, train_merged_sum = self.sess.run(
        [self.loss, self.mse, self.optim, self.summary], train_data_feed)
# Scalar/histogram summaries are written on every step in both branches.
self.writer.add_summary(train_merged_sum, global_step=global_step)
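If you want the tracing decision out of the training step itself, here is a minimal pure-Python sketch. `should_trace` and `trace_run_kwargs` are hypothetical helper names, not TensorFlow APIs. The sketch also covers a corner case of the condition above: with `show_every_n_step == 1`, `global_step % 1 == 1` is never true, so no trace would ever be recorded. Here a period of 1 is assumed to mean "trace every step".

```python
from typing import Any, Callable, Dict


def should_trace(global_step: int, every_n_steps: int) -> bool:
    """Hypothetical helper: decide whether this step collects a full trace.

    Matches `np.mod(global_step, show_every_n_step) == 1` for periods > 1,
    and (by assumption) traces every step when the period is 1, where
    `step % 1 == 1` would never hold.
    """
    if every_n_steps < 1:
        raise ValueError("every_n_steps must be >= 1")
    if every_n_steps == 1:
        return True
    return global_step % every_n_steps == 1


def trace_run_kwargs(
    global_step: int,
    every_n_steps: int,
    make_options: Callable[[], Any],
    make_metadata: Callable[[], Any],
) -> Dict[str, Any]:
    """Hypothetical helper: extra keyword arguments for Session.run.

    Returns {} on ordinary steps, or {"options": ..., "run_metadata": ...}
    on trace steps. The factories are passed in so this module does not
    import TensorFlow; they are only called on trace steps.
    """
    if not should_trace(global_step, every_n_steps):
        return {}
    return {"options": make_options(), "run_metadata": make_metadata()}


if __name__ == "__main__":
    traced = [s for s in range(12) if should_trace(s, 5)]
    print(traced)  # [1, 6, 11]
```

In the training step, you would pass `lambda: tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)` and `tf.RunMetadata` as the two factories, splat the result into `self.sess.run(..., train_data_feed, **kwargs)`, and call `self.writer.add_run_metadata(kwargs["run_metadata"], ...)` only when the returned dict contains `"run_metadata"`. That keeps a single `sess.run` call instead of two nearly identical branches.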
Hi, the py3 branch works fine on my PC (Ubuntu 18.04, Python 3.5, TensorFlow 1.10, a single NVIDIA GTX 1070). During training, GPU utilization stays around 30% even though most of the video memory is occupied. I'd like to see whether there's room to improve performance, so I opened TensorBoard.
However, when I check the Graph tab in TensorBoard, the device shows as unknown, and I can't see compute time either. Could you please let me know if there's any tip to fix this? Thanks a lot.