Open ahundt opened 6 years ago
I tried running the micro search on TF 1.7. It made quite a bit of progress, up to 150 epochs, but then failed as follows:
[1 2 1 1 1 3 0 2 2 0 1 1 1 1 1 4 1 4 1 4]
val_acc=0.7750
--------------------------------------------------------------------------------
[0 0 1 0 0 4 0 1 0 4 1 1 1 4 0 1 0 1 5 2]
[0 1 1 0 1 1 1 0 1 2 1 3 1 0 3 3 1 0 2 4]
val_acc=0.6813
--------------------------------------------------------------------------------
[0 1 1 0 0 0 0 0 0 0 1 1 4 0 0 0 0 0 1 1]
[1 0 1 2 1 1 1 1 1 0 1 3 3 0 2 0 1 0 1 1]
val_acc=0.7312
--------------------------------------------------------------------------------
[0 1 0 4 0 0 0 2 1 0 1 3 1 0 3 0 1 1 1 1]
[1 0 1 0 1 1 1 1 1 4 1 1 1 1 1 0 3 4 1 4]
val_acc=0.7188
--------------------------------------------------------------------------------
[0 0 0 2 1 0 1 0 1 4 0 3 0 1 1 0 0 1 4 2]
[0 4 1 1 1 4 1 1 1 1 1 0 1 0 1 2 1 1 1 2]
val_acc=0.7250
--------------------------------------------------------------------------------
Epoch 150: Eval
Eval at 42300
valid_accuracy: 0.6946
Eval at 42300
test_accuracy: 0.6842
Exception in thread QueueRunnerThread-dummy_queue-sync_token_q_EnqueueMany:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 268, in _run
    coord.request_stop(e)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 213, in request_stop
    six.reraise(*sys.exc_info())
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
CancelledError: TakeGrad operation was cancelled
  [[Node: sync_replicas/AccumulatorTakeGradient = AccumulatorTakeGradient[_class=["loc:@sync_replicas/conditional_accumulator"], dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](sync_replicas/conditional_accumulator, sync_replicas/AccumulatorTakeGradient/num_required)]]
  [[Node: sync_replicas/AccumulatorTakeGradient_2/_16859 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_93_sync_replicas/AccumulatorTakeGradient_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
I didn't take any steps to cancel it, such as hitting Ctrl+C, so I'm not sure why this is occurring.
Did you solve this issue? I have the same error at the end of my search.
I think so. Take a look at the pull request I made:
https://github.com/melodyguan/enas/pull/29
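For anyone reading the traceback: one plausible reading (an assumption, not something confirmed in this thread) is that the search had simply finished, the session was shut down, and the queue-runner thread that was still blocked on the TakeGrad op got cancelled as a result. That would explain a CancelledError without anyone pressing Ctrl+C. The sketch below is a toy model in plain Python, not TensorFlow code; `ToySession`, `run_worker`, `simulate` and `clean_exit_types` are all hypothetical names made up for illustration.

```python
import threading


class CancelledError(Exception):
    """Toy stand-in for tf.errors.CancelledError (not the real class)."""


class ToySession:
    """Toy stand-in for a session: closing it cancels pending blocking ops."""

    def __init__(self):
        self._closed = threading.Event()

    def blocking_op(self):
        # Blocks like an op waiting for gradients that will never arrive,
        # then fails once the session is closed.
        self._closed.wait()
        raise CancelledError("TakeGrad operation was cancelled")

    def close(self):
        self._closed.set()


def run_worker(session, clean_exit_types, errors):
    """Mimics a queue-runner thread: loop on a blocking op until it fails."""
    try:
        while True:
            session.blocking_op()
    except clean_exit_types:
        pass  # treated as a normal shutdown, nothing is reported
    except Exception as exc:
        errors.append(exc)  # reported, like the traceback in the log above


def simulate(clean_exit_types):
    """Run one worker, close the session, return names of reported errors."""
    session, errors = ToySession(), []
    worker = threading.Thread(
        target=run_worker, args=(session, clean_exit_types, errors))
    worker.start()
    session.close()  # training finished; no Ctrl+C involved
    worker.join()
    return [type(e).__name__ for e in errors]


if __name__ == "__main__":
    print(simulate(()))                 # cancellation is reported as an error
    print(simulate((CancelledError,)))  # cancellation counts as a clean exit
```

In TF 1.x, `tf.train.QueueRunner` takes a `queue_closed_exception_types` argument that plays roughly the role of `clean_exit_types` here. Whether the linked pull request relies on it is not stated in this thread.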