Closed: timothyjlaurent closed this issue 4 years ago
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
bug | 0.99 |
/assign @andreyvelich
@timothyjlaurent Thank you for the issue. Can you show the logs from the Suggestion pod, please? Also, can you describe the training job where you saw the logs?
Here are the logs from the suggestion pod:
klon kubeflow nas-rl-example-gpu-2-nasrl-7549486dbb-kptw5
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
----------------------------------------------------------------------------------------------------
Setting Up Suggestion for Experiment nas-rl-example-gpu-2
----------------------------------------------------------------------------------------------------
>>> Search Space for Experiment nas-rl-example-gpu-2
Operation ID:
0
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 1
Operation ID:
1
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 2
Operation ID:
2
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 1
Operation ID:
3
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 2
Operation ID:
4
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 1
Operation ID:
5
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 2
Operation ID:
6
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 1
Operation ID:
7
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 2
Operation ID:
8
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 1
Operation ID:
9
Operation Type:
convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 2
Operation ID:
10
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 1
Operation ID:
11
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 2
Operation ID:
12
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 1
Operation ID:
13
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 2
Operation ID:
14
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 1
Operation ID:
15
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 2
Operation ID:
16
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 1
Operation ID:
17
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 2
Operation ID:
18
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 1
Operation ID:
19
Operation Type:
convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 2
Operation ID:
20
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 1
Operation ID:
21
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 2
Operation ID:
22
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 1
Operation ID:
23
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 2
Operation ID:
24
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 1
Operation ID:
25
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 2
Operation ID:
26
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 1
Operation ID:
27
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 2
Operation ID:
28
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 1
Operation ID:
29
Operation Type:
convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 2
Operation ID:
30
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 1
depth_multiplier: 1
Operation ID:
31
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 1
depth_multiplier: 2
Operation ID:
32
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 2
depth_multiplier: 1
Operation ID:
33
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 32
stride: 2
depth_multiplier: 2
Operation ID:
34
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 1
depth_multiplier: 1
Operation ID:
35
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 1
depth_multiplier: 2
Operation ID:
36
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 2
depth_multiplier: 1
Operation ID:
37
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 48
stride: 2
depth_multiplier: 2
Operation ID:
38
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 1
depth_multiplier: 1
Operation ID:
39
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 1
depth_multiplier: 2
Operation ID:
40
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 2
depth_multiplier: 1
Operation ID:
41
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 64
stride: 2
depth_multiplier: 2
Operation ID:
42
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 1
depth_multiplier: 1
Operation ID:
43
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 1
depth_multiplier: 2
Operation ID:
44
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 2
depth_multiplier: 1
Operation ID:
45
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 96
stride: 2
depth_multiplier: 2
Operation ID:
46
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 1
depth_multiplier: 1
Operation ID:
47
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 1
depth_multiplier: 2
Operation ID:
48
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 2
depth_multiplier: 1
Operation ID:
49
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 3
num_filter: 128
stride: 2
depth_multiplier: 2
Operation ID:
50
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 1
depth_multiplier: 1
Operation ID:
51
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 1
depth_multiplier: 2
Operation ID:
52
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 2
depth_multiplier: 1
Operation ID:
53
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 32
stride: 2
depth_multiplier: 2
Operation ID:
54
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 1
depth_multiplier: 1
Operation ID:
55
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 1
depth_multiplier: 2
Operation ID:
56
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 2
depth_multiplier: 1
Operation ID:
57
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 48
stride: 2
depth_multiplier: 2
Operation ID:
58
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 1
depth_multiplier: 1
Operation ID:
59
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 1
depth_multiplier: 2
Operation ID:
60
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 2
depth_multiplier: 1
Operation ID:
61
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 64
stride: 2
depth_multiplier: 2
Operation ID:
62
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 1
depth_multiplier: 1
Operation ID:
63
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 1
depth_multiplier: 2
Operation ID:
64
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 2
depth_multiplier: 1
Operation ID:
65
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 96
stride: 2
depth_multiplier: 2
Operation ID:
66
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 1
depth_multiplier: 1
Operation ID:
67
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 1
depth_multiplier: 2
Operation ID:
68
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 2
depth_multiplier: 1
Operation ID:
69
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 5
num_filter: 128
stride: 2
depth_multiplier: 2
Operation ID:
70
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 1
depth_multiplier: 1
Operation ID:
71
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 1
depth_multiplier: 2
Operation ID:
72
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 2
depth_multiplier: 1
Operation ID:
73
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 32
stride: 2
depth_multiplier: 2
Operation ID:
74
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 1
depth_multiplier: 1
Operation ID:
75
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 1
depth_multiplier: 2
Operation ID:
76
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 2
depth_multiplier: 1
Operation ID:
77
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 48
stride: 2
depth_multiplier: 2
Operation ID:
78
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 1
depth_multiplier: 1
Operation ID:
79
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 1
depth_multiplier: 2
Operation ID:
80
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 2
depth_multiplier: 1
Operation ID:
81
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 64
stride: 2
depth_multiplier: 2
Operation ID:
82
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 1
depth_multiplier: 1
Operation ID:
83
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 1
depth_multiplier: 2
Operation ID:
84
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 2
depth_multiplier: 1
Operation ID:
85
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 96
stride: 2
depth_multiplier: 2
Operation ID:
86
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 1
depth_multiplier: 1
Operation ID:
87
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 1
depth_multiplier: 2
Operation ID:
88
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 2
depth_multiplier: 1
Operation ID:
89
Operation Type:
separable_convolution
Operations Parameters:
filter_size: 7
num_filter: 128
stride: 2
depth_multiplier: 2
Operation ID:
90
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 3
stride: 1
depth_multiplier: 1
Operation ID:
91
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 3
stride: 1
depth_multiplier: 2
Operation ID:
92
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 3
stride: 2
depth_multiplier: 1
Operation ID:
93
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 3
stride: 2
depth_multiplier: 2
Operation ID:
94
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 5
stride: 1
depth_multiplier: 1
Operation ID:
95
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 5
stride: 1
depth_multiplier: 2
Operation ID:
96
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 5
stride: 2
depth_multiplier: 1
Operation ID:
97
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 5
stride: 2
depth_multiplier: 2
Operation ID:
98
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 7
stride: 1
depth_multiplier: 1
Operation ID:
99
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 7
stride: 1
depth_multiplier: 2
Operation ID:
100
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 7
stride: 2
depth_multiplier: 1
Operation ID:
101
Operation Type:
depthwise_convolution
Operations Parameters:
filter_size: 7
stride: 2
depth_multiplier: 2
Operation ID:
102
Operation Type:
reduction
Operations Parameters:
reduction_type: max_pooling
pool_size: 2
Operation ID:
103
Operation Type:
reduction
Operations Parameters:
reduction_type: max_pooling
pool_size: 3
Operation ID:
104
Operation Type:
reduction
Operations Parameters:
reduction_type: avg_pooling
pool_size: 2
Operation ID:
105
Operation Type:
reduction
Operations Parameters:
reduction_type: avg_pooling
pool_size: 3
There are 106 operations in total.
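For readers checking the listing above: the 106 operation IDs are consistent with a plain Cartesian product over each operation type's parameter values, with the last-listed parameter changing fastest. The sketch below rebuilds the listing from the values printed in the log. The `SEARCH_SPACE` structure and the `enumerate_operations` helper are hypothetical names for illustration only, not Katib's actual code.

```python
from itertools import product

# Parameter grids read off the logged listing. Key order matters: the log
# varies the last parameter fastest, which is also how itertools.product works.
# SEARCH_SPACE and enumerate_operations are hypothetical, not Katib's API.
SEARCH_SPACE = [
    ("convolution", {
        "filter_size": [3, 5, 7],
        "num_filter": [32, 48, 64, 96, 128],
        "stride": [1, 2],
    }),
    ("separable_convolution", {
        "filter_size": [3, 5, 7],
        "num_filter": [32, 48, 64, 96, 128],
        "stride": [1, 2],
        "depth_multiplier": [1, 2],
    }),
    ("depthwise_convolution", {
        "filter_size": [3, 5, 7],
        "stride": [1, 2],
        "depth_multiplier": [1, 2],
    }),
    ("reduction", {
        "reduction_type": ["max_pooling", "avg_pooling"],
        "pool_size": [2, 3],
    }),
]


def enumerate_operations(search_space):
    """Assign consecutive IDs to every parameter combination of every type."""
    operations = []
    for op_type, grid in search_space:
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            operations.append({
                "id": len(operations),
                "type": op_type,
                "params": dict(zip(names, values)),
            })
    return operations


ops = enumerate_operations(SEARCH_SPACE)
print(len(ops))  # 30 + 60 + 12 + 4 = 106
```

The per-type counts (30 convolutions, 60 separable convolutions, 12 depthwise convolutions, 4 reductions) match the ID ranges 0-29, 30-89, 90-101 and 102-105 in the log.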
>>> Parameters of LSTM Controller for Experiment nas-rl-example-gpu-2
lstm_num_cells: 64
lstm_num_layers: 1
lstm_keep_prob: 1.0
optimizer: adam
init_learning_rate: 0.001
lr_decay_start: 0
lr_decay_every: 1000
lr_decay_rate: 0.9
skip-target: 0.4
skip-weight: 0.8
l2_reg: 0.0
entropy_weight: 0.0001
baseline_decay: 0.9999
RequestNumber: 3
>>> Building Controller
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 63, in __init__
self._create_params()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 68, in _create_params
with tf.variable_scope(self.name, initializer=initializer):
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:68', 'tf.variable_scope', 'tf.compat.v1.variable_scope')
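A note on the traceback above: `self._fmt % record.__dict__` raising `KeyError: 'experiment_name'` is what the standard `logging` module does when a handler's format string references `%(experiment_name)s` and a record arrives without that attribute. Here the record comes from TensorFlow's deprecation warning, which knows nothing about that field. How the handler ends up receiving TensorFlow's records in the suggestion service is not visible in this log, so the following is only a minimal sketch of one possible mitigation, assuming such a format string. The logger name, the filter class and the `"-"` placeholder are hypothetical.

```python
import io
import logging


class DefaultExperimentName(logging.Filter):
    """Ensure every record has `experiment_name` so formatting cannot fail."""

    def filter(self, record):
        if not hasattr(record, "experiment_name"):
            record.experiment_name = "-"  # hypothetical placeholder value
        return True  # never drop the record, only annotate it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Assumed format: any format string that uses a custom field behaves the same.
handler.setFormatter(
    logging.Formatter("%(levelname)s [%(experiment_name)s] %(message)s")
)
handler.addFilter(DefaultExperimentName())

logger = logging.getLogger("hypothetical_suggestion_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# A third-party style record without the extra field (like the TF warning).
logger.warning("The name tf.variable_scope is deprecated.")
# A record that does carry the field, as the service's own messages would.
logger.info("Building Controller", extra={"experiment_name": "nas-rl-example-gpu-2"})

output = stream.getvalue()
print(output, end="")
```

Attaching the filter to the handler rather than to one logger means records from any library that reach this handler are covered. An alternative would be to keep the custom format on a dedicated logger so third-party records never reach it.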
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:73: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 63, in __init__
self._create_params()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 73, in _create_params
w = tf.get_variable("w", [2 * self.lstm_size, 4 * self.lstm_size])
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:73', 'tf.get_variable', 'tf.compat.v1.get_variable')
>>> Building Controller Sampler
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:113: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
self._build_sampler()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 113, in _build_sampler
operation_id = tf.multinomial(logit, 1)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
instructions)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:113', 'multinomial', 'tensorflow.python.ops.random_ops', 'in a future version', 'Use `tf.random.categorical` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:114: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
self._build_sampler()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 114, in _build_sampler
operation_id = tf.to_int32(operation_id)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
instructions)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:114', 'to_int32', 'tensorflow.python.ops.math_ops', 'in a future version', 'Use `tf.cast` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:144: The name tf.log is deprecated. Please use tf.math.log instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
self._build_sampler()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 144, in _build_sampler
kl = skip_prob * tf.log(skip_prob / skip_targets)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:144', 'tf.log', 'tf.math.log')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:156: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
logger=self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
self._build_sampler()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 156, in _build_sampler
skip = tf.to_float(skip)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
instructions)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:156', 'to_float', 'tensorflow.python.ops.math_ops', 'in a future version', 'Use `tf.cast` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:183: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
self.controller.build_trainer()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 183, in build_trainer
self.reward = tf.placeholder(tf.float32, shape=())
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:183', 'tf.placeholder', 'tf.compat.v1.placeholder')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:193: The name tf.assign_sub is deprecated. Please use tf.compat.v1.assign_sub instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
self.controller.build_trainer()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 193, in build_trainer
baseline_update = tf.assign_sub(self.baseline, (1 - self.bl_dec) * (self.baseline - self.reward))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:193', 'tf.assign_sub', 'tf.compat.v1.assign_sub')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:102: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
self.controller.build_trainer()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 219, in build_trainer
num_replicas=self.num_replicas)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py", line 102, in get_train_ops
learning_rate = tf.train.exponential_decay(
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:102', 'tf.train.exponential_decay', 'tf.compat.v1.train.exponential_decay')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:118: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
experiment = NAS_RL_Experiment(request, self.logger)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
self._setup_controller()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
self.controller.build_trainer()
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 219, in build_trainer
num_replicas=self.num_replicas)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py", line 118, in get_train_ops
opt = tf.train.AdamOptimizer(learning_rate, beta1=0.0, epsilon=1e-3,
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:118', 'tf.train.AdamOptimizer', 'tf.compat.v1.train.AdamOptimizer')
>>> Suggestion for Experiment nas-rl-example-gpu-2 has been initialized.
----------------------------------------------------------------------------------------------------
Suggestion Step 0 for Experiment nas-rl-example-gpu-2
----------------------------------------------------------------------------------------------------
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py:227: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
s = self.formatMessage(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
return self._style.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
argument, request_deserializer)
File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
response_or_iterator = behavior(argument, context)
File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 227, in GetSuggestions
saver = tf.train.Saver()
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
_call_location(), full_name, rename)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py:227', 'tf.train.Saver', 'tf.compat.v1.train.Saver')
>>> First time running suggestion for nas-rl-example-gpu-2. Random architecture will be given.
2020-02-04 00:21:56.458811: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-02-04 00:21:56.463174: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2020-02-04 00:21:56.463653: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563b4d4c3d20 executing computations on platform Host. Devices:
2020-02-04 00:21:56.463681: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-02-04 00:21:56.538853: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
>>> New Neural Network Architecture Candidate #0 (internal representation):
[[78], [43, 0], [81, 0, 1], [2, 0, 1, 0], [67, 1, 0, 0, 0], [63, 0, 0, 1, 1, 0], [99, 1, 0, 1, 0, 0, 0], [66, 1, 1, 0, 0, 1, 0, 1]]
>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'78': {'opt_id': 78, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '1'}}, '43': {'opt_id': 43, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '81': {'opt_id': 81, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '2': {'opt_id': 2, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '48', 'stride': '1'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '63': {'opt_id': 63, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '99': {'opt_id': 99, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '7', 'stride': '1', 'depth_multiplier': '2'}}, '66': {'opt_id': 66, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '1'}}}}
>>> New Neural Network Architecture Candidate #1 (internal representation):
[[77], [95, 0], [7, 1, 1], [67, 1, 0, 0], [17, 1, 1, 0, 0], [82, 1, 1, 1, 0, 0], [41, 0, 1, 0, 0, 0, 0], [39, 1, 1, 1, 1, 0, 1, 0]]
>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'77': {'opt_id': 77, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '2'}}, '95': {'opt_id': 95, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '1', 'depth_multiplier': '2'}}, '7': {'opt_id': 7, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '2'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '17': {'opt_id': 17, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '2'}}, '82': {'opt_id': 82, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '1'}}, '41': {'opt_id': 41, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '39': {'opt_id': 39, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '2'}}}}
>>> New Neural Network Architecture Candidate #2 (internal representation):
[[33], [19, 1], [33, 1, 0], [53, 0, 1, 0], [4, 1, 0, 0, 1], [96, 1, 0, 0, 0, 1], [63, 0, 0, 0, 1, 1, 1], [76, 1, 1, 1, 1, 0, 1, 0]]
>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'33': {'opt_id': 33, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '32', 'stride': '2', 'depth_multiplier': '2'}}, '19': {'opt_id': 19, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '2'}}, '53': {'opt_id': 53, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '32', 'stride': '2', 'depth_multiplier': '2'}}, '4': {'opt_id': 4, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1'}}, '96': {'opt_id': 96, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '2', 'depth_multiplier': '1'}}, '63': {'opt_id': 63, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '76': {'opt_id': 76, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '1'}}}}
>>> 3 Trials were created for Experiment nas-rl-example-gpu-2
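A note on the repeated `--- Logging error ---` blocks above: each traceback ends in `self._fmt % record.__dict__` raising `KeyError: 'experiment_name'`. That means a handler whose format string references `%(experiment_name)s` is receiving TensorFlow's deprecation warnings, which carry no such field. The sketch below reproduces that failure in isolation and shows one possible workaround, a filter that supplies a default. The format string and the `DefaultExperimentName` class are hypothetical; the actual logger setup in `nasrl_service.py` is not shown in these logs.

```python
import logging

# Hypothetical format string; it only needs to reference a custom field.
FORMAT = "%(levelname)s [%(experiment_name)s] %(message)s"
formatter = logging.Formatter(FORMAT)


class DefaultExperimentName(logging.Filter):
    """Supply experiment_name when the caller did not pass it via `extra`."""

    def filter(self, record):
        if not hasattr(record, "experiment_name"):
            record.experiment_name = "-"
        return True


def make_record(message):
    # A record like the ones TensorFlow emits: no experiment_name attribute.
    return logging.LogRecord(
        "tensorflow", logging.WARNING, "example.py", 0, message, None, None
    )


def format_fails(record):
    # Python 3.7 raises KeyError here (as in the traceback above);
    # Python 3.8+ wraps the same failure in ValueError.
    try:
        formatter.format(record)
    except (KeyError, ValueError):
        return True
    return False


failed_without_field = format_fails(make_record("tf.log is deprecated"))

patched = make_record("tf.log is deprecated")
DefaultExperimentName().filter(patched)
formatted = formatter.format(patched)
print(failed_without_field, formatted)
```

In a running service the filter would be attached to the handler with `handler.addFilter(...)`. Note that these logging errors are reported on stderr and do not abort the request, which is consistent with the suggestion still producing three trials.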
Here's the description of the pod:
Name: nas-rl-example-gpu-2-qfpqcqbp-jjg98
Namespace: kubeflow
Priority: 0
Node: ip-10-71-72-149.ec2.internal/10.71.72.149
Start Time: Mon, 03 Feb 2020 16:30:22 -0800
Labels: controller-uid=5a57b00f-46e4-11ea-a653-0a392894a425
job-name=nas-rl-example-gpu-2-qfpqcqbp
Annotations: <none>
Status: Failed
IP: 10.42.93.12
IPs: <none>
Controlled By: Job/nas-rl-example-gpu-2-qfpqcqbp
Containers:
nas-rl-example-gpu-2-qfpqcqbp:
Container ID: docker://602f8a69cfcf7add633c2cc2b8ffdf9049050898ca7d94a351ccee5224bd297d
Image: docker.io/kubeflowkatib/nasrl-cifar10-gpu
Image ID: docker-pullable://kubeflowkatib/nasrl-cifar10-gpu@sha256:b2039193018df3fd7650ad257531621704589c472748f64acfb617fab468a212
Port: <none>
Host Port: <none>
Command:
python3.5
-u
RunTrial.py
--architecture="[[77, [95, 0, [7, 1, 1, [67, 1, 0, 0, [17, 1, 1, 0, 0, [82, 1, 1, 1, 0, 0, [41, 0, 1, 0, 0, 0, 0, [39, 1, 1, 1, 1, 0, 1, 0]"
--nn_config="{'num_layers': 8, 'input_sizes': [32, 32, 3, 'output_sizes': [10, 'embedding': {'77': {'opt_id': 77, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '2'}}, '95': {'opt_id': 95, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '1', 'depth_multiplier': '2'}}, '7': {'opt_id': 7, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '2'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '17': {'opt_id': 17, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '2'}}, '82': {'opt_id': 82, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '1'}}, '41': {'opt_id': 41, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '39': {'opt_id': 39, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '2'}}}}"
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 03 Feb 2020 16:30:24 -0800
Finished: Mon, 03 Feb 2020 16:30:26 -0800
Ready: False
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-fpv72 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-fpv72:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-fpv72
Optional: false
QoS Class: BestEffort
Node-Selectors: group-name=gpu-worker-group-1
Tolerations: dedicated=gpu-worker-group-1:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m47s default-scheduler Successfully assigned kubeflow/nas-rl-example-gpu-2-qfpqcqbp-jjg98 to ip-10-71-72-149.ec2.internal
Normal Pulling 3m46s kubelet, ip-10-71-72-149.ec2.internal Pulling image "docker.io/kubeflowkatib/nasrl-cifar10-gpu"
Normal Pulled 3m46s kubelet, ip-10-71-72-149.ec2.internal Successfully pulled image "docker.io/kubeflowkatib/nasrl-cifar10-gpu"
Normal Created 3m46s kubelet, ip-10-71-72-149.ec2.internal Created container nas-rl-example-gpu-2-qfpqcqbp
Normal Started 3m45s kubelet, ip-10-71-72-149.ec2.internal Started container nas-rl-example-gpu-2-qfpqcqbp
Interestingly, I can get the Experiment to run if I add
metricsCollectorSpec:
  collector:
    kind: StdOut
to the example yaml.
I think you are not using the latest image for the NAS Suggestion; I can see some old warnings in the logs.
Can you check which image you are using in the katib-config ConfigMap for the nasrl Suggestion?
"image": "gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl:v0.7.0"
Can you try changing this image to the latest,
gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl,
and run this experiment again?
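The difference between the two references is only the tag: the configured image is pinned to `v0.7.0`, while an image reference with no tag resolves to `latest`. A minimal sketch of that rule follows; `split_image_ref` is a hypothetical helper written for illustration, not part of Katib or Docker, and it does not handle digest references (`name@sha256:...`).

```python
def split_image_ref(ref):
    """Split an image reference into (name, tag); a missing tag means 'latest'."""
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon before the last slash is a registry port, not a tag separator.
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"


pinned = split_image_ref(
    "gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl:v0.7.0"
)
untagged = split_image_ref(
    "gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl"
)
print(pinned[1], untagged[1])
```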
Updating to the latest image works. I see that the metricsCollectorSpec gets added to the Experiment.
Perhaps the image links need to be updated in the kustomize manifests?
I need to fix some small bugs in the Katib UI; after that we will update the Kustomize images.
Sounds good, I'll close this, thanks for the help @andreyvelich
/kind bug
What steps did you take and what happened: When running the example Experiment manifest, nasjob-example-RL-gpu.yaml, the Job encounters an error.
What did you expect to happen: No error; the Job should run to completion.
Environment:
Version (from kubectl version): 0.14.6
OS (from /etc/os-release):