kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

running nas example fails #1041

Closed timothyjlaurent closed 4 years ago

timothyjlaurent commented 4 years ago

/kind bug

What steps did you take and what happened: When running the example Experiment manifest nasjob-example-RL-gpu.yaml, the trial Job fails with the following error.

>>> arch received by trial
"[[102], [21, 1], [55, 0, 0], [15, 0, 1, 1], [41, 0, 0, 1, 0], [64, 0, 0, 1, 1, 1], [39, 0, 0, 1, 0, 0, 1], [100, 1, 1, 0, 1, 0, 1, 0]]"
>>> nn_config received by trial
"{"num_layers": 8, "input_sizes": [32, 32, 3], "output_sizes": [10], "embedding": {"102": {"opt_id": 102, "opt_type": "reduction", "opt_params": {"reduction_type": "max_pooling", "pool_size": 2}}, "21": {"opt_id": 21, "opt_type": "convolution", "opt_params": {"filter_size": "7", "num_filter": "32", "stride": "2"}}, "55": {"opt_id": 55, "opt_type": "separable_convolution", "opt_params": {"filter_size": "5", "num_filter": "48", "stride": "1", "depth_multiplier": "2"}}, "15": {"opt_id": 15, "opt_type": "convolution", "opt_params": {"filter_size": "5", "num_filter": "64", "stride": "2"}}, "41": {"opt_id": 41, "opt_type": "separable_convolution", "opt_params": {"filter_size": "3", "num_filter": "64", "stride": "2", "depth_multiplier": "2"}}, "64": {"opt_id": 64, "opt_type": "separable_convolution", "opt_params": {"filter_size": "5", "num_filter": "96", "stride": "2", "depth_multiplier": "1"}}, "39": {"opt_id": 39, "opt_type": "separable_convolution", "opt_params": {"filter_size": "3", "num_filter": "64", "stride": "1", "depth_multiplier": "2"}}, "100": {"opt_id": 100, "opt_type": "depthwise_convolution", "opt_params": {"filter_size": "7", "stride": "2", "depth_multiplier": "1"}}}}"
>>> num_epochs received by trial
10
>>> num_gpus received by trial:
1
>>> Constructing Model...
Traceback (most recent call last):
  File "RunTrial.py", line 40, in <module>
    constructor = ModelConstructor(arch, nn_config)
  File "/usr/src/app/github.com/kubeflow/katib/examples/v1alpha3/NAS-training-containers/RL-cifar10/ModelConstructor.py", line 13, in __init__
    nn_config = json.loads(nn_json)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 4 (char 3)
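
One possible reading of the traceback above, assuming the value printed under "nn_config received by trial" is exactly what json.loads received: the JSON object is wrapped in an extra pair of double quotes, so the parser reads the three characters `"{"` as a complete JSON string and then reports everything from character 3 onward as extra data. The sketch below reproduces that message with a shortened config. The `load_nn_config` helper is hypothetical (it is not part of Katib) and only illustrates a defensive workaround, not the project's actual fix.

```python
import json


def load_nn_config(raw):
    """Parse nn_config, tolerating one stray pair of surrounding double quotes.

    Hypothetical helper for illustration only; not part of Katib.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # If the whole payload is wrapped in quotes, strip them and retry once.
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            return json.loads(raw[1:-1])
        raise


# Shortened stand-in for the nn_config shown in the log, with the same
# leading and trailing double quotes around the JSON object.
quoted = '"{"num_layers": 8, "output_sizes": [10]}"'

try:
    json.loads(quoted)
    message = None
except json.JSONDecodeError as err:
    message = str(err)

print(message)

config = load_nn_config(quoted)
print(config["num_layers"])
```

If this reading is right, the quotes are most likely added where the trial arguments are templated into the Job command, so removing them there would be cleaner than stripping them in the training container.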

What did you expect to happen: No error; the Job should run to completion.


issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
bug 0.99


johnugeorge commented 4 years ago

/assign @andreyvelich

andreyvelich commented 4 years ago

@timothyjlaurent Thank you for the issue. Can you show the logs from the Suggestion pod, please? Also, can you describe the training job where you saw these logs?

timothyjlaurent commented 4 years ago

Here are the logs from the suggestion pod:

klon kubeflow nas-rl-example-gpu-2-nasrl-7549486dbb-kptw5
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
----------------------------------------------------------------------------------------------------
Setting Up Suggestion for Experiment nas-rl-example-gpu-2
----------------------------------------------------------------------------------------------------
>>> Search Space for Experiment nas-rl-example-gpu-2
Operation ID:
    0
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 1

Operation ID:
    1
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 2

Operation ID:
    2
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 1

Operation ID:
    3
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 2

Operation ID:
    4
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 1

Operation ID:
    5
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 2

Operation ID:
    6
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 1

Operation ID:
    7
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 2

Operation ID:
    8
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 1

Operation ID:
    9
Operation Type:
    convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 2

Operation ID:
    10
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 1

Operation ID:
    11
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 2

Operation ID:
    12
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 1

Operation ID:
    13
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 2

Operation ID:
    14
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 1

Operation ID:
    15
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 2

Operation ID:
    16
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 1

Operation ID:
    17
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 2

Operation ID:
    18
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 1

Operation ID:
    19
Operation Type:
    convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 2

Operation ID:
    20
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 1

Operation ID:
    21
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 2

Operation ID:
    22
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 1

Operation ID:
    23
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 2

Operation ID:
    24
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 1

Operation ID:
    25
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 2

Operation ID:
    26
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 1

Operation ID:
    27
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 2

Operation ID:
    28
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 1

Operation ID:
    29
Operation Type:
    convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 2

Operation ID:
    30
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 1
    depth_multiplier: 1

Operation ID:
    31
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 1
    depth_multiplier: 2

Operation ID:
    32
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 2
    depth_multiplier: 1

Operation ID:
    33
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 32
    stride: 2
    depth_multiplier: 2

Operation ID:
    34
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 1
    depth_multiplier: 1

Operation ID:
    35
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 1
    depth_multiplier: 2

Operation ID:
    36
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 2
    depth_multiplier: 1

Operation ID:
    37
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 48
    stride: 2
    depth_multiplier: 2

Operation ID:
    38
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 1
    depth_multiplier: 1

Operation ID:
    39
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 1
    depth_multiplier: 2

Operation ID:
    40
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 2
    depth_multiplier: 1

Operation ID:
    41
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 64
    stride: 2
    depth_multiplier: 2

Operation ID:
    42
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 1
    depth_multiplier: 1

Operation ID:
    43
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 1
    depth_multiplier: 2

Operation ID:
    44
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 2
    depth_multiplier: 1

Operation ID:
    45
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 96
    stride: 2
    depth_multiplier: 2

Operation ID:
    46
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 1
    depth_multiplier: 1

Operation ID:
    47
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 1
    depth_multiplier: 2

Operation ID:
    48
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 2
    depth_multiplier: 1

Operation ID:
    49
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 3
    num_filter: 128
    stride: 2
    depth_multiplier: 2

Operation ID:
    50
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 1
    depth_multiplier: 1

Operation ID:
    51
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 1
    depth_multiplier: 2

Operation ID:
    52
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 2
    depth_multiplier: 1

Operation ID:
    53
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 32
    stride: 2
    depth_multiplier: 2

Operation ID:
    54
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 1
    depth_multiplier: 1

Operation ID:
    55
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 1
    depth_multiplier: 2

Operation ID:
    56
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 2
    depth_multiplier: 1

Operation ID:
    57
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 48
    stride: 2
    depth_multiplier: 2

Operation ID:
    58
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 1
    depth_multiplier: 1

Operation ID:
    59
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 1
    depth_multiplier: 2

Operation ID:
    60
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 2
    depth_multiplier: 1

Operation ID:
    61
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 64
    stride: 2
    depth_multiplier: 2

Operation ID:
    62
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 1
    depth_multiplier: 1

Operation ID:
    63
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 1
    depth_multiplier: 2

Operation ID:
    64
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 2
    depth_multiplier: 1

Operation ID:
    65
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 96
    stride: 2
    depth_multiplier: 2

Operation ID:
    66
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 1
    depth_multiplier: 1

Operation ID:
    67
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 1
    depth_multiplier: 2

Operation ID:
    68
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 2
    depth_multiplier: 1

Operation ID:
    69
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 5
    num_filter: 128
    stride: 2
    depth_multiplier: 2

Operation ID:
    70
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 1
    depth_multiplier: 1

Operation ID:
    71
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 1
    depth_multiplier: 2

Operation ID:
    72
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 2
    depth_multiplier: 1

Operation ID:
    73
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 32
    stride: 2
    depth_multiplier: 2

Operation ID:
    74
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 1
    depth_multiplier: 1

Operation ID:
    75
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 1
    depth_multiplier: 2

Operation ID:
    76
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 2
    depth_multiplier: 1

Operation ID:
    77
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 48
    stride: 2
    depth_multiplier: 2

Operation ID:
    78
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 1
    depth_multiplier: 1

Operation ID:
    79
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 1
    depth_multiplier: 2

Operation ID:
    80
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 2
    depth_multiplier: 1

Operation ID:
    81
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 64
    stride: 2
    depth_multiplier: 2

Operation ID:
    82
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 1
    depth_multiplier: 1

Operation ID:
    83
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 1
    depth_multiplier: 2

Operation ID:
    84
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 2
    depth_multiplier: 1

Operation ID:
    85
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 96
    stride: 2
    depth_multiplier: 2

Operation ID:
    86
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 1
    depth_multiplier: 1

Operation ID:
    87
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 1
    depth_multiplier: 2

Operation ID:
    88
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 2
    depth_multiplier: 1

Operation ID:
    89
Operation Type:
    separable_convolution
Operations Parameters:
    filter_size: 7
    num_filter: 128
    stride: 2
    depth_multiplier: 2

Operation ID:
    90
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 3
    stride: 1
    depth_multiplier: 1

Operation ID:
    91
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 3
    stride: 1
    depth_multiplier: 2

Operation ID:
    92
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 3
    stride: 2
    depth_multiplier: 1

Operation ID:
    93
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 3
    stride: 2
    depth_multiplier: 2

Operation ID:
    94
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 5
    stride: 1
    depth_multiplier: 1

Operation ID:
    95
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 5
    stride: 1
    depth_multiplier: 2

Operation ID:
    96
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 5
    stride: 2
    depth_multiplier: 1

Operation ID:
    97
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 5
    stride: 2
    depth_multiplier: 2

Operation ID:
    98
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 7
    stride: 1
    depth_multiplier: 1

Operation ID:
    99
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 7
    stride: 1
    depth_multiplier: 2

Operation ID:
    100
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 7
    stride: 2
    depth_multiplier: 1

Operation ID:
    101
Operation Type:
    depthwise_convolution
Operations Parameters:
    filter_size: 7
    stride: 2
    depth_multiplier: 2

Operation ID:
    102
Operation Type:
    reduction
Operations Parameters:
    reduction_type: max_pooling
    pool_size: 2

Operation ID:
    103
Operation Type:
    reduction
Operations Parameters:
    reduction_type: max_pooling
    pool_size: 3

Operation ID:
    104
Operation Type:
    reduction
Operations Parameters:
    reduction_type: avg_pooling
    pool_size: 2

Operation ID:
    105
Operation Type:
    reduction
Operations Parameters:
    reduction_type: avg_pooling
    pool_size: 3

There are 106 operations in total.

>>> Parameters of LSTM Controller for Experiment nas-rl-example-gpu-2
lstm_num_cells:     64
lstm_num_layers:    1
lstm_keep_prob:     1.0
optimizer:      adam
init_learning_rate:     0.001
lr_decay_start:     0
lr_decay_every:     1000
lr_decay_rate:      0.9
skip-target:        0.4
skip-weight:        0.8
l2_reg:         0.0
entropy_weight:     0.0001
baseline_decay:     0.9999
RequestNumber:      3

>>> Building Controller
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 63, in __init__
    self._create_params()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 68, in _create_params
    with tf.variable_scope(self.name, initializer=initializer):
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:68', 'tf.variable_scope', 'tf.compat.v1.variable_scope')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:73: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 63, in __init__
    self._create_params()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 73, in _create_params
    w = tf.get_variable("w", [2 * self.lstm_size, 4 * self.lstm_size])
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:73', 'tf.get_variable', 'tf.compat.v1.get_variable')
>>> Building Controller Sampler
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:113: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
    self._build_sampler()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 113, in _build_sampler
    operation_id = tf.multinomial(logit, 1)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
    instructions)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:113', 'multinomial', 'tensorflow.python.ops.random_ops', 'in a future version', 'Use `tf.random.categorical` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:114: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
    self._build_sampler()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 114, in _build_sampler
    operation_id = tf.to_int32(operation_id)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
    instructions)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:114', 'to_int32', 'tensorflow.python.ops.math_ops', 'in a future version', 'Use `tf.cast` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:144: The name tf.log is deprecated. Please use tf.math.log instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
    self._build_sampler()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 144, in _build_sampler
    kl = skip_prob * tf.log(skip_prob / skip_targets)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:144', 'tf.log', 'tf.math.log')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:156: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 103, in _setup_controller
    logger=self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 64, in __init__
    self._build_sampler()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 156, in _build_sampler
    skip = tf.to_float(skip)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 323, in new_func
    instructions)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: %s (from %s) is deprecated and will be removed %s.\nInstructions for updating:\n%s'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:156', 'to_float', 'tensorflow.python.ops.math_ops', 'in a future version', 'Use `tf.cast` instead.')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:183: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
    self.controller.build_trainer()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 183, in build_trainer
    self.reward = tf.placeholder(tf.float32, shape=())
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:183', 'tf.placeholder', 'tf.compat.v1.placeholder')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:193: The name tf.assign_sub is deprecated. Please use tf.compat.v1.assign_sub instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
    self.controller.build_trainer()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 193, in build_trainer
    baseline_update = tf.assign_sub(self.baseline, (1 - self.bl_dec) * (self.baseline - self.reward))
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py:193', 'tf.assign_sub', 'tf.compat.v1.assign_sub')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:102: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
    self.controller.build_trainer()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 219, in build_trainer
    num_replicas=self.num_replicas)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py", line 102, in get_train_ops
    learning_rate = tf.train.exponential_decay(
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:102', 'tf.train.exponential_decay', 'tf.compat.v1.train.exponential_decay')
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:118: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 222, in GetSuggestions
    experiment = NAS_RL_Experiment(request, self.logger)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 45, in __init__
    self._setup_controller()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 105, in _setup_controller
    self.controller.build_trainer()
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Controller.py", line 219, in build_trainer
    num_replicas=self.num_replicas)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py", line 118, in get_train_ops
    opt = tf.train.AdamOptimizer(learning_rate, beta1=0.0, epsilon=1e-3,
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/NAS_Reinforcement_Learning/Trainer.py:118', 'tf.train.AdamOptimizer', 'tf.compat.v1.train.AdamOptimizer')
>>> Suggestion for Experiment nas-rl-example-gpu-2 has been initialized.

----------------------------------------------------------------------------------------------------
Suggestion Step 0 for Experiment nas-rl-example-gpu-2
----------------------------------------------------------------------------------------------------
WARNING:tensorflow:From /usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py:227: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 611, in format
    s = self.formatMessage(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 580, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 422, in format
    return self._fmt % record.__dict__
KeyError: 'experiment_name'
Call stack:
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 551, in _unary_response_in_pool
    argument, request_deserializer)
  File "/usr/local/lib/python3.7/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py", line 227, in GetSuggestions
    saver = tf.train.Saver()
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 119, in __getattr__
    _call_location(), full_name, rename)
  File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/tf_logging.py", line 166, in warning
    get_logger().warning(msg, *args, **kwargs)
Message: 'From %s: The name %s is deprecated. Please use %s instead.\n'
Arguments: ('/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/nasrl_service.py:227', 'tf.train.Saver', 'tf.compat.v1.train.Saver')
>>> First time running suggestion for nas-rl-example-gpu-2. Random architecture will be given.
2020-02-04 00:21:56.458811: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-02-04 00:21:56.463174: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2020-02-04 00:21:56.463653: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563b4d4c3d20 executing computations on platform Host. Devices:
2020-02-04 00:21:56.463681: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-02-04 00:21:56.538853: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

>>> New Neural Network Architecture Candidate #0 (internal representation):
[[78], [43, 0], [81, 0, 1], [2, 0, 1, 0], [67, 1, 0, 0, 0], [63, 0, 0, 1, 1, 0], [99, 1, 0, 1, 0, 0, 0], [66, 1, 1, 0, 0, 1, 0, 1]]

>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'78': {'opt_id': 78, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '1'}}, '43': {'opt_id': 43, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '81': {'opt_id': 81, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '2': {'opt_id': 2, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '48', 'stride': '1'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '63': {'opt_id': 63, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '99': {'opt_id': 99, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '7', 'stride': '1', 'depth_multiplier': '2'}}, '66': {'opt_id': 66, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '1'}}}}

>>> New Neural Network Architecture Candidate #1 (internal representation):
[[77], [95, 0], [7, 1, 1], [67, 1, 0, 0], [17, 1, 1, 0, 0], [82, 1, 1, 1, 0, 0], [41, 0, 1, 0, 0, 0, 0], [39, 1, 1, 1, 1, 0, 1, 0]]

>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'77': {'opt_id': 77, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '2'}}, '95': {'opt_id': 95, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '1', 'depth_multiplier': '2'}}, '7': {'opt_id': 7, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '2'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '17': {'opt_id': 17, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '2'}}, '82': {'opt_id': 82, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '1'}}, '41': {'opt_id': 41, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '39': {'opt_id': 39, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '2'}}}}

>>> New Neural Network Architecture Candidate #2 (internal representation):
[[33], [19, 1], [33, 1, 0], [53, 0, 1, 0], [4, 1, 0, 0, 1], [96, 1, 0, 0, 0, 1], [63, 0, 0, 0, 1, 1, 1], [76, 1, 1, 1, 1, 0, 1, 0]]

>>> Corresponding Seach Space Description:
{'num_layers': 8, 'input_sizes': [32, 32, 3], 'output_sizes': [10], 'embedding': {'33': {'opt_id': 33, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '32', 'stride': '2', 'depth_multiplier': '2'}}, '19': {'opt_id': 19, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '2'}}, '53': {'opt_id': 53, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '32', 'stride': '2', 'depth_multiplier': '2'}}, '4': {'opt_id': 4, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1'}}, '96': {'opt_id': 96, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '2', 'depth_multiplier': '1'}}, '63': {'opt_id': 63, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '2'}}, '76': {'opt_id': 76, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '1'}}}}

>>> 3 Trials were created for Experiment nas-rl-example-gpu-2

Here's the description of the pod:

Name:           nas-rl-example-gpu-2-qfpqcqbp-jjg98
Namespace:      kubeflow
Priority:       0
Node:           ip-10-71-72-149.ec2.internal/10.71.72.149
Start Time:     Mon, 03 Feb 2020 16:30:22 -0800
Labels:         controller-uid=5a57b00f-46e4-11ea-a653-0a392894a425
                job-name=nas-rl-example-gpu-2-qfpqcqbp
Annotations:    <none>
Status:         Failed
IP:             10.42.93.12
IPs:            <none>
Controlled By:  Job/nas-rl-example-gpu-2-qfpqcqbp
Containers:
  nas-rl-example-gpu-2-qfpqcqbp:
    Container ID:  docker://602f8a69cfcf7add633c2cc2b8ffdf9049050898ca7d94a351ccee5224bd297d
    Image:         docker.io/kubeflowkatib/nasrl-cifar10-gpu
    Image ID:      docker-pullable://kubeflowkatib/nasrl-cifar10-gpu@sha256:b2039193018df3fd7650ad257531621704589c472748f64acfb617fab468a212
    Port:          <none>
    Host Port:     <none>
    Command:
      python3.5
      -u
      RunTrial.py
      --architecture="[[77, [95, 0, [7, 1, 1, [67, 1, 0, 0, [17, 1, 1, 0, 0, [82, 1, 1, 1, 0, 0, [41, 0, 1, 0, 0, 0, 0, [39, 1, 1, 1, 1, 0, 1, 0]"
      --nn_config="{'num_layers': 8, 'input_sizes': [32, 32, 3, 'output_sizes': [10, 'embedding': {'77': {'opt_id': 77, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '48', 'stride': '2', 'depth_multiplier': '2'}}, '95': {'opt_id': 95, 'opt_type': 'depthwise_convolution', 'opt_params': {'filter_size': '5', 'stride': '1', 'depth_multiplier': '2'}}, '7': {'opt_id': 7, 'opt_type': 'convolution', 'opt_params': {'filter_size': '3', 'num_filter': '96', 'stride': '2'}}, '67': {'opt_id': 67, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '5', 'num_filter': '128', 'stride': '1', 'depth_multiplier': '2'}}, '17': {'opt_id': 17, 'opt_type': 'convolution', 'opt_params': {'filter_size': '5', 'num_filter': '96', 'stride': '2'}}, '82': {'opt_id': 82, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '7', 'num_filter': '96', 'stride': '1', 'depth_multiplier': '1'}}, '41': {'opt_id': 41, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '2', 'depth_multiplier': '2'}}, '39': {'opt_id': 39, 'opt_type': 'separable_convolution', 'opt_params': {'filter_size': '3', 'num_filter': '64', 'stride': '1', 'depth_multiplier': '2'}}}}"
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 03 Feb 2020 16:30:24 -0800
      Finished:     Mon, 03 Feb 2020 16:30:26 -0800
    Ready:          False
    Restart Count:  0
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fpv72 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-fpv72:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fpv72
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  group-name=gpu-worker-group-1
Tolerations:     dedicated=gpu-worker-group-1:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                                   Message
  ----    ------     ----   ----                                   -------
  Normal  Scheduled  3m47s  default-scheduler                      Successfully assigned kubeflow/nas-rl-example-gpu-2-qfpqcqbp-jjg98 to ip-10-71-72-149.ec2.internal
  Normal  Pulling    3m46s  kubelet, ip-10-71-72-149.ec2.internal  Pulling image "docker.io/kubeflowkatib/nasrl-cifar10-gpu"
  Normal  Pulled     3m46s  kubelet, ip-10-71-72-149.ec2.internal  Successfully pulled image "docker.io/kubeflowkatib/nasrl-cifar10-gpu"
  Normal  Created    3m46s  kubelet, ip-10-71-72-149.ec2.internal  Created container nas-rl-example-gpu-2-qfpqcqbp
  Normal  Started    3m45s  kubelet, ip-10-71-72-149.ec2.internal  Started container nas-rl-example-gpu-2-qfpqcqbp
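
One thing that stands out in the pod description above: the `--architecture` value in the Command has only one closing bracket, while Candidate #1 in the suggestion log is a properly nested list (the `--nn_config` value is missing closing brackets in the same way). Whether this is what makes the trial exit is not confirmed here. Below is a minimal sketch for checking such a string locally, assuming the trial expects a JSON-style nested list; `is_well_formed` is a hypothetical helper, not part of Katib.

```python
import json


def is_well_formed(arg: str) -> bool:
    """Return True if arg parses as a JSON list (hypothetical helper, not Katib code)."""
    try:
        return isinstance(json.loads(arg), list)
    except json.JSONDecodeError:
        return False


# Candidate #1 exactly as printed in the suggestion log.
from_log = (
    "[[77], [95, 0], [7, 1, 1], [67, 1, 0, 0], [17, 1, 1, 0, 0], "
    "[82, 1, 1, 1, 0, 0], [41, 0, 1, 0, 0, 0, 0], [39, 1, 1, 1, 1, 0, 1, 0]]"
)

# The same candidate as it appears in the pod's Command.
from_pod = (
    "[[77, [95, 0, [7, 1, 1, [67, 1, 0, 0, [17, 1, 1, 0, 0, "
    "[82, 1, 1, 1, 0, 0, [41, 0, 1, 0, 0, 0, 0, [39, 1, 1, 1, 1, 0, 1, 0]"
)

print(is_well_formed(from_log))
print(is_well_formed(from_pod))
# Compare how many brackets are opened and closed in the pod's argument.
print(from_pod.count("["), from_pod.count("]"))
```

The log version parses and the pod version does not, so the difference is introduced somewhere between the suggestion output and the trial's command line.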

timothyjlaurent commented 4 years ago

Interestingly, I can get the Experiment to run if I add

  metricsCollectorSpec:
    collector:
      kind: StdOut

to the example yaml.

andreyvelich commented 4 years ago

I think you are not using the latest image for the NAS Suggestion; I can see some old warnings in the logs. Can you check which image you are using in the katib-config ConfigMap for the nasrl Suggestion?

timothyjlaurent commented 4 years ago

"image": "gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl:v0.7.0"

andreyvelich commented 4 years ago

"image": "gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl:v0.7.0"

Can you try changing this image to the latest one, gcr.io/kubeflow-images-public/katib/v1alpha3/suggestion-nasrl, and run the experiment again?

timothyjlaurent commented 4 years ago

Updating to the latest image works. I see that the metricsCollectorSpec gets added to the Experiment.

timothyjlaurent commented 4 years ago

Perhaps the image links need to be updated in the kustomize manifests?

andreyvelich commented 4 years ago

Perhaps the image links need to be updated in the kustomize manifests?

I need to fix some small bugs in the Katib UI first; after that we will update the Kustomize images.

timothyjlaurent commented 4 years ago

Sounds good, I'll close this. Thanks for the help, @andreyvelich.