kubeflow / examples

A repository to host extended examples and tutorials
Apache License 2.0

[code search] Model export TFJob succeeds when the actual work fails #326

Closed: IronPan closed this issue 5 years ago

IronPan commented 5 years ago

The TFJob appears to have succeeded:

    Last Transition Time:  2018-11-10T20:19:03Z
    Last Update Time:      2018-11-10T20:19:03Z
    Message:               TFJob t2t-code-search-exporter is created.
    Reason:                TFJobCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2018-11-10T20:19:03Z
    Last Update Time:      2018-11-10T20:19:05Z
    Message:               TFJob t2t-code-search-exporter is running.
    Reason:                TFJobRunning
    Status:                False
    Type:                  Running
    Last Transition Time:  2018-11-10T20:19:03Z
    Last Update Time:      2018-11-10T20:21:11Z
    Message:               TFJob t2t-code-search-exporter is successfully completed.
    Reason:                TFJobSucceeded
    Status:                True
    Type:                  Succeeded
  Start Time:              2018-11-10T20:19:05Z

However, the pod log shows that the job actually failed:

    + t2t-exporter --problem=kf_github_function_docstring --data_dir=gs://code-search-demo/20181104/data --output_dir=gs://code-search-demo/models/20181105-tinyparams --model=kf_similarity_transformer --hparams_set=transformer_tiny --master=grpc://t2t-code-search-exporter-worker-0:2222 --worker_id=0
    Traceback (most recent call last):
      File "/usr/local/bin/t2t-exporter", line 7, in <module>
        from tensor2tensor.serving import export
      File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/serving/export.py", line 29, in <module>
        import tensorflow_hub as hub
    ImportError: No module named tensorflow_hub
    + sleep 120

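The `+` prefixes in the log suggest the container ran a shell script with `set -x` (command tracing). One plausible explanation for the false success, assuming that script did not also enable `set -e`, is that bash kept going after `t2t-exporter` failed, ran `sleep 120`, and exited with `sleep`'s status of 0. This is an inference from the log, not something stated in the issue. The sketch below uses `false` as a hypothetical stand-in for the failing exporter and `sleep 0` for `sleep 120`:

```shell
#!/usr/bin/env bash
# `false` stands in for the failing t2t-exporter call; `sleep 0` for `sleep 120`.

# Without errexit, a script's exit status is that of its last command (sleep),
# so the earlier failure is masked.
without_errexit() { bash -c 'false; sleep 0'; }

# With `set -e`, bash stops at the first failing command and exits non-zero.
with_errexit() { bash -c 'set -e; false; sleep 0'; }

without_errexit; echo "without set -e: exit $?"   # without set -e: exit 0
with_errexit;    echo "with set -e: exit $?"      # with set -e: exit 1
```

Because the job controller presumably judges success from the container's exit code, a wrapper that swallows the failure makes the whole job look successful.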
jlewi commented 5 years ago

The model export step should not use a TFJob, because t2t-exporter is just an arbitrary binary.

This will be fixed by #320.