intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

jenkins: ray.tune.error.TuneError #52

Closed pinggao187 closed 2 years ago

pinggao187 commented 2 years ago
Traceback (most recent call last):
  File "/opt/work/jenkins/workspace/ZOO-NB-Colab-Pytorch/docs/docs/colab-notebook/orca/quickstart/autoestimator_pytorch_lenet_mnist.py", line 265, in <module>
    metric="accuracy")
  File "/opt/work/conda/envs/py37/lib/python3.7/site-packages/zoo/orca/automl/auto_estimator.py", line 195, in fit
    self.searcher.run()
  File "/opt/work/conda/envs/py37/lib/python3.7/site-packages/zoo/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 183, in run
    reuse_actors=True
  File "/opt/work/conda/envs/py37/lib/python3.7/site-packages/ray/tune/tune.py", line 444, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [train_func_373ad_00000])
Stopping orca context

jenkins link: http://10.112.231.51:18888/view/ZOO-NB-Overview/job/ZOO-NB-Colab-Pytorch/265/

@TheaperDeng This issue has appeared many times recently, is there any progress?

shanyu-sys commented 2 years ago

This is caused by the unstable network on Jenkins: the job cannot download MNIST. Related to https://github.com/intel-analytics/arda-docker/issues/409

shanyu-sys commented 2 years ago

I will change the script to read the data from a local copy instead of downloading it every time.
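
A minimal sketch of that idea, not the actual change to the notebook script: look for the file in a local directory first and touch the network only when it is missing. The helper name `ensure_local`, its parameters, and the `.part` temporary suffix are hypothetical and chosen for illustration.

```python
import os
import urllib.request


def ensure_local(filename, local_dir, source_url,
                 fetch=urllib.request.urlretrieve):
    """Return the path of `filename` in `local_dir`, downloading only if missing.

    `ensure_local` is a hypothetical helper, not part of zoo or bigdl.
    `fetch` is any callable taking (url, destination_path).
    """
    os.makedirs(local_dir, exist_ok=True)
    path = os.path.join(local_dir, filename)

    # A non-empty local copy means no network access is needed.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return path

    # Download to a temporary name first so that an interrupted download
    # never leaves a truncated file under the final name.
    tmp_path = path + ".part"
    fetch(source_url + filename, tmp_path)
    os.replace(tmp_path, path)
    return path
```

On a CI machine the directory could be pre-populated once, so that a call such as `ensure_local("train-images-idx3-ubyte.gz", mnist_path, SOURCE_URL)` returns immediately without contacting the remote server.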

pinggao187 commented 2 years ago

> I will change the script to read data from local without downloading every time.

MNIST sometimes fails to download in other scripts as well. Can you change the MNIST download address to a local one?

http://10.112.231.51:18888/job/ZOO-NB-Pip-AppTests-MAC-py36/540/console

Downloading data from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Traceback (most recent call last):
  File "/var/jenkins_home/workspace/ZOO-NB-Pip-AppTests-MAC-py36/dist/apps/variational-autoencoder/tmp_test.py", line 128, in <module>
    train_data = get_mnist(sc, mnist_path)
  File "/var/jenkins_home/workspace/ZOO-NB-Pip-AppTests-MAC-py36/dist/apps/variational-autoencoder/tmp_test.py", line 112, in get_mnist
    (train_images, train_labels) = mnist.read_data_sets(mnist_path, "train")
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/site-packages/bigdl/dataset/mnist.py", line 101, in read_data_sets
    SOURCE_URL + TRAIN_IMAGES)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/site-packages/bigdl/dataset/base.py", line 194, in maybe_download
    urlretrieve(source_url, temp_file_name, dl_progress)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/arda/anaconda3/envs/py36pip/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

pinggao187 commented 2 years ago

I found this SOURCE_URL. The other MNIST copies are downloaded from FTP, for example "wget -nv $FTP_URI/analytics-zoo-data/mnist/train-images-idx3-ubyte.gz".

SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'

https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/pipeline/api/keras/datasets/mnist.py#L23
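
One possible way to make the two locations interchangeable, sketched under assumptions: prefer the internal mirror when the `FTP_URI` environment variable (the one used in the `wget` command above) is set, and fall back to the public address otherwise. The function name `resolve_mnist_source` is hypothetical, and the sketch assumes the mirror keeps the files under `analytics-zoo-data/mnist/` as in that command.

```python
import os

# Public address currently hard-coded as SOURCE_URL in mnist.py.
DEFAULT_SOURCE_URL = "http://yann.lecun.com/exdb/mnist/"


def resolve_mnist_source(env=None):
    """Return the base URL to download MNIST from.

    Hypothetical helper: uses the mirror named by FTP_URI when that
    variable is set and non-empty, otherwise the public default.
    """
    env = os.environ if env is None else env
    ftp_uri = env.get("FTP_URI", "").rstrip("/")
    if ftp_uri:
        # Path layout assumed from the wget example in this thread.
        return ftp_uri + "/analytics-zoo-data/mnist/"
    return DEFAULT_SOURCE_URL
```

With something like this, the Jenkins jobs would pick up the mirror from their environment, while users outside the internal network would keep the existing behavior.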