keras-team / keras-hub

Pretrained model hub for Keras 3
Apache License 2.0

Accelerator test timeout #809

Closed: chenmoneygithub closed this issue 1 year ago

chenmoneygithub commented 1 year ago

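Recently we have been seeing a few timeouts in accelerator testing, but the log shows that the tests themselves finish. The build only times out afterwards, while waiting for the Docker image deletion to complete: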

Step #5 - "create-job": keras_nlp/utils/tf_utils_test.py::TensorToStringListTest::test_session <- ../usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/test_util.py SKIPPED (Not a test.) [100%]
Step #5 - "create-job": 
Step #5 - "create-job": ================ 1035 passed, 214 skipped in 2780.92s (0:46:20) ================
Step #5 - "create-job": + sleep 5
Step #5 - "create-job": + gcloud artifacts docker images delete us-west1-docker.pkg.dev/keras-team-test/keras-nlp-test/keras-nlp-image:9480287a-708d-401e-8924-7ab6b420ddb4
Step #5 - "create-job": Digests:
Step #5 - "create-job": - us-west1-docker.pkg.dev/keras-team-test/keras-nlp-test/keras-nlp-image@sha256:8f69c0a0cb78aca13c62ab87ca0e252bedecf4d46bb2ce122c4080a5342dcba9
Step #5 - "create-job": 
Step #5 - "create-job": Tags:
Step #5 - "create-job": - us-west1-docker.pkg.dev/keras-team-test/keras-nlp-test/keras-nlp-image:9480287a-708d-401e-8924-7ab6b420ddb4
Step #5 - "create-job": 
Step #5 - "create-job": This operation will delete the above resources.
Step #5 - "create-job": 
Step #5 - "create-job": Do you want to continue (Y/n)?  
Step #5 - "create-job": Delete request issued.
Step #5 - "create-job": Waiting for operation [projects/keras-team-test/locations/us-west1/operations/6fcd169d-1748-4a89-9ac5-e27091ba41c5] to complete...
TIMEOUT
ERROR: context deadline exceeded

I will increase the deadline a bit to see if it helps, but let's keep this issue open for tracking.
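For tracking similar failures, the check described above (did pytest finish before the deadline hit, and what was the build doing when it timed out?) can be sketched as a small log scanner. This is only an illustration: the function name `diagnose_build_log` and the returned keys are hypothetical, and it assumes the log contains a standard pytest summary line and a bare `TIMEOUT` line as in the excerpt above.

```python
import re

# Matches a pytest summary line such as
# "==== 1035 passed, 214 skipped in 2780.92s (0:46:20) ====".
SUMMARY_RE = re.compile(r"=+ (?P<counts>.+?) in (?P<seconds>[\d.]+)s\b.*=+")


def diagnose_build_log(log_text):
    """Report whether pytest finished before the build hit its deadline.

    Hypothetical helper: scans the log line by line and stops at the
    first bare "TIMEOUT" line.
    """
    summary = None
    timed_out = False
    last_line = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "TIMEOUT":
            timed_out = True
            break
        match = SUMMARY_RE.search(stripped)
        if match:
            summary = match
        if stripped:
            # Remember the last non-empty line seen before a timeout.
            last_line = stripped
    return {
        "tests_finished": summary is not None,
        "test_seconds": float(summary.group("seconds")) if summary else None,
        "timed_out": timed_out,
        "last_line_before_timeout": last_line if timed_out else None,
    }
```

If `tests_finished` and `timed_out` are both true, the test run itself completed and the deadline was hit in a later step, which is the situation in the log above.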

chenmoneygithub commented 1 year ago

Increasing the deadline seems to have solved the issue for now, but let's think about splitting the workload and moving part of it into a nightly test run.
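One possible way to do such a split is with a pytest marker plus a command-line flag in `conftest.py`. The sketch below is an assumption, not what the repository does: the `--run_nightly` option, the `nightly` marker, and the `should_skip` helper are hypothetical names chosen for illustration. The hooks themselves (`pytest_addoption`, `pytest_configure`, `pytest_collection_modifyitems`) are standard pytest hooks.

```python
# conftest.py (sketch; `--run_nightly` and the `nightly` marker are hypothetical)


def pytest_addoption(parser):
    # Register a flag that the nightly job would pass to pytest.
    parser.addoption(
        "--run_nightly",
        action="store_true",
        default=False,
        help="Also run tests marked as nightly.",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "nightly: slow test that only runs in the nightly job"
    )


def should_skip(marker_names, run_nightly):
    """Return True when a test is nightly-only and this is not a nightly run."""
    return "nightly" in marker_names and not run_nightly


def pytest_collection_modifyitems(config, items):
    # Imported lazily so the helper above stays importable on its own.
    import pytest

    run_nightly = config.getoption("--run_nightly")
    skip_nightly = pytest.mark.skip(reason="needs --run_nightly to run")
    for item in items:
        marker_names = {marker.name for marker in item.iter_markers()}
        if should_skip(marker_names, run_nightly):
            item.add_marker(skip_nightly)
```

With this in place, slow tests would be decorated with `@pytest.mark.nightly`, the per-commit accelerator job would run plain `pytest`, and the nightly job would run `pytest --run_nightly`.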