tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

TPUConfig.num_shards is not set correctly. #451

Open marsggbo opened 5 years ago

marsggbo commented 5 years ago

I'm running the AmoebaNet code on a TPU v2-32, but it only finds 8 TPU cores and raises the following error: image

What's the reason, and how can I solve this problem?

saberkun commented 5 years ago

num_shards is no longer necessary in TF 1.13 and 1.14. It can be derived automatically.
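For readers following along, here is a minimal sketch of what leaving num_shards unset could look like with the TF 1.x TPUEstimator APIs. It assumes TF 1.13 or 1.14 with tf.contrib available; the helper names `tpu_config_kwargs` and `build_run_config`, the `iterations_per_loop` value, and the `tpu_name` and `model_dir` arguments are illustrative and not taken from the AmoebaNet code.

```python
def tpu_config_kwargs(iterations_per_loop=100, num_shards=None):
    """Build keyword arguments for TPUConfig, omitting num_shards unless set."""
    kwargs = {"iterations_per_loop": iterations_per_loop}
    if num_shards is not None:
        # Only pass num_shards when the caller explicitly asks for it.
        kwargs["num_shards"] = num_shards
    return kwargs


def build_run_config(tpu_name, model_dir):
    """Create a RunConfig without num_shards so it can be derived automatically."""
    # Imported lazily: this assumes TF 1.13 or 1.14 (tf.contrib is absent in TF 2.x).
    import tensorflow as tf

    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    return tf.contrib.tpu.RunConfig(
        cluster=resolver,
        model_dir=model_dir,
        tpu_config=tf.contrib.tpu.TPUConfig(**tpu_config_kwargs()),
    )
```

Calling `build_run_config("my-tpu", "gs://my-bucket/model")` (both values hypothetical) would return a config that can be passed to TPUEstimator. Whether the derived shard count matches the full slice still depends on the TPU being resolved correctly.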

marsggbo commented 5 years ago

Thanks for your quick reply, @saberkun. I tried again after removing the num_shards setting, but it still uses only 8 cores instead of 32.

Here is the TPU I created: image
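As a sanity check on the numbers, the arithmetic behind the expected core count can be sketched as follows. It assumes the v2/v3 naming convention, where the suffix of the accelerator type is the total core count, and 8 cores per TPU host; the function name `expected_topology` is hypothetical.

```python
CORES_PER_HOST = 8  # assumption: 4 chips x 2 cores per host on TPU v2/v3


def expected_topology(accelerator_type):
    """Return (total_cores, num_hosts) for a type string such as 'v2-32'."""
    version, _, cores = accelerator_type.partition("-")
    if version not in ("v2", "v3"):
        raise ValueError("this sketch only covers v2/v3 naming: %r" % accelerator_type)
    num_cores = int(cores)
    # A slice never has fewer than one host, even for sub-host sizes.
    num_hosts = max(1, num_cores // CORES_PER_HOST)
    return num_cores, num_hosts


print(expected_topology("v2-32"))  # (32, 4)
```

Under these assumptions a v2-32 spans 4 hosts, so seeing exactly 8 cores is what a single host would report. That is only an observation about the numbers, not a confirmed diagnosis of this issue.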

I have another question: does the AmoebaNet code include the architecture search process, or only the training process?

saberkun commented 5 years ago

The Cloud TPU team is actively looking into this.

For architecture search, the answer is no. The current repo does not include an RL-based search algorithm.

marsggbo commented 5 years ago

@saberkun Is there any other open-source NAS code that can run on TPUs? I want to compare TPU and GPU performance in terms of speed, accuracy, etc.

chrislarkin commented 5 years ago

@marsggbo You might be interested in http://github.com/dstamoulis/single-path-nas.

saberkun commented 5 years ago

@marsggbo Hi, do you mean other nets found by architecture search? We have the EfficientNet and MnasNet families in this repo. I believe most ImageNet classification problems can be implemented easily with the TPUEstimator framework. I can't recommend any search algorithm implementation, as I haven't looked into them.