snowkylin / tensorflow-handbook

简单粗暴 TensorFlow 2 | A Concise Handbook of TensorFlow 2 | A concise introductory guide to TensorFlow 2
https://tf.wiki

TPU Distributed Computing #8

Open huan opened 5 years ago

huan commented 5 years ago

The TPU chapter is planned to cover the following parts:

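As things stand, there may not be enough time to cover this in the first edition, so the plan is to leave the TPU content out of the first edition. (If there is still time before the book is published, we can add the most basic Google Cloud TPU configuration steps.)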

Does this plan work for everyone? @snowkylin @dpinthinker


  1. UPDATE(29 Aug 2019): TensorFlow 2.0/2.1 TPU Support Track Issue: https://github.com/tensorflow/tensorflow/issues/24412#issuecomment-525960626
  2. UPDATE(17 Mar 2019): After discussing with 锡涵, there is still some time before TF2.0 is officially released, so we decided to go ahead and add a most basic version, 5-10 pages.

huan commented 4 years ago

Will start writing this chapter this week.

huan commented 4 years ago

Reviews from @snowkylin

TPU

JimXiongGM commented 4 years ago

Hello, the example Colab notebook for the chapter "Training TensorFlow Models with TPU (Huan)" (https://colab.research.google.com/github/huan/tensorflow-handbook-tpu/blob/master/tensorflow-handbook-tpu-example.ipynb) fails to run and shows:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571137943.518656507","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]

Could you help with this? Thanks.

huan commented 4 years ago

@JimXiongGM Hi, thanks for trying TF2.0 with Colab & TPU!

TensorFlow 2.0 does not yet have complete TPU support in Colab. I got some updates from Googlers, who said that it will be fully supported in TensorFlow 2.1.

This is a known issue and you can learn more from https://github.com/tensorflow/tensorflow/issues/33045#issuecomment-539148033 and https://github.com/huan/tensorflow-handbook-tpu/issues/1

The Workaround

Until TF2.1 is released, you can use the latest TF1.x with eager execution enabled; its API is quite similar to TF2.0.

Once TF2.1 is released, you can switch to it with very few code modifications.
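As a rough illustration of the workaround, here is a minimal sketch that picks a path based on the installed TensorFlow version. The helper names (`parse_major_minor`, `tpu_plan`, `apply_plan`) and the plan labels are hypothetical and only for illustration. The version boundaries follow what is said in this thread (2.0 has the known issue, 2.1 is expected to work), so treat them as assumptions rather than confirmed behavior.

```python
# Sketch only: helper names and plan labels are hypothetical, not TensorFlow APIs.


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.1.0' or '1.15.0-rc1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def tpu_plan(version):
    """Choose a plan for Colab + TPU, based on the statements in this thread."""
    major, minor = parse_major_minor(version)
    if major == 1:
        # Workaround: TF1.x with eager execution, whose API is close to TF2.0.
        return "tf1-eager"
    if (major, minor) == (2, 0):
        # Assumption from this thread: TPU support in Colab is incomplete in 2.0.
        return "known-issue"
    # Assumption from this thread: 2.1 and later are expected to support it.
    return "native"


def apply_plan():
    """Apply the plan to the installed TensorFlow. Call this at program startup."""
    import tensorflow as tf  # imported lazily so the helpers above run without it

    plan = tpu_plan(tf.__version__)
    if plan == "tf1-eager":
        # Eager execution must be enabled before any other TensorFlow op is created.
        tf.compat.v1.enable_eager_execution()
    return plan
```

Calling `apply_plan()` first thing in a notebook keeps the rest of the code the same across versions, so moving to TF2.1 later should need very few changes.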

P.S. I will update the chapter to describe this problem in detail today.

JimXiongGM commented 4 years ago

thanks a lot ;-)

huan commented 4 years ago

You are welcome. :)