yuenshome / yuenshome.github.io

https://yuenshome.github.io
MIT License
84 stars 15 forks source link

TFRT Design Philosophy #127

Open ysh329 opened 3 years ago

ysh329 commented 3 years ago

TFRT Design Philosophy

This is a living document that aims to capture the overall design of the tfrt project, provides a place to capture the rationale for various decisions, and aims to be a good reference for new contributors as well as users of the libraries.


https://github.com/tensorflow/runtime/blob/master/documents/design_philosophy.md#guiding-philosophies

ysh329 commented 3 years ago

What is TFRT? This is a low level runtime for TensorFlow, with many goals:

Provide extremely high performance, efficient memory use, and focus on low-level efficiency. Efficiently use modern CPU hardware, including multicore, NUMA, etc. Strong support for heterogenous hardware (e.g. mobile phones). Make it easy to plug in support for new accelerators into TensorFlow. Provide extremely modular libraries, enabling products to be built out of them that use different slices of functionality (e.g. for mobile, or an inference server). Be easily extensible, easily hackable, easy to develop, easy to understand and work with. Follow modern open source development methodology. Design for scale - we expect this to grow very large over time. Enable progressive migration of functionality from the existing TensorFlow stack into a new design.


什么是TFRT?

这是TensorFlow的低级运行时,有许多目标:

提供极高的性能、高效的内存使用,并专注于低水平的效率。

高效使用现代CPU硬件,包括多核、NUMA等。

对异构硬件(如移动电话)的强大支持。

将新加速器的支持轻松插入TensorFlow。

提供非常模块化的库,使产品能够在这些库中使用不同的功能片段(例如,用于移动或推理服务器)。

易于扩展,易于破解,易于开发,易于理解和使用。

遵循现代开源开发方法。

规模化设计-我们期望随着时间的推移,这个规模会越来越大。

使功能从现有的TensorFlow堆栈逐步迁移到新设计中。

ysh329 commented 3 years ago

What do we mean by "low level" runtime? This library contains a significant amount of compute and accelerator functionality, but does not include Python bindings or other higher level APIs. Those will remain in the core TensorFlow repository.

At the time of this writing, this runtime is quite small, but we expect it to grow over time, including support for new hardware, new kernels, new APIs, and new applications.

This runtime is co-designed with the MLIR compiler infrastructure project. Several of its design points enable specific aspects of this design, and thus the overall TFRT pipeline benefits from and depends heavily on MLIR and its infrastructure. The design philosophies of TFRT are very similar to that followed by the LLVM and MLIR communities.


我们所说的“低级别”运行时是什么意思?这个库包含大量的计算和加速器功能,但不包括Python绑定或其他更高级别的api。这些将保留在核心TensorFlow存储库中。

在撰写本文时,这个运行时非常小,但我们预计它会随着时间的推移而增长,包括对新硬件、新内核、新api和新应用程序的支持。

这个运行时是与MLIR编译器基础设施项目共同设计的。它的几个设计点支持这种设计的特定方面,因此整个TFRT管道从MLIR及其基础设施中获益并在很大程度上依赖于MLIR。TFRT的设计理念与LLVM和MLIR社区的设计理念非常相似。

ysh329 commented 3 years ago

Guiding Philosophies

Here we describe a few of the high-level design principles, and unpack what that means for how we build things and how things are structured. These principles are generally independent of any individual library or tool in the project, but we use a few examples to illustrate the points.

Build the right thing

Build the right thing is more important than building something fast. We value iteration, evaluation, and experimentation, but eschew writing large amounts of throw-away code that "we'll come back to later". It is better to build things deliberately "bottom up", so we can make sure the lower level abstractions are right, and the higher level pieces can build on top of well considered underlying layers.

We aim to have subsystems that work together and have clearly defined roles and responsibilities. We do not want to see lots of internally defensive code, or code that works around deficiencies in other subsystems. This has implications: we need to spend a significant amount of time debating and iterating on design approaches, and document the results.


指导思想

在这里,我们将描述一些高级设计原则,并解释这对我们如何构建事物和如何构建事物意味着什么。这些原则通常独立于项目中的任何单个库或工具,但是我们使用一些示例来说明这些要点。

做正确的事情

建造正确的东西比快速建造更重要。我们重视迭代、评估和实验,但避免编写大量“我们稍后再讨论”的废弃代码。最好是有意地“自下而上”地构建内容,这样我们就可以确保较低级别的抽象是正确的,而较高级别的部分可以构建在经过充分考虑的底层之上。

我们的目标是让子系统协同工作,并明确定义角色和职责。我们不希望看到大量的内部防御代码,或者围绕其他子系统中的缺陷工作的代码。这意味着:我们需要花费大量的时间讨论和迭代设计方法,并记录结果。