modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo

[Documentation] Improvements for "Why Mojo?" page. #44

Closed aviolaris closed 1 year ago

aviolaris commented 1 year ago

URL to the documentation page:

https://docs.modular.com/mojo/why-mojo.html

Proposed modifications:

I have made some modifications, including a few that were already reported by Elliot Waite in a previous issue, which you may want to incorporate. Since there is no versioning yet, I have converted the comparison results into markdown format for easier tracking.

**Original:** When we started Modular, we had no intentions of building a new programming language. But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, we realized that programming across the entire stack was too complicated. Plus, we were writing a lot of MLIR by hand and not having a good time.

**Modified:** When we started Modular, we had no intention of building a new programming language. But as we were building our platform with the intent to unify the world's ML/AI infrastructure, we realized that programming across the entire stack was too complicated. Additionally, we were writing a lot of MLIR by hand and not having a good time.
**Original:** What we wanted was an innovative and scalable programming model that could target accelerators and other heterogeneous systems that are pervasive in machine learning. This meant a programming language with powerful compile-time metaprogramming, integration of adaptive compilation techniques, caching throughout the compilation flow, and other things that are not supported by existing languages.

**Modified:** What we wanted was an innovative and scalable programming model that could target accelerators and other heterogeneous systems that are pervasive in machine learning. This meant a programming language with powerful compile-time metaprogramming, integration of adaptive compilation techniques, caching throughout the compilation flow, and other features that are not supported by existing languages.
**Original:** And although accelerators are important, one of the most prevalent and sometimes overlooked “accelerators” is the host CPU. Today, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration units, but they also serve as the “fall back” for operations that specialized accelerators don’t handle, such as data loading, pre- and post-processing, and integrations with foreign systems. So it was clear that we couldn’t lift AI with an “accelerator language” that worked with only specific processors.

**Modified:** And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" is the host CPU. Nowadays, CPUs have numerous tensor-core-like accelerator blocks and other AI acceleration units, but they also serve as the "fallback" for operations that specialized accelerators don't handle, such as data loading, pre-processing and post-processing, and integrations with foreign systems. Therefore, it was clear that we couldn't lift AI with an "accelerator language" that only worked with specific processors.
**Original:** Applied AI systems need to address all these issues and we decided there was no reason it couldn’t be done with just one language. So Mojo was born.

**Modified:** Applied AI systems need to address all these issues, and we decided there was no reason it couldn't be done with just one language. Hence, Mojo was born.
**Original:** We decided that our mission for Mojo would include innovations in compiler internals and support for current and emerging accelerators, but we didn’t see any need to innovate in language syntax or community. So we chose to embrace the Python ecosystem because it is so widely used, it is loved by the AI ecosystem, and because it is really nice!

**Modified:** We decided that our mission for Mojo would include innovations in compiler internals and support for current and emerging accelerators, but we saw no need to innovate in language syntax or community. Thus, we chose to embrace the Python ecosystem because it is widely used, it is loved by the AI community, and it is really nice!
**Mojo as a member of the Python family**
**Original:** The Mojo language has lofty goals - we want full compatibility with the Python ecosystem, we would like predictable low-level performance and low-level control, and we need the ability to deploy subsets of code to accelerators. We also don’t want ecosystem fragmentation - we hope that people find our work to be useful over time, and don’t want something like the Python 2 => Python 3 migration to happen again. These are no small goals!

**Modified:** The Mojo language has lofty goals. We want full compatibility with the Python ecosystem, predictable low-level performance and low-level control. We need the ability to deploy subsets of code to accelerators, and we don’t want ecosystem fragmentation. We hope that people find our work useful over time and don’t want something like the Python 2 => Python 3 migration to happen again. These are not small goals!
**Original:** Fortunately, while Mojo is a brand new code base, we aren’t really starting from scratch conceptually. Embracing Python massively simplifies our design efforts, because most of the syntax is already specified. We can instead focus our efforts on building the compilation model and designing specific systems programming features. We also benefit from tremendous work on other languages (e.g. Clang, Rust, Swift, Julia, Zig, Nim, etc), and leverage the MLIR compiler ecosystem. We also benefit from experience with the Swift programming language, which migrated most of a massive Objective-C community over to a new language.

**Modified:** Fortunately, while Mojo is a brand new code base, we aren’t really starting from scratch conceptually. Embracing Python massively simplifies our design efforts, because most of the syntax is already specified. We can instead focus on building the compilation model and designing specific systems programming features. We also benefit from the tremendous work done on other languages (e.g. Clang, Rust, Swift, Julia, Zig, Nim, etc.), and leverage the MLIR compiler ecosystem. We also benefit from experience with the Swift programming language, which migrated most of a massive Objective-C community to a new language.
**Original:** Further, we decided that the right long-term goal for Mojo is to provide a superset of Python (i.e. be compatible with existing programs) and to embrace the CPython immediately for long-tail ecosystem enablement. To a Python programmer, we expect and hope that Mojo will be immediately familiar, while also providing new tools for developing systems-level code that enable you to do things that Python falls back to C and C++ for. We aren’t trying to convince the world that “static is good” or “dynamic is good” - our belief is that both are good when used for the right applications, and that the language should enable the programmer to make the call.

**Modified:** Further, we decided that the right long-term goal for Mojo is to provide a superset of Python (i.e. be compatible with existing programs) and to embrace CPython immediately for long-tail ecosystem enablement. To a Python programmer, we expect and hope that Mojo will be immediately familiar, while also providing new tools for developing systems-level code that enable you to do things that Python falls back to C and C++ for. We aren’t trying to convince the world that "static is good" or "dynamic is good" - our belief is that both are good when used for the right applications, and that the language should enable the programmer to make the call.
**How compatible is Mojo with Python really?**
**Original:** Mojo already supports many core features of Python including async/await, error handling, variadics, etc, but… it is still very early and missing many features - so today it isn’t very compatible. Mojo doesn’t even support classes yet!

**Modified:** Mojo already supports many core features of Python, including async/await, error handling and variadics. However, since it is still in its early stages and missing many features, it is not yet very compatible with Python. In fact, Mojo doesn't even support classes yet!
**Original:** That said, we have experience with two major but different compatibility journeys: the “Clang” compiler is a C, C++ and Objective-C (and CUDA, OpenCL, …) that is part of LLVM. A major goal of Clang was to be a “compatible replacement” for GCC, MSVC and other existing compilers. It is hard to make a direct comparison, but the complexity of the Clang problem appears to be an order of magnitude bigger than implementing a compatible replacement for Python. The journey there gives good confidence we can do this right for the Python community.

**Modified:** That said, we have experience with two major but different compatibility journeys: the "Clang" compiler supports C, C++, and Objective-C (and CUDA, OpenCL, etc.) and is part of LLVM. A major goal of Clang was to be a "compatible replacement" for GCC, MSVC and other existing compilers. It is hard to make a direct comparison, but the complexity of the Clang problem appears to be an order of magnitude bigger than implementing a compatible replacement for Python. The journey there gives us good confidence that we can do this right for the Python community.
**Original:** Another example is the Swift programming language, which embraced the Objective-C runtime and language ecosystem and progressively shifted millions of programmers (and huge amounts of code) incrementally over to a completely different programming language. With Swift, we learned lessons about how to be “run-time compatible” and cooperate with a legacy runtime. In the case of Python and Mojo, we expect Mojo to cooperate directly with the CPython runtime and have similar support for integrating with CPython classes and objects without having to compile the code itself. This will allow us to talk to a massive ecosystem of existing code, but provide a progressive migration approach where incremental work put in for migration will yield incremental benefit.

**Modified:** Another example is the Swift programming language, which embraced the Objective-C runtime and language ecosystem and progressively shifted millions of programmers (and huge amounts of code) incrementally over to a completely different programming language. With Swift, we learned lessons about how to be "run-time compatible" and cooperate with a legacy runtime. In the case of Python and Mojo, we expect Mojo to cooperate directly with the CPython runtime and have similar support for integrating with CPython classes and objects without having to compile the code itself. This will allow us to talk to a massive ecosystem of existing code, but provide a progressive migration approach where incremental work put into migration will yield incremental benefits.
**Original:** Overall, we believe that the north star of compatibility, continued vigilance on design, and incremental progress towards full compatibility will get us to where we need to be in time.

**Modified:** Overall, we believe that the north star of compatibility, continued vigilance on design, and incremental progress towards full compatibility will get us to where we need to be in time.
**Intentional differences from Python**
**Original:** While compatibility and migratability are key to success, we also want Mojo to be a first class language on its own, and cannot be hobbled by not being able to introduce new keywords or add a few grammar productions. As such, our approach to compatibility is two fold:

**Modified:** While compatibility and migratability are key to success, we also want Mojo to be a first class language on its own and not be hobbled by not being able to introduce new keywords or add a few grammar productions. As such, our approach to compatibility is twofold:
**Original:** We utilize CPython to run all existing Python3 code “out of the box” without modification and use its runtime, unmodified, for full compatibility with the entire ecosystem. Running code this way will get no benefit from Mojo, but the sheer existence and availability of this ecosystem will rapidly accelerate the bring-up of Mojo and leverage the fact that Python is really great for high level programming already.

**Modified:** We utilize CPython to run all existing Python3 code "out of the box" without modification and use its runtime, unmodified, for full compatibility with the entire ecosystem. Running code this way will get no benefit from Mojo, but the sheer existence and availability of this ecosystem will rapidly accelerate the bring-up of Mojo and leverage the fact that Python is really great for high level programming already.
**Original:** We will provide a mechanical migrator that provides very good compatibility for people who want to move Python code to Mojo. For example, Mojo provides a backtick feature that allows use of any keyword as an identifier, providing a trivial mechanical migration path for code that uses those keywords as identifiers or keyword arguments. Code that migrates to Mojo can then utilize the advanced systems programming features.

**Modified:** We will provide a mechanical migrator that provides very good compatibility for people who want to move Python code to Mojo. For example, Mojo provides a backtick feature that allows the use of any keyword as an identifier, providing a trivial mechanical migration path for code that uses those keywords as identifiers or keyword arguments. Code that migrates to Mojo can then utilize the advanced systems programming features.
**Original:** Together, this allows Mojo to integrate well in a mostly-CPython world, but allows Mojo programmers to be able to progressively move code (a module or file at a time) to Mojo. This approach was used and proved by the Objective-C to Swift migration that Apple performed. Swift code is able to subclass and utilize Objective-C classes, and programmers were able to adopt Swift incrementally in their applications. Swift also supports building APIs that are useful for Objective-C programmers, and we expect Mojo to be a great way to implement APIs for CPython as well.

**Modified:** Together, this allows Mojo to integrate well in a mostly-CPython world, but allows Mojo programmers to progressively move code (a module or file at a time) to Mojo. This approach was used and proved by the Objective-C to Swift migration that Apple performed. Swift code is able to subclass and utilize Objective-C classes, and programmers were able to adopt Swift incrementally in their applications. Swift also supports building APIs that are useful for Objective-C programmers, and we expect Mojo to be a great way to implement APIs for CPython as well.
**Original:** It will take some time to build Mojo and the migration support, but we feel confident that this will allow us to focus our energies and avoid distractions. We also think the relationship with CPython can build from both directions - wouldn’t it be cool if the CPython team eventually reimplemented the interpreter in Mojo instead of C? 🔥

**Modified:** It will take some time to build Mojo and the migration support, but we feel confident that this will allow us to focus our energies and avoid distractions. We also think the relationship with CPython can build from both directions - wouldn't it be cool if the CPython team eventually reimplemented the interpreter in Mojo instead of C? 🔥
**Detailed Motivation:**
**Original:** Mojo started with the goal of bringing an innovative programming model to accelerators and other heterogeneous systems that are pervasive in machine learning. That said, one of the most important and prevalent “accelerators” is actually the host CPU. These CPUs are getting lots of tensor-core-like accelerator blocks and other dedicated AI acceleration units, but they also importantly serve as the “fall back” to support operations the accelerators don’t. This includes tasks like data loading, pre- and post-processing, and integrations with foreign systems written (e.g.) in C++.

**Modified:** Mojo was initially created with the goal of bringing an innovative programming model to accelerators and other heterogeneous systems that are pervasive in machine learning. However, one of the most important and prevalent "accelerators" is actually the host CPU. These CPUs are getting lots of tensor-core-like accelerator blocks and other dedicated AI acceleration units, but they also play a vital role as a "fallback" to support operations the accelerators don’t. This includes tasks like data loading, pre-processing and post-processing, and integrations with foreign systems written, for example, in C++.
**Original:** As such, it became clear that we couldn’t build a limited accelerator language that targets a narrow subset of the problem (e.g. just work for tensors). We needed to support the full gamut of general purpose programming. At the same time, we didn’t see a need to innovate in syntax or community, and so we decided to embrace and complete the Python ecosystem.

**Modified:** As such, it became clear that we couldn't build a limited accelerator language that targets a narrow subset of the problem (e.g. just work for tensors). We needed to support the full gamut of general purpose programming. At the same time, we didn't see a need to innovate in syntax or community, and so we decided to embrace and complete the Python ecosystem.
**Why Python?**
**Original:** Python is the dominant force in both the field ML and also countless other fields. It is easy to learn, known by important cohorts of programmers (e.g. data scientists), has an amazing community, has tons of valuable packages, and has a wide variety of good tooling. Python supports development of beautiful and expressive APIs through its dynamic programming features, which led machine learning frameworks like TensorFlow and PyTorch embraced Python as a frontend to their high-performance runtimes implemented in C++.

**Modified:** Python is the dominant force not only in the field of machine learning but also in countless other fields. It is easy to learn, known by important cohorts of programmers (e.g. data scientists), has an amazing community, has tons of valuable packages, and has a wide variety of good tooling. Python supports the development of beautiful and expressive APIs through its dynamic programming features, which led machine learning frameworks like TensorFlow and PyTorch to embrace Python as a frontend to their high-performance runtimes implemented in C++.
**Original:** For Modular today, Python is a non-negotiable part of our API surface stack - this is dictated by our customers. Given that everything else in our stack is negotiable, it stands to reason that we should start from a “Python First” approach.

**Modified:** For Modular today, Python is a non-negotiable part of our API surface stack - this is dictated by our customers. Given that everything else in our stack is negotiable, it stands to reason that we should start from a "Python First" approach.
**Original:** More subjectively, we feel that Python is a beautiful language - designed with simple and composable abstractions, eschews needless punctuation that is redundant-in-practice with indentation, and built with powerful (dynamic) metaprogramming features that are a runway to extend to what we need for Modular. We hope that those in the Python ecosystem see our new direction as taking Python ahead to the next level - completing it - instead of trying to compete with it.

**Modified:** More subjectively, we feel that Python is a beautiful language, designed with simple and composable abstractions, eschewing needless punctuation, which is redundant in practice when indentation is used. It is also built with powerful (dynamic) metaprogramming features that provide a runway to extend Python according to our needs for Modular. We hope that those in the Python ecosystem will see our new direction as a step forward, taking Python to the next level by completing it instead of competing with it.
**What’s wrong with Python?**
**Original:** Python has well known problems - most obviously, poor low-level performance and CPython implementation decisions like the GIL. While there are many active projects underway to improve these challenges, the issues brought by Python go deeper and particularly impact the AI field. Instead of talking about those technical limitations, we'll talk about the implications of these limitations here in 2023.

**Modified:** Python has well-known problems, most obviously poor low-level performance and CPython implementation decisions like the GIL. While many active projects are underway to address these challenges, the issues brought by Python go deeper and particularly impact the AI field. Instead of talking about those technical limitations, we'll talk about the implications of these limitations here in 2023.
**Original:** Note that everywhere we refer to Python in this section is referring to the CPython implementation. Well talk about other implementations in a bit.

**Modified:** Note that every time we refer to Python in this section, we are referring to the CPython implementation. We will discuss other implementations shortly.
**The two-world problem**
**Original:** For a variety of reasons, Python isn’t suitable for systems programming. Fortunately, Python has amazing strengths as a glue layer, and low-level bindings to C and C++ allow building libraries in C, C++ and many other languages with better performance characteristics. This is what has enabled things like numpy, TensorFlow and PyTorch and a vast number of other libraries in the ecosystem.

**Modified:** For a variety of reasons, Python isn't suitable for systems programming. Fortunately, Python has amazing strengths as a glue layer, and low-level bindings to C and C++ allow building libraries in C, C++ and many other languages with better performance characteristics. This is what has enabled things like NumPy, TensorFlow, PyTorch and a vast number of other libraries in the ecosystem.
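The glue-layer pattern described above, Python delegating performance-critical work to compiled C, can be sketched with the standard library's `ctypes` module. This is a minimal illustration, assuming a POSIX system where the C runtime's `strlen` symbol is visible in the current process; it is not how NumPy or PyTorch bind their C++ internals, just the smallest version of the same idea.

```python
import ctypes

# Load the symbols already linked into the current process (dlopen(NULL));
# on a typical POSIX system this exposes the C runtime, including strlen.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes marshals arguments correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The two worlds in one line: Python bytes in, a C function does the work.
print(libc.strlen(b"hello world"))  # 11
```

Every crossing of this boundary is a point where debuggers, packaging, and error handling stop composing, which is the cost the surrounding text describes.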
**Original:** Unfortunately, while this approach is an effective way to building high performance Python libraries, its approach comes with a cost: building these hybrid libraries is very complicated, requiring low-level understanding of the internals of cpython, requires knowledge of C/C++/… programming (undermining one of the original goals of using Python in the first place), makes it difficult to evolve large frameworks, and (in the case of ML) pushes the world towards “graph based” programming models which have worse fundamental usability than “eager mode” systems. TensorFlow was an exemplar of this, but much of the effort in PyTorch 2 is focused around discovering graphs to enable more aggressive compilation methods.

**Modified:** Unfortunately, while this approach is an effective way to build high performance Python libraries, it comes with a cost: building these hybrid libraries is very complicated. It requires a low-level understanding of the internals of CPython and knowledge of C/C++/… programming (undermining one of the original goals of using Python in the first place), it makes it difficult to evolve large frameworks, and (in the case of ML) it pushes the world towards "graph-based" programming models, which have worse fundamental usability than "eager mode" systems. TensorFlow was an exemplar of this, but much of the effort in PyTorch 2 is focused around discovering graphs to enable more aggressive compilation methods.
**Original:** Beyond the fundamental nature of the two-world problem in terms of system complexity, it makes everything else in the ecosystem more complicated. Debuggers generally can’t step across Python and C code, and those that can aren’t widely accepted. It is a pain for the package ecosystems to deal C/C++ code instead of a single world. Projects like PyTorch with significant C++ investments are intentionally trying to move more of their codebase to Python because they know it gains usability.

**Modified:** Beyond the fundamental nature of the two-world problem in terms of system complexity, it makes everything else in the ecosystem more complicated. Debuggers generally can’t step across Python and C code, and those that can aren’t widely accepted. It is a pain for the package ecosystems to deal with C/C++ code instead of a single world. Projects like PyTorch with significant C++ investments are intentionally trying to move more of their codebase to Python because they know it gains usability.
**The three-world and N-world problem**
**Original:** The two-world problem is commonly felt across the Python ecosystem, but things are even worse for developers of machine learning frameworks. AI is pervasively accelerated, and those accelerators use bespoke programming languages like CUDA. While CUDA is a relative of C++, it has its own special problems and limitations, and does not have consistent tools like debuggers or profilers. It is also effectively locked to a single hardware maker!

**Modified:** The two-world problem is commonly felt across the Python ecosystem, but things are even worse for developers of machine learning frameworks. AI is pervasively accelerated, and those accelerators use bespoke programming languages like CUDA. While CUDA is a relative of C++, it has its own special problems and limitations and does not have consistent tools like debuggers or profilers. It is also effectively locked to a single hardware maker!
**Original:** The AI world has an incredible amount of innovation on the hardware front, and as a consequence, complexity is spiraling out of control. There are now many attempts to build limited programming systems for accelerators (OpenCL, Sycl, OneAPI, …). This complexity explosion is continuing to increase and none of these systems solve the fundamental fragmentation in tools and ecosystem that is hurting the industry so badly.

**Modified:** The AI world has an incredible amount of innovation on the hardware front, and as a consequence, complexity is spiraling out of control. There are now many attempts to build limited programming systems for accelerators (OpenCL, Sycl, OneAPI, …). This complexity continues to increase, and none of these systems solve the fundamental fragmentation in tools and the ecosystem that is hurting the industry so badly.
**Original:** Mobile and server deployment Another challenge for the Python ecosystem is one of deployment. There are many facets to this, including folks who want to carefully control dependencies, some folks prefer to be able to deploy hermetically compiled “a.out” files, and multithreading and performance are also very important. These are areas where we would like to see the Python ecosystem take steps forward.

**Modified:** Mobile and server deployment

Another challenge for the Python ecosystem is one of deployment. There are many facets to this: some folks want to carefully control dependencies, some prefer to be able to deploy hermetically compiled "a.out" files, and multithreading and performance are also very important. These are areas where we would like to see the Python ecosystem take steps forward.
**Original:** Related work: other approaches to improve Python

**Modified:** Related work: Other approaches to improve Python
**Original:** There are many many approaches to improve Python, including recent work to speed up Python and replace the GIL, languages that look like Python but are subsets of it, and embedded DSLs that integrate with Python but that are not first class languages. While we cannot do an exhaustive list of all the efforts, we can talk about some of the challenges in these areas, and why they aren’t suitable for Modular’s use.

**Modified:** There are many approaches to improve Python, including recent work to speed up Python and replace the GIL, languages that look like Python but are subsets of it, and embedded DSLs that integrate with Python but are not first class languages. While we cannot provide an exhaustive list of all the efforts, we can discuss some of the challenges in these areas and why they are not suitable for Modular's use.
**Improving CPython and JIT compiling Python**
**Original:** Recently, significant energy has been put into improving CPython performance and other implementation issues, and this is showing huge results for the community. This work is fantastic because it incrementally improves the current CPython implementation. Python 3.11 has delivered improvements of 10-60% faster than Python 3.10 through internal improvements, and Python 3.12 aims to go further with a trace optimizer. Many other projects are attempting to tame the GIL, and projects like PyPy (among many others) have used JIT compilation and tracing approaches to speed up Python.

**Modified:** Recently, significant effort has been put into improving CPython performance and other implementation issues, resulting in huge benefits for the community. This work is fantastic because it incrementally improves the current CPython implementation. Python 3.11 has delivered performance improvements of 10-60% over Python 3.10 through internal improvements, and Python 3.12 aims to go further with a trace optimizer. Many other projects are attempting to tame the GIL, and projects like PyPy (among many others) have used JIT compilation and tracing approaches to speed up Python.
**Original:** These are great efforts, but are not helpful in getting a unified language onto an accelerator. Many accelerators these days only support very limited dynamic features, or do so with terrible performance. Furthermore, systems programmers don’t just seek “performance” they also typically want a lot of “predictability and control” over how a computation happens.

**Modified:** While these are great efforts, they are not helpful in getting a unified language onto an accelerator. Many accelerators these days only support very limited dynamic features, or do so with terrible performance. Furthermore, systems programmers don't just seek "performance"; they also typically want a lot of "predictability and control" over how a computation happens.
**Original:** While we are a fan of these approaches, and feel they are valuable and exciting to the community, they unfortunately do not satisfy our needs. We are looking to eliminate the need to use C or C++ within Python libraries, we seek the highest performance possible, and we cannot accept dynamic features at all in some cases, so these approaches don’t help.

**Modified:** While we are fans of these approaches and feel they are valuable and exciting to the community, they unfortunately do not satisfy our needs. We are looking to eliminate the need to use C or C++ within Python libraries, seek the highest possible performance, and cannot accept dynamic features at all in some cases. Therefore, these approaches do not help.
**Python subsets and other Python-like languages**
**Original:** There are many attempts to build a “deployable” Python, one example is TorchScript from the PyTorch project. These are useful in that they often provide low-dependence deployment solutions and sometimes have high performance. Because they use Python-like syntax, they can be easier to learn than a novel language.

**Modified:** There are many attempts to build a "deployable" Python, such as TorchScript from the PyTorch project. These are useful because they often provide low-dependence deployment solutions and sometimes have high performance. Because they use Python-like syntax, they can be easier to learn than a novel language.
**Original:** On the other hand, these languages have not seen wide adoption - because they are a subset, they generally don’t interoperate with the Python ecosystem, do not have fantastic tooling (e.g. debuggers), and often change out inconvenient behavior in Python unilaterally, which breaks compatibility and fragments the ecosystem. For example, many of these change the behavior of simple integers to wrap instead of producing Python-compatible math.

**Modified:** On the other hand, these languages have not seen wide adoption - because they are a subset, they generally do not interoperate with the Python ecosystem, do not have fantastic tooling (e.g. debuggers), and often unilaterally change inconvenient behavior in Python, which breaks compatibility and fragments the ecosystem. For example, many of these change the behavior of simple integers to wrap instead of producing Python-compatible math.
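The integer-semantics divergence mentioned above can be shown in a few lines of plain Python: Python ints are arbitrary-precision, while a subset language that lowers ints to a fixed 64-bit machine type wraps around on overflow. The `wrap64` helper below is illustrative only, not taken from any particular subset language.

```python
def wrap64(n: int) -> int:
    """Reduce an integer to signed 64-bit two's-complement, as a
    fixed-width subset language would after an overflow."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

big = 2**63 - 1          # largest signed 64-bit value
print(big + 1)           # Python: 9223372036854775808 (grows, no overflow)
print(wrap64(big + 1))   # fixed-width semantics: -9223372036854775808 (wraps)
```

The same source expression yields two different answers under the two semantics, which is exactly the kind of silent incompatibility that fragments the ecosystem.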
**Original:** The challenges with these approaches is that they attempt to solve a weak point of Python, but aren’t as good at Python’s strong points. At best, these can provide a new alternative to C and C++ but without solving the dynamic use cases of Python they cannot solve the “two world problem”. This approach drives fragmentation, and incompatibility makes migration difficult to impossible - recall how challenging the Python 2 to Python 3 migration was.

**Modified:** The challenge with these approaches is that they attempt to solve a weak point of Python, but are not as good at Python's strong points. At best, they can provide a new alternative to C and C++ - but without solving the dynamic use cases of Python, they cannot solve the "two world problem". This approach drives fragmentation, and incompatibility makes migration difficult to impossible - recall how challenging the Python 2 to Python 3 migration was.
**Embedded DSLs in Python**
**Original:** Another common approach is to build an embedded DSL in Python, typically installed with a Python decorator. There are many examples of this, e.g. the @tf.function decorator in TensorFlow, the @triton.jit in OpenAI’s Triton programming model, etc. A major benefit of these systems is that they maintain compatibility with all of the Python ecosystem tooling, and integrate natively into Python logic, allowing an embedded mini language to co-exist with the strengths of Python for dynamic use cases.

**Modified:** Another common approach is to build an embedded DSL in Python, typically installed with a Python decorator. There are many examples of this, such as the @tf.function decorator in TensorFlow, the @triton.jit in OpenAI's Triton programming model, etc. A major benefit of these systems is that they maintain compatibility with all of the Python ecosystem tooling, and integrate natively into Python logic, allowing an embedded mini language to co-exist with the strengths of Python for dynamic use cases.
**Original:** Unfortunately, the embedded mini-languages provided by these systems often have surprising limitations, don’t integrate well with debuggers and other workflow tooling, and do not support the level of native language integration that we seek for a language that unifies heterogeneous compute and is the primary way to write large scale kernels and systems. We hope to move the usability of the overall system forward by simplifying things and making it more consistent. Embedded DSLs are an expedient way to get demos up and running, but we are willing to put in the additional effort and work to provide better usability and predictability for our use-case.

**Modified:** Unfortunately, the embedded mini-languages provided by these systems often have surprising limitations, don't integrate well with debuggers and other workflow tooling, and do not support the level of native language integration that we seek for a language that unifies heterogeneous compute and is the primary way to write large scale kernels and systems. We hope to move the usability of the overall system forward by simplifying things and making it more consistent. Embedded DSLs are an expedient way to get demos up and running, but we are willing to put in the additional effort and work to provide better usability and predictability for our use case.
lattner commented 1 year ago

@scottamain is working on improving this, Scott can you take a look?

scottamain commented 1 year ago

Wow thanks for the detailed suggestions, @aviolaris! Yeah I just finished a copyedit on this but I'll be sure these are addressed.

aviolaris commented 1 year ago

Nice! Thanks for the feedback. You're most welcome!

lattner commented 1 year ago

Is this done @scottamain ? If so, plz close whenever convenient just to keep the tracker tidy. thx

scottamain commented 1 year ago

Yep! Updates will go online soon.