[Documentation] Improvements for "Why Mojo?" page.

URL to the documentation page:

https://docs.modular.com/mojo/why-mojo.html

Proposed modifications:

I have made some modifications, including a few that have already been reported by Elliot Waite in a previous issue, that you may want to incorporate. As there is no versioning yet, I have converted the comparison results into markdown format for easier tracking.

modularml / mojo

[Documentation] Improvements for "Why Mojo?" page. #44

URL to the documentation page:

Proposed modifications:

Original Text	Modified Text
When we started Modular, we had no ~~intentions~~ of building a new programming language. But as we were building our platform with the intent to unify the ~~world’s~~ ML/AI infrastructure, we realized that programming across the entire stack was too complicated. ~~Plus,~~ we were writing a lot of MLIR by hand and not having a good time.	When we started Modular, we had no intention of building a new programming language. But as we were building our platform with the intent to unify the world's ML/AI infrastructure, we realized that programming across the entire stack was too complicated. Additionally, we were writing a lot of MLIR by hand and not having a good time.
What we wanted was an innovative and scalable programming model that could target accelerators and other heterogeneous systems that are pervasive in machine learning. This meant a programming language with powerful compile-time metaprogramming, integration of adaptive compilation techniques, caching throughout the compilation flow, and other ~~things~~ that are not supported by existing languages.	What we wanted was an innovative and scalable programming model that could target accelerators and other heterogeneous systems that are pervasive in machine learning. This meant a programming language with powerful compile-time metaprogramming, integration of adaptive compilation techniques, caching throughout the compilation flow, and other features that are not supported by existing languages.
And although accelerators are important, one of the most prevalent and sometimes overlooked ~~“accelerators”~~ is the host CPU. ~~Today,~~ CPUs have ~~lots~~ of tensor-core-like accelerator blocks and other AI acceleration units, but they also serve as the ~~“fall~~ ~~back”~~ for operations that specialized accelerators ~~don’t~~ handle, such as data loading, ~~pre-~~ and post-processing, and integrations with foreign systems. So it was clear that we ~~couldn’t~~ lift AI with an ~~“accelerator~~ ~~language”~~ that worked with ~~only~~ specific processors.	And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" is the host CPU. Nowadays, CPUs have numerous tensor-core-like accelerator blocks and other AI acceleration units, but they also serve as the "fallback" for operations that specialized accelerators don't handle, such as data loading, pre-processing and post-processing, and integrations with foreign systems. Therefore, it was clear that we couldn't lift AI with an "accelerator language" that only worked with specific processors.
Applied AI systems need to address all these ~~issues~~ and we decided there was no reason it ~~couldn’t~~ be done with just one language. So Mojo was born.	Applied AI systems need to address all these issues, and we decided there was no reason it couldn't be done with just one language. Hence, Mojo was born.
We decided that our mission for Mojo would include innovations in compiler internals and support for current and emerging accelerators, but we ~~didn’t~~ ~~see~~ ~~any~~ need to innovate in language syntax or community. So we chose to embrace the Python ecosystem because it is so widely used, it is loved by the AI ~~ecosystem,~~ and ~~because~~ it is really nice!	We decided that our mission for Mojo would include innovations in compiler internals and support for current and emerging accelerators, but we saw no need to innovate in language syntax or community. Thus, we chose to embrace the Python ecosystem because it is widely used, it is loved by the AI community and it is really nice!
Mojo as a member of the Python family	Mojo as a member of the Python family
The Mojo language has lofty ~~goals~~ - we want full compatibility with the Python ecosystem, we ~~would~~ ~~like~~ predictable low-level performance and low-level ~~control,~~ ~~and~~ we need the ability to deploy subsets of code to ~~accelerators.~~ We ~~also~~ don’t want ecosystem ~~fragmentation~~ - we hope that people find our work to be useful over ~~time,~~ and don’t want something like the Python 2 => Python 3 migration to happen again. These are no small goals!	The Mojo language has lofty goals. We want full compatibility with the Python ecosystem, predictable low-level performance and low-level control. We need the ability to deploy subsets of code to accelerators, and we don’t want ecosystem fragmentation. We hope that people find our work useful over time and don’t want something like the Python 2 => Python 3 migration to happen again. These are not small goals!
Fortunately, while Mojo is a brand new code base, we aren’t really starting from scratch conceptually. Embracing Python massively simplifies our design efforts, because most of the syntax is already specified. We can instead focus ~~our~~ ~~efforts~~ on building the compilation model and designing specific systems programming features. We also benefit from tremendous work on other languages (e.g. Clang, Rust, Swift, Julia, Zig, Nim, ~~etc),~~ and leverage the MLIR compiler ecosystem. We also benefit from experience with the Swift programming language, which migrated most of a massive Objective-C community ~~over~~ to a new language.	Fortunately, while Mojo is a brand new code base, we aren’t really starting from scratch conceptually. Embracing Python massively simplifies our design efforts, because most of the syntax is already specified. We can instead focus on building the compilation model and designing specific systems programming features. We also benefit from the tremendous work done on other languages (e.g. Clang, Rust, Swift, Julia, Zig, Nim, etc.), and leverage the MLIR compiler ecosystem. We also benefit from experience with the Swift programming language, which migrated most of a massive Objective-C community to a new language.
Further, we decided that the right long-term goal for Mojo is to provide a superset of Python (i.e. be compatible with existing programs) and to embrace the CPython immediately for long-tail ecosystem enablement. To a Python programmer, we expect and hope that Mojo will be immediately familiar, while also providing new tools for developing systems-level code that enable you to do things that Python falls back to C and C++ for. We aren’t trying to convince the world that ~~“static~~ is ~~good”~~ or ~~“dynamic~~ is ~~good”~~ - our belief is that both are good when used for the right applications, and that the language should enable the programmer to make the call.	Further, we decided that the right long-term goal for Mojo is to provide a superset of Python (i.e. be compatible with existing programs) and to embrace the CPython immediately for long-tail ecosystem enablement. To a Python programmer, we expect and hope that Mojo will be immediately familiar, while also providing new tools for developing systems-level code that enable you to do things that Python falls back to C and C++ for. We aren’t trying to convince the world that "static is good" or "dynamic is good" - our belief is that both are good when used for the right applications, and that the language should enable the programmer to make the call.
How compatible is Mojo with Python really?	How compatible is Mojo with Python really?
Mojo already supports many core features of ~~Python~~ including async/await, error handling, ~~variadics,~~ ~~etc,~~ ~~but…~~ it is still ~~very~~ early and missing many ~~features~~ - so ~~today~~ it ~~isn’t~~ very ~~compatible.~~ Mojo ~~doesn’t~~ even support classes yet!	Mojo already supports many core features of Python, including async/await, error handling and variadics. However, since it is still in its early stages and missing many features, it is not yet very compatible with Python. In fact, Mojo doesn't even support classes yet!
That said, we have experience with two major but different compatibility journeys: the ~~“Clang”~~ compiler is a C, ~~C++~~ and Objective-C (and CUDA, OpenCL, …) that is part of LLVM. A major goal of Clang was to be a ~~“compatible~~ ~~replacement”~~ for GCC, MSVC and other existing compilers. It is hard to make a direct comparison, but the complexity of the Clang problem appears to be an order of magnitude bigger than implementing a compatible replacement for Python. The journey there gives good confidence we can do this right for the Python community.	That said, we have experience with two major but different compatibility journeys: the "Clang" compiler is a C, C++, and Objective-C (and CUDA, OpenCL, etc.) that is part of LLVM. A major goal of Clang was to be a "compatible replacement" for GCC, MSVC and other existing compilers. It is hard to make a direct comparison, but the complexity of the Clang problem appears to be an order of magnitude bigger than implementing a compatible replacement for Python. The journey there gives us good confidence that we can do this right for the Python community.
Another example is the Swift programming language, which embraced the Objective-C runtime and language ecosystem and progressively shifted millions of programmers (and huge amounts of code) incrementally over to a completely different programming language. With Swift, we learned lessons about how to be ~~“run-time~~ ~~compatible”~~ and cooperate with a legacy runtime. In the case of Python and Mojo, we expect Mojo to cooperate directly with the CPython runtime and have similar support for integrating with CPython classes and objects without having to compile the code itself. This will allow us to talk to a massive ecosystem of existing code, but provide a progressive migration approach where incremental work put in ~~for~~ migration will yield incremental ~~benefit~~.	Another example is the Swift programming language, which embraced the Objective-C runtime and language ecosystem and progressively shifted millions of programmers (and huge amounts of code) incrementally over to a completely different programming language. With Swift, we learned lessons about how to be "run-time compatible" and cooperate with a legacy runtime. In the case of Python and Mojo, we expect Mojo to cooperate directly with the CPython runtime and have similar support for integrating with CPython classes and objects without having to compile the code itself. This will allow us to talk to a massive ecosystem of existing code, but provide a progressive migration approach where incremental work put into migration will yield incremental benefits.
Overall, we believe that the north star of compatibility, continued vigilance on design, and incremental progress towards full compatibility will get us to where we need to be in time.	Overall, we believe that the north star of compatibility, continued vigilance on design, and incremental progress towards full compatibility will get us to where we need to be in time.
Intentional differences from Python	Intentional differences from Python
While compatibility and migratability are key to success, we also want Mojo to be a first class language on its ~~own,~~ and ~~cannot~~ be hobbled by not being able to introduce new keywords or add a few grammar productions. As such, our approach to compatibility is ~~two~~ ~~fold:~~	While compatibility and migratability are key to success, we also want Mojo to be a first class language on its own and not be hobbled by not being able to introduce new keywords or add a few grammar productions. As such, our approach to compatibility is twofold:
We utilize CPython to run all existing Python3 code ~~“out~~ of the ~~box”~~ without modification and use its runtime, unmodified, for full compatibility with the entire ecosystem. Running code this way will get no benefit from Mojo, but the sheer existence and availability of this ecosystem will rapidly accelerate the bring-up of Mojo and leverage the fact that Python is really great for high level programming already.	We utilize CPython to run all existing Python3 code "out of the box" without modification and use its runtime, unmodified, for full compatibility with the entire ecosystem. Running code this way will get no benefit from Mojo, but the sheer existence and availability of this ecosystem will rapidly accelerate the bring-up of Mojo and leverage the fact that Python is really great for high level programming already.
We will provide a mechanical migrator that provides very good compatibility for people who want to move Python code to Mojo. For example, Mojo provides a backtick feature that allows use of any keyword as an identifier, providing a trivial mechanical migration path for code that uses those keywords as identifiers or keyword arguments. Code that migrates to Mojo can then utilize the advanced systems programming features.	We will provide a mechanical migrator that provides very good compatibility for people who want to move Python code to Mojo. For example, Mojo provides a backtick feature that allows the use of any keyword as an identifier, providing a trivial mechanical migration path for code that uses those keywords as identifiers or keyword arguments. Code that migrates to Mojo can then utilize the advanced systems programming features.
Together, this allows Mojo to integrate well in a mostly-CPython world, but allows Mojo programmers to be ~~able~~ to progressively move code (a module or file at a time) to Mojo. This approach was used and proved by the Objective-C to Swift migration that Apple performed. Swift code is able to subclass and utilize Objective-C classes, and programmers were able to adopt Swift incrementally in their applications. Swift also supports building APIs that are useful for Objective-C programmers, and we expect Mojo to be a great way to implement APIs for CPython as well.	Together, this allows Mojo to integrate well in a mostly-CPython world, but allows Mojo programmers to progressively move code (a module or file at a time) to Mojo. This approach was used and proved by the Objective-C to Swift migration that Apple performed. Swift code is able to subclass and utilize Objective-C classes, and programmers were able to adopt Swift incrementally in their applications. Swift also supports building APIs that are useful for Objective-C programmers, and we expect Mojo to be a great way to implement APIs for CPython as well.
It will take some time to build Mojo and the migration support, but we feel confident that this will allow us to focus our energies and avoid distractions. We also think the relationship with CPython can build from both directions - ~~wouldn’t~~ it be cool if the CPython team eventually reimplemented the interpreter in Mojo instead of C? 🔥	It will take some time to build Mojo and the migration support, but we feel confident that this will allow us to focus our energies and avoid distractions. We also think the relationship with CPython can build from both directions - wouldn't it be cool if the CPython team eventually reimplemented the interpreter in Mojo instead of C? 🔥
Detailed Motivation:	Detailed Motivation:
Mojo ~~started~~ with the goal of bringing an innovative programming model to accelerators and other heterogeneous systems that are pervasive in machine learning. ~~That~~ ~~said,~~ one of the most important and prevalent ~~“accelerators”~~ is actually the host CPU. These CPUs are getting lots of tensor-core-like accelerator blocks and other dedicated AI acceleration units, but they also ~~importantly~~ ~~serve~~ as ~~the~~ ~~“fall~~ ~~back”~~ to support operations the accelerators don’t. This includes tasks like data loading, ~~pre-~~ and post-processing, and integrations with foreign systems ~~written~~ ~~(e.g.)~~ in C++.	Mojo was initially created with the goal of bringing an innovative programming model to accelerators and other heterogeneous systems that are pervasive in machine learning. However, one of the most important and prevalent "accelerators" is actually the host CPU. These CPUs are getting lots of tensor-core-like accelerator blocks and other dedicated AI acceleration units, but they also play a vital role as a "fallback" to support operations the accelerators don’t. This includes tasks like data loading, pre-processing and post-processing, and integrations with foreign systems written, for example, in C++.
As such, it became clear that we ~~couldn’t~~ build a limited accelerator language that targets a narrow subset of the problem (e.g. just work for tensors). We needed to support the full gamut of general purpose programming. At the same time, we ~~didn’t~~ see a need to innovate in syntax or community, and so we decided to embrace and complete the Python ecosystem.	As such, it became clear that we couldn't build a limited accelerator language that targets a narrow subset of the problem (e.g. just work for tensors). We needed to support the full gamut of general purpose programming. At the same time, we didn't see a need to innovate in syntax or community, and so we decided to embrace and complete the Python ecosystem.
Why Python?	Why Python?
Python is the dominant force in ~~both~~ the field ML ~~and~~ also countless other fields. It is easy to learn, known by important cohorts of programmers (e.g. data scientists), has an amazing community, has tons of valuable packages, and has a wide variety of good tooling. Python supports development of beautiful and expressive APIs through its dynamic programming features, which led machine learning frameworks like TensorFlow and PyTorch ~~embraced~~ Python as a frontend to their high-performance runtimes implemented in C++.	Python is the dominant force not only in the field of machine learning but also in countless other fields. It is easy to learn, known by important cohorts of programmers (e.g. data scientists), has an amazing community, has tons of valuable packages, and has a wide variety of good tooling. Python supports the development of beautiful and expressive APIs through its dynamic programming features, which led machine learning frameworks like TensorFlow and PyTorch to embrace Python as a frontend to their high-performance runtimes implemented in C++.
For Modular today, Python is a non-negotiable part of our API surface stack - this is dictated by our customers. Given that everything else in our stack is negotiable, it stands to reason that we should start from a ~~“Python~~ ~~First”~~ approach.	For Modular today, Python is a non-negotiable part of our API surface stack - this is dictated by our customers. Given that everything else in our stack is negotiable, it stands to reason that we should start from a "Python First" approach.
More subjectively, we feel that Python is a beautiful ~~language~~ - designed with simple and composable abstractions, ~~eschews~~ needless ~~punctuation~~ ~~that~~ is ~~redundant-in-practice~~ ~~with~~ ~~indentation,~~ ~~and~~ built with powerful (dynamic) metaprogramming features that ~~are~~ a runway to extend to ~~what~~ we ~~need~~ for Modular. We hope that those in the Python ecosystem see our new direction as taking Python ~~ahead~~ to the next level - completing it - instead of ~~trying~~ to ~~compete~~ with it.	More subjectively, we feel that Python is a beautiful language, designed with simple and composable abstractions, eschewing needless punctuation, which is redundant in practice when indentation is used. It is also built with powerful (dynamic) metaprogramming features that provide a runway to extend Python according to our needs for Modular. We hope that those in the Python ecosystem will see our new direction as a step forward, taking Python to the next level by completing it instead of competing with it.
What’s wrong with Python?	What’s wrong with Python?
Python has well known ~~problems~~ - most obviously, poor low-level performance and CPython implementation decisions like the GIL. While ~~there~~ ~~are~~ many active projects underway to improve these challenges, the issues brought by Python go deeper and particularly impact the AI field. Instead of talking about those technical limitations, we'll talk about the implications of these limitations here in 2023.	Python has well known problems, most obviously, poor low-level performance and CPython implementation decisions like the GIL. While many active projects are underway to improve these challenges, the issues brought by Python go deeper and particularly impact the AI field. Instead of talking about those technical limitations, we'll talk about the implications of these limitations here in 2023.
Note that ~~everywhere~~ we refer to Python in this ~~section~~ is referring to the CPython implementation. We~~ll talk about~~ other implementations ~~in a bit~~.	Note that every time we refer to Python in this section, we are referring to the CPython implementation. We will discuss other implementations shortly.
The two-world problem	The two-world problem
For a variety of reasons, Python ~~isn’t~~ suitable for systems programming. Fortunately, Python has amazing strengths as a glue layer, and low-level bindings to C and C++ allow building libraries in C, C++ and many other languages with better performance characteristics. This is what has enabled things like ~~numpy,~~ ~~TensorFlow~~ ~~and~~ PyTorch and a vast number of other libraries in the ecosystem.	For a variety of reasons, Python isn't suitable for systems programming. Fortunately, Python has amazing strengths as a glue layer, and low-level bindings to C and C++ allow building libraries in C, C++ and many other languages with better performance characteristics. This is what has enabled things like NumPy, TensorFlow, PyTorch and a vast number of other libraries in the ecosystem.
Unfortunately, while this approach is an effective way to ~~building~~ high performance Python libraries, ~~its~~ ~~approach~~ comes with a cost: building these hybrid libraries is very ~~complicated,~~ ~~requiring~~ low-level understanding of the internals of ~~cpython,~~ ~~requires~~ knowledge of C/C++/… programming (undermining one of the original goals of using Python in the first place), makes it difficult to evolve large frameworks, and (in the case of ML) pushes the world towards ~~“graph~~ ~~based”~~ programming ~~models~~ which have worse fundamental usability than ~~“eager~~ ~~mode”~~ systems. TensorFlow was an exemplar of this, but much of the effort in PyTorch 2 is focused around discovering graphs to enable more aggressive compilation methods.	Unfortunately, while this approach is an effective way to build high performance Python libraries, it comes with a cost: building these hybrid libraries is very complicated. It requires a low-level understanding of the internals of CPython, knowledge of C/C++/… programming (undermining one of the original goals of using Python in the first place), makes it difficult to evolve large frameworks, and (in the case of ML) pushes the world towards "graph-based" programming models, which have worse fundamental usability than "eager mode" systems. TensorFlow was an exemplar of this, but much of the effort in PyTorch 2 is focused around discovering graphs to enable more aggressive compilation methods.
Beyond the fundamental nature of the two-world problem in terms of system complexity, it makes everything else in the ecosystem more complicated. Debuggers generally can’t step across Python and C code, and those that can aren’t widely accepted. It is a pain for the package ecosystems to deal C/C++ code instead of a single world. Projects like PyTorch with significant C++ investments are intentionally trying to move more of their codebase to Python because they know it gains usability.	Beyond the fundamental nature of the two-world problem in terms of system complexity, it makes everything else in the ecosystem more complicated. Debuggers generally can’t step across Python and C code, and those that can aren’t widely accepted. It is a pain for the package ecosystems to deal with C/C++ code instead of a single world. Projects like PyTorch with significant C++ investments are intentionally trying to move more of their codebase to Python because they know it gains usability.
The three-world and N-world problem	The three-world and N-world problem
The two-world problem is commonly felt across the Python ecosystem, but things are even worse for developers of machine learning frameworks. AI is pervasively accelerated, and those accelerators use bespoke programming languages like CUDA. While CUDA is a relative of C++, it has its ~~own~~ special problems and ~~limitations,~~ and does not have consistent tools like debuggers or profilers. It is also effectively locked to a single hardware maker!	The two-world problem is commonly felt across the Python ecosystem, but things are even worse for developers of machine learning frameworks. AI is pervasively accelerated, and those accelerators use bespoke programming languages like CUDA. While CUDA is a relative of C++, it has its special problems and limitations and does not have consistent tools like debuggers or profilers. It is also effectively locked to a single hardware maker!
The AI world has an incredible amount of innovation on the hardware front, and as a consequence, complexity is spiraling out of control. There are now many attempts to build limited programming systems for accelerators (OpenCL, Sycl, OneAPI, …). This complexity explosion is continuing to increase and none of these systems solve the fundamental fragmentation in tools and ecosystem that is hurting the industry so badly.	The AI world has an incredible amount of innovation on the hardware front, and as a consequence, complexity is spiraling out of control. There are now many attempts to build limited programming systems for accelerators (OpenCL, Sycl, OneAPI, …). This complexity explosion is continuing to increase and none of these systems solve the fundamental fragmentation in tools and the ecosystem that is hurting the industry so badly.
Mobile and server deployment Another challenge for the Python ecosystem is one of deployment. There are many facets to this, including folks who want to carefully control dependencies, some folks prefer to be able to deploy hermetically compiled ~~“a.out”~~ files, and multithreading and performance are also very important. These are areas where we would like to see the Python ecosystem take steps forward.	Mobile and server deployment Another challenge for the Python ecosystem is one of deployment. There are many facets to this, including folks who want to carefully control dependencies, some folks prefer to be able to deploy hermetically compiled "a.out" files, and multithreading and performance are also very important. These are areas where we would like to see the Python ecosystem take steps forward.
Related work: ~~other~~ approaches to improve Python	Related work: Other approaches to improve Python
There are ~~many~~ many approaches to improve Python, including recent work to speed up Python and replace the GIL, languages that look like Python but are subsets of it, and embedded DSLs that integrate with Python but ~~that~~ are not first class languages. While we cannot do an exhaustive list of all the efforts, we can ~~talk~~ ~~about~~ some of the challenges in these ~~areas,~~ and why they ~~aren’t~~ suitable for ~~Modular’s~~ use.	There are many approaches to improve Python, including recent work to speed up Python and replace the GIL, languages that look like Python but are subsets of it, and embedded DSLs that integrate with Python but are not first class languages. While we cannot provide an exhaustive list of all the efforts, we can discuss some of the challenges in these areas and why they are not suitable for Modular's use.