Closed: aviolaris closed this issue 1 year ago
@scottamain is working on improving this, Scott can you take a look?
Wow thanks for the detailed suggestions, @aviolaris! Yeah I just finished a copyedit on this but I'll be sure these are addressed.
Nice! Thanks for the feedback. You're most welcome!
Is this done @scottamain ? If so, plz close whenever convenient just to keep the tracker tidy. thx
Yep! Updates will go online soon.
URL to the documentation page:
https://docs.modular.com/mojo/why-mojo.html
Proposed modifications:
I have made some modifications, including a few already reported by Elliot Waite in a previous issue, that you may want to incorporate. As there is no versioning yet, I have converted the comparison results into Markdown format for easier tracking.
When we started Modular, we had no intention of building a new programming language. But as we were building our platform with the intent to unify the world's ML/AI infrastructure, we realized that programming across the entire stack was too complicated. Plus, we were writing a lot of MLIR by hand and not having a good time.

We wanted an innovative and scalable programming model that could target the accelerators and other heterogeneous systems that are pervasive in machine learning, which requires things that are not supported by existing languages. And although accelerators are important, one of the most prevalent and important "accelerators" is the host CPU. Today, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration units, but they also serve as the "fallback" for operations that specialized accelerators don't handle, such as data loading, pre- and post-processing, and integrations with foreign systems. So it was clear that we couldn't lift AI with an "accelerator language" that worked with only specific processors.

Splitting the work across multiple languages brings its own issues, and we decided there was no reason it couldn't be done with just one language. So Mojo was born.

At the same time, we didn't see any need to innovate in language syntax or community. So we chose to embrace the Python ecosystem because it is so widely used, it is loved by the AI ecosystem, and because it is really nice!

Mojo has ambitious goals - we want full compatibility with the Python ecosystem, we would like predictable low-level performance and low-level control, and we need the ability to deploy subsets of code to accelerators. We also don't want ecosystem fragmentation - we hope that people find our work to be useful over time, and we don't want something like the Python 2 => Python 3 migration to happen again. These are no small goals!

Embracing Python simplifies our design work, because most of the syntax is already specified; we can instead focus our efforts on building the compilation model and designing specific systems programming features. We also benefit from tremendous work on other languages (e.g. Clang, Rust, Swift, Julia, Zig, Nim, etc.), and we leverage the MLIR compiler ecosystem.
We also benefit from experience with the Swift programming language, which migrated most of a massive Objective-C community over to a new language. We don't subscribe to the dogma that "static is good" or "dynamic is good" - our belief is that both are good when used for the right applications, and that the language should enable the programmer to make the call.

Mojo already supports many core features of Python, including async/await, error handling, variadics, etc., but it is still very early and missing many features - so today it isn't very compatible. Mojo doesn't even support classes yet!

The "Clang" compiler is a C, C++ and Objective-C (and CUDA, OpenCL, ...) compiler that is part of LLVM. A major goal of Clang was to be a "compatible replacement" for GCC, MSVC and other existing compilers. It is hard to make a direct comparison, but the complexity of the Clang problem appears to be an order of magnitude bigger than implementing a compatible replacement for Python. The journey there gives us good confidence that we can do this right for the Python community.

Another approach is to be "run-time compatible" and cooperate with a legacy runtime. In the case of Python and Mojo, we expect Mojo to cooperate directly with the CPython runtime and have similar support for integrating with CPython classes and objects without having to compile the code itself. This will allow us to talk to a massive ecosystem of existing code, and it provides a progressive migration approach where incremental work put in for migration yields incremental benefit.

Mojo also needs to be a first-class language in its own right, and it cannot be hobbled by being unable to introduce new keywords or add a few grammar productions. As such, our approach to compatibility is twofold:

1. Mojo will run existing Python code "out of the box" without modification and use its runtime, unmodified, for full compatibility with the entire ecosystem. Running code this way gets no benefit from Mojo, but the sheer existence and availability of this ecosystem will rapidly accelerate the bring-up of Mojo, and it leverages the fact that Python is already really great for high-level programming.
2. We will make it possible to progressively move code (a module or file at a time) to Mojo.
This approach was used and proven by the Objective-C to Swift migration that Apple performed. Swift code is able to subclass and utilize Objective-C classes, and programmers were able to adopt Swift incrementally in their applications. Swift also supports building APIs that are useful for Objective-C programmers, and we expect Mojo to be a great way to implement APIs for CPython as well. And wouldn't it be cool if the CPython team eventually reimplemented the interpreter in Mojo instead of C? 🔥

Mojo started with the goal of bringing an innovative programming model to accelerators and other heterogeneous systems that are pervasive in machine learning. That said, one of the most important and prevalent "accelerators" is actually the host CPU. These CPUs are getting lots of tensor-core-like accelerator blocks and other dedicated AI acceleration units, but they also, importantly, serve as the "fallback" to support operations the accelerators don't. This includes tasks like data loading, pre- and post-processing, and integrations with foreign systems written (e.g.) in C++.

It was clear that we couldn't build a limited accelerator language that targets a narrow subset of the problem (e.g. one that just works for tensors). We needed to support the full gamut of general-purpose programming. At the same time, we didn't see a need to innovate in syntax or community, and so we decided to embrace and complete the Python ecosystem.

Python is the dominant force in both the field of ML and countless other fields. It is easy to learn, known by important cohorts of programmers (e.g. data scientists), has an amazing community, has tons of valuable packages, and has a wide variety of good tooling.
Python supports the development of beautiful and expressive APIs through its dynamic programming features, which led machine learning frameworks like TensorFlow and PyTorch to embrace Python as a frontend to their high-performance runtimes implemented in C++. This is at the heart of our "Python First" approach.

We believe Python is a beautiful language: designed with simple and composable abstractions, it eschews needless punctuation that is redundant in practice with indentation, and it is built with powerful (dynamic) metaprogramming features that provide a runway to extend it to what we need at Modular. We hope that people in the Python ecosystem see our new direction as taking Python ahead to the next level - completing it - instead of trying to compete with it.

That said, Python has some well-known problems - most obviously, poor low-level performance and CPython implementation decisions like the GIL. While there are many active projects underway to improve these challenges, the issues brought by Python go deeper and particularly impact the AI field. Instead of talking about those technical limitations, we'll talk about the implications of these limitations here in 2023.

Note that everywhere we refer to Python in this section, we are referring to the CPython implementation. We'll talk about other implementations in a bit.

Python's poor low-level performance means it isn't suitable for systems programming. Fortunately, Python has amazing strengths as a glue layer, and low-level bindings to C and C++ allow building libraries in C, C++ and many other languages with better performance characteristics.
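To make the "glue layer" point concrete, here is a minimal sketch (not taken from the Mojo docs) that uses Python's `ctypes` module to call a compiled C routine directly. It assumes a Unix-like system where `ctypes.util.find_library` can locate the C math library; the exact library name is platform-dependent.

```python
import ctypes
import ctypes.util

# Locate and load the C math library.
# Platform-dependent: typically "libm.so.6" on Linux, "libm.dylib" on macOS.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   double sqrt(double x);
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# This runs compiled C code, not Python bytecode.
print(libm.sqrt(2.0))  # 1.4142135623730951
```

Real extension libraries like NumPy use the CPython C API rather than `ctypes`, but the principle is the same: the performance-critical work happens outside the interpreter.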
This is what has enabled things like NumPy, TensorFlow, PyTorch, and a vast number of other libraries in the ecosystem.

Unfortunately, while this approach is effective for building high-performance Python libraries, it comes with a cost: building these hybrid libraries is very complicated. It requires a low-level understanding of the internals of CPython, it requires knowledge of C/C++/... programming (undermining one of the original goals of using Python in the first place), it makes it difficult to evolve large frameworks, and (in the case of ML) it pushes the world towards "graph based" programming models, which have worse fundamental usability than "eager mode" systems. TensorFlow was an exemplar of this, and much of the effort in PyTorch 2 is focused on discovering graphs to enable more aggressive compilation methods.

The accelerator side brings its own special problems and limitations, and it does not have consistent tools like debuggers or profilers. It is also effectively locked to a single hardware maker! Deployment is another challenge: modern systems need more than just "a.out" files, and multithreading and performance are also very important. These are areas where we would like to see the Python ecosystem take steps forward.

Other approaches to improve Python: there are many approaches to improve Python, including recent work to speed up Python and replace the GIL, languages that look like Python but are subsets of it, and embedded DSLs that integrate with Python but are not first-class languages. While we cannot provide an exhaustive list of all the efforts, we can talk about some of the challenges in these areas and why they aren't suitable for Modular's use.

Significant energy has been put into improving CPython performance and other implementation issues, and this is showing huge results for the community. This work is fantastic because it incrementally improves the current CPython implementation. Python 3.11 delivered performance improvements of 10-60% over Python 3.10 through internal improvements, and Python 3.12 aims to go further with a trace optimizer.
Many other projects are attempting to tame the GIL, and projects like PyPy (among many others) have used JIT compilation and tracing approaches to speed up Python. These are great efforts, but they do not help in getting a unified language onto an accelerator. Many accelerators these days support only very limited dynamic features, or support them with terrible performance. Furthermore, systems programmers don't just seek "performance"; they also typically want a lot of "predictability and control" over how a computation happens.

While we are fans of these approaches, and feel they are valuable and exciting to the community, they unfortunately do not satisfy our needs. We are looking to eliminate the need to use C or C++ within Python libraries, we seek the highest performance possible, and we cannot accept dynamic features at all in some cases - so these approaches don't help.

There are many attempts to build a "deployable" Python; one example is TorchScript from the PyTorch project. These are useful in that they often provide low-dependency deployment solutions and sometimes have high performance. Because they use Python-like syntax, they can be easier to learn than a novel language.

On the other hand, these languages don't interoperate with the Python ecosystem, do not have fantastic tooling (e.g. debuggers), and often unilaterally change inconvenient behavior in Python, which breaks compatibility and fragments the ecosystem. For example, many of these change the behavior of simple integers to wrap instead of producing Python-compatible math.

The challenge with these approaches is that they attempt to solve a weak point of Python but aren't as good at Python's strong points. At best, they can provide a new alternative to C and C++, but without solving the dynamic use cases of Python they cannot solve the "two world problem". This approach drives fragmentation, and incompatibility makes migration difficult to impossible - recall how challenging the Python 2 to Python 3 migration was.

Another common approach is embedded DSLs in Python, typically installed as decorators - e.g. the @tf.function decorator in TensorFlow, the @triton.jit decorator in OpenAI's Triton programming model, etc.
A major benefit of these systems is that they maintain compatibility with all of the Python ecosystem tooling and integrate natively into Python logic, allowing an embedded mini-language to co-exist with the strengths of Python for dynamic use cases. Unfortunately, these embedded mini-languages don't integrate well with debuggers and other workflow tooling, and they do not support the level of native language integration that we seek for a language that unifies heterogeneous compute and is the primary way to write large-scale kernels and systems. We hope to move the usability of the overall system forward by simplifying things and making them more consistent. Embedded DSLs are an expedient way to get demos up and running, but we are willing to put in the additional effort to provide better usability and predictability for our use case.
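To illustrate the embedded-DSL pattern described above, here is a deliberately tiny, hypothetical sketch, not TensorFlow's or Triton's actual machinery: a decorator that, instead of executing a Python function eagerly, calls it with symbolic values so that operator overloading records an expression graph. A real system would then compile or optimize that graph. All names here (`trace`, `Node`, `affine`) are invented for illustration.

```python
class Node:
    """One operation in the traced expression graph."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    # Overloaded operators record ops instead of computing values.
    def __add__(self, other):
        return Node("add", self, _as_node(other))

    def __mul__(self, other):
        return Node("mul", self, _as_node(other))

    def __repr__(self):
        if self.op in ("input", "const"):
            return str(self.inputs[0])
        return f"{self.op}({', '.join(map(repr, self.inputs))})"


def _as_node(value):
    """Wrap plain Python values as constant graph nodes."""
    return value if isinstance(value, Node) else Node("const", value)


def trace(fn):
    """Decorator: run fn on symbolic inputs to capture its graph."""
    def build_graph(*arg_names):
        symbolic_args = [Node("input", name) for name in arg_names]
        return fn(*symbolic_args)
    return build_graph


@trace
def affine(x, w, b):
    return x * w + b  # recorded into the graph, not executed eagerly


graph = affine("x", "w", "b")
print(graph)  # add(mul(x, w), b)
```

This captures both sides of the trade-off from the text: the decorated function is ordinary Python and coexists with the surrounding program, but stepping through `affine` in a debugger shows graph construction rather than the arithmetic the user wrote.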