Closed stellaraccident closed 1 year ago
Fantastic! Looking forward to the collaboration.
Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?
> Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?
Yes, that is what I'm proposing.
cc @nluehr
Does that translate to “write access”?
> Does that translate to “write access”?
Yes (which we already do on these repos for other contributors), but I tried to phrase it more generally. Being a "component maintainer" in OpenXLA parlance grants some other privileges in terms of overall openxla direction/management.
What will be the C++ namespace for all the new code in this project? ::xla::...?
> What will be the C++ namespace for all the new code in this project? ::xla::...?
Not sure I would speak to "all of the new code", but historically, dialects and components of IREE take a namespace like ::mlir::iree_compiler::IREE::<COMPONENT>, which nests well by letting everything resolve components via IREE::. For the record, I don't love that it is all rooted under mlir, but that has been that way for a long time and could be cleaned up.
If not extending that namespacing scheme, I'd encourage at least something similar: ::openxla_compiler::XLA. I've deviated from this a couple of times in the past and always regretted it and come back to normalize.
Proposed Directory Structure
openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:

    compiler/
      src/
        openxla_nvgpu/
          Dialects/
    runtime/
      src/
        openxla_nvgpu/
    build_tools/
I'm curious why the sub-directory names are repeating "openxla"? That is, why compiler/src/openxla_nvgpu instead of compiler/src/nvgpu?
Even further, if the repository itself is all about nvgpu, why repeat it at all? What else will be inside compiler/src in this repo?
It is just following the IREE convention of the include directory being rooted at src/ and wanting to arrive at fully qualified include paths (i.e. `#include "openxla_nvgpu/..."`). I've more or less resigned myself to the fact that the least bad thing is to repeat yourself exactly once in service of having globally unique include paths.
I'm not a big fan of the mlir top-level namespace, nor iree_compiler, as nvgpu really is just a project on top of the IREE compiler; I also don't like the openxla_compiler namespace, and I don't have any suggestions :)
What if I want to write a custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under the iree and iree::vm namespaces; should it be ::openxla or regular ::xla here?
Oh I missed the point about the include directory convention, makes sense! (I believe this convention is in place in mlir-hlo as well, I remember a rationale doc for the path structure, I think @burmako originated it, I don't know if it is public)
> I'm not a big fan of the mlir top-level namespace, and iree_compiler, as nvgpu really is just a project on top of the IREE compiler, and I don't like the openxla_compiler namespace, and I don't have any suggestions :)
> What if I want to write a custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under the iree and iree::vm namespaces; should it be ::openxla or regular ::xla here?
Just throwing things out there... ::openxla::nvgpu? We're going to regret whatever we choose. Might as well at least not start with a namespace that is already used.
I'd go with ::openxla, ::openxla::compiler, ::openxla::runtime, etc...? And not mention nvgpu at all. Do we see a single project depending on multiple "openxla compilers", e.g. openxla-nvgpu and openxla-amdgpu? Or linking them into a single binary?
I'm game to try it. Like I said, I'm pretty sure that what we pick, we come back in a few months with some code written and apply a bit of sed, but I do think we want globally unique: there are many use cases where these will be linked together for both recommended and unrecommended reasons. Let's not set ourselves up for accidental name collisions.
Also, with C++17, the cost of namespaces (in terms of keystrokes) is a lot lower.
I would be concerned about plugins and target-specific components redefining symbols in a shared namespace: I'm not sure what benefit eliding the target from the namespace buys us in practice?
(Nesting compilers under ::mlir (as in ::mlir::mhlo:: for example) is nice in that you get an implicit using namespace mlir; (maybe just a workaround for the Google style banning using namespace?), but we're still heavily using a specific sub-namespace to compartmentalize.)
> maybe just a workaround for the google style banning using namespace
(ftr - we're not exporting such legacy rules to our new OSS projects)
Another directory naming question: why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?
Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or is the plan to keep shared code in the iree repo?
Will we depend on absl/tsl? E.g. logging, Status, StatusOr inside the nvgpu compiler/runtime? Or in the compiler use LLVM logging (almost non-existent), and in the runtime use IREE logging (no idea what its status is).
What will be the license? Copyright 2023 The OpenXLA Authors ... Licensed under the Apache License v2.0 with LLVM Exceptions.
> Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or is the plan to keep shared code in the iree repo?
That is being discussed, but we are biasing towards getting moving at the moment over busting out dependencies. We have some work to do to get the dev/versioning workflow going for the level of things we have, and I'd rather get some more mileage on that before we go too crazy with a lot of repositories.
> Will we depend on absl/tsl? E.g. logging, Status, StatusOr inside the nvgpu compiler/runtime? Or in the compiler use LLVM logging (almost non-existent), and in the runtime use IREE logging (no idea what its status is).
The further "core-ward" we go, we have no plan to depend on absl/tsl, and I would be somewhat resistant to doing so because they have both proven to be problematic (so much so that we excised them after thinking "how bad could it be?").
Concrete thoughts...
I don't think that we should be mixing universes in the compiler code; we need to "build up" from LLVM rather than grafting on other base libraries.
The runtime code for nvgpu has some more give to it from a dependency standpoint, but for the level of things expected to be in there, I would like to avoid the complexity that comes from taking complicated deps if possible.
Some of this stuff is preference and some has proven to be more trouble than it is worth in the past... The hard line that we can't cross in this repo is that dependencies must be thin and must have a well supported CMake build.
> Another directory naming question: why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?
I don't have a preference.
Never waste a good opportunity to bike-shed :smile:
Logging, as brought up by @ezhulenev, caught my eye. For the compiler, I would suggest we try to use the diagnostic handler infrastructure as much as possible and log warnings/errors there. That will force us to provide messages with good context.
For "trace what this does" debugging use cases, I agree that LLVM_DEBUG is not great but will work for now. We will need to replace this later with something that has severity and other bells and whistles, but that can be done. I am used to the VLOG interface, and we could have a shim that maps to LLVM_DEBUG if we decide to care already. Maybe IREE wants to provide this akin to what tsl does, so that different integrations can swap in what fits their needs. Depending on tsl just for that seems a bit heavy.
Regarding namespaces: I agree with @stellaraccident: let's bias toward progress rather than the perfect choice for now. Having said that, I personally would use openxla::compiler::nvgpu for the CUDA compiler. We will have more compilers sooner than later, and having them in different namespaces helps my mental model of where code belongs. Might be xla bias, and I have no strong opinion. Just some :bike: :hut: :paintbrush:
Excited to see this project spin up!
This has been a heavy overhead week for me but I should finally get some coding time this afternoon, and since I can probably bootstrap the project somewhat efficiently, I'll take a stab at that. As noted, I'll stage it in an iree-samples directory first and will then hand off to someone Google-side to create the repo (which requires a bit of red tape).
All right... the above two commits seem to get me most of the way there. Things build, etc. Was a bit of a slog.
And initial CuDNN custom module that does nothing related to CuDNN yet: https://github.com/iree-org/iree-samples/pull/123/files
Both of the initial commits have landed.
@theadactyl Can we get someone with admin rights on the org to create the openxla-nvgpu repo? It should be populated with https://github.com/iree-org/iree-samples/tree/main/openxla-nvgpu
Coming in late and I think just agreeing with what's already been decided, but strong positions on: not taking an absl or tsl dep in the compiler and scoping the [sub]namespace to the project (so not just "iree" or "openxla" or "compiler")
I do think that the Google (Titus) advice on not having deeply nested namespaces is pretty good and not just a weird Google thing: https://abseil.io/tips/130. Every level of nesting gives us a fun new place for collisions. So I would vote for something like openxla_nvgpu as a top-level namespace. Probably nesting it under mlir would be fine, since we shouldn't define things that conflict with MLIR anyway. I think there was at some point a limitation of ODS that meant you had to define things in the mlir namespace, but I also suspect that this is a relic of the Google C++ style guide ban on using namespace mlir, which, while not completely misguided IMO, seems excessively strict. using namespace mlir seems better than sneakily getting that by just defining everything nested under that namespace. I guess the difference is that you can only do the latter for one namespace. I think we can probably limit ourselves to using namespace mlir and using namespace llvm and then not have to nest our namespaces, unless there's some other reason to do so.
I don't like the openxla_nvgpu namespace, because presumably we'll have some kind of openxla-base shared by all openxla compilers, and it's nice to share a namespace to skip qualifying all imports.
+HUGE to using namespace, but only for a small number of things (mlir and llvm).
Now that RFC - Proposal to Build IREE Compiler Plugin Mechanism has been implemented and is minimally working with both an in-tree and an early-adopter out-of-tree implementation, we are ready to propose the official creation of a new openxla-nvgpu project to house:

Since this will be the first plugin project of note, we expect that to a certain extent, it will co-evolve with the mechanisms.
Proposed Directory Structure
openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:

    compiler/
      src/
        openxla_nvgpu/
          Dialects/
    runtime/
      src/
        openxla_nvgpu/
    build_tools/

The build setup will use bazel_to_cmake for consistency and interop with the upstream IREE project (now that https://github.com/openxla/iree/pull/12765 has taken the steps to allow it to be used out of tree). The corresponding IREE build macros will be extended as needed.
Dependencies
This project depends directly on:
It transitively depends on:
Releases
In the very short term, this project will not produce independent releases and will aim to be usable by project developers who are willing to build it from HEAD.
In the longer term, we would like to introduce a new top-level openxla-compiler releaser project which is responsible for packaging the platform (IREE) with stable versions of all supported vendor compiler plugins and making it available as a standalone binary release. Such a project would eventually depend on this one and would effectively be the plugin-aggregated release of the present-day iree-compiler packages (which will continue to be released as the "vanilla" platform without out-of-kind platform-specific dependencies).
Also in the longer term, as the PJRT plugin support evolves, we anticipate releasing openxla-nvgpu-pjrt binary packages that can be used to interface NVIDIA GPUs to supported ML frameworks via pip install.
Versioning and CI
Adding this top-level project pushes us firmly into a "many-repo" layout for OpenXLA projects. This will be further reinforced as IREE's dependencies and build tools are disaggregated over time and the top-level releaser projects are established.
As part of this, we will introduce a side by side workspace layout where dependencies are found relative to each other based on parent directory. Example:
Such a layout will be called an "OpenXLA workspace", and we will provide a sync script and CI tooling to help manage it. Each project will pin to green release tags in its parent (or other form of stable commit tracking) by maintaining a local metadata file of its openxla dependencies. The sync script will be both a simple way for developers to track known-good sync points for the workspace and for CIs to advance. There will be a CI bot which advances dependent projects to the next stable sync points automatically. We expect that for projects that already have a strong release cadence, like IREE, this will update pins to new nightly releases, and others will cascade from at-head commits.
This process of versioning will be developed over time with an eye towards being the one way to manage openxla project dependencies. It will likely be somewhat manual to start with. It will be derived in spirit from the PJRT plugin sync script and enhanced to provide better release tracking and version bump capabilities.
Benchmarking
Benchmarks of the NVIDIA toolchain will be largely inherited and leveraged from the IREE project but run independently so as to provide a continuous view of the performance deltas and characteristics of the platform-independent upstream and the vendor-specific downstream.
Next steps
As a very next step, the openxla-nvgpu project will be bootstrapped in the iree-samples repository. It will be relocated, with history, to a new git repository once this RFC has matriculated.
Project Ownership
The project will be set up as a collaboration between Google and NVIDIA, and per OpenXLA governance, will share maintainer responsibility between contributors from both companies with the goal of NVIDIA engineers taking on core maintainer responsibility as the project bootstraps and evolves.