Usage of C10 API's in WholeMemory and its implication on build

teju85 commented 2 years ago

@BradReesWork JFYI... Currently, WM directly uses C10 API's from pytorch. Eg: wholegraph/torch/whole_nccl_pytorch_tensor.h. This means, similar to the pytorch backend of cugraph-ops, we'll also have to think about the topic of CXX_ABI during its build! This can especially become complicated when WM moves inside cuGraph.

MatthiasKohl commented 2 years ago

I think it would make sense to go the same route as with cugraph-ops: if we expose things using cython, and only use the __cuda_array_interface__ for integrating with DL frameworks, then we're able to circumvent this issue entirely, because we won't interact with pytorch at a C++ level at all and so the ABI level won't matter. The issue only arises if you link against torchlib or some other pytorch backend directly, which you have to do of course if you want to use c10 or other C++ backends from pytorch directly.

dongxuy04 commented 1 year ago

I think this issue should have been addressed in PR #24 .

All WholeMemory are supporting both dlpack and cuda_array_interface by PyWholeMemoryFlattenDlpack
All memory needed by ops are allocated by wholememory_env_func_t, which is binded with PyTorch at python level using callbacks to Python functions.

So there is no C10 API used during build. And it can build and run without C10 API.

Moreover, as C10 API may have slightly better performance than python callback bindings. The codes are kept in python/pylibwholegraph/pylibwholegraph/torch_cpp_ext directory. They are not built at build and packaging time, just packaging the source code. If it is really helpful on performance, user can complie them at runtime on their machine manually as an optimizing option, by calling compile_cpp_extension. compile_cpp_extension is the only usage of C10 API, it is up to user and run only on user's machine with known PyTorch version. If users don't care about that, all the files under torch_cpp_ext are not used.

@BradReesWork I think we can we can close this issue now.

rapidsai / wholegraph

Usage of C10 API's in WholeMemory and its implication on build #6