Closed: boegel closed this issue 5 months ago
DeviceStats::requested_bytes was introduced in libtorch 2.0. Are you linking against your own version of libtorch rather than the version that the dorado configuration process downloads?
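To make the version mismatch concrete, here is a minimal Python sketch of a pre-build guard. It assumes the version string comes from something like `torch.__version__`, and that the 2.0 threshold quoted above is accurate. The names `MIN_TORCH`, `parse_version` and `has_requested_bytes` are hypothetical and are not part of dorado or PyTorch.

```python
import re
from typing import Tuple

# Minimum libtorch version that provides DeviceStats::requested_bytes,
# taken from the comment above (assumption: it was not backported).
MIN_TORCH = (2, 0)


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.12.0' or '2.0.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_requested_bytes(version: str) -> bool:
    """Hypothetical helper: True if this torch version is at least MIN_TORCH."""
    return parse_version(version) >= MIN_TORCH


if __name__ == "__main__":
    for v in ("1.12.0", "2.0.1+cu117"):
        status = "ok" if has_requested_bytes(v) else "too old"
        print(v, status)
```

Comparing `(major, minor)` tuples rather than raw strings avoids the pitfall of "1.12" sorting after "2.0" lexically.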
@malton-ont Thanks for the feedback!
Yes, we are installing dorado on top of a PyTorch 1.12.0 that we installed ourselves here, since we prefer to control which version is used and how it gets built. We also try hard to avoid anything being downloaded on the fly during an installation, because that complicates reproducing the same installation later.
Is there an overview of which PyTorch versions dorado 0.3.4 is compatible with?
Hi @boegel - dorado depends on a custom build of pytorch 2.0 that we host on our CDN because we need static libraries. This custom build gets downloaded when the dorado build setup runs.
We still support building from the PyTorch hosted package, but it needs to be enabled with -D TRY_USING_STATIC_TORCH_LIB=0 when setting up cmake. The exact supported PyTorch version is specified here.
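As a rough illustration of the configure step mentioned above, the following Python sketch assembles a cmake command line that passes the flag. Only `TRY_USING_STATIC_TORCH_LIB=0` comes from this thread. The helper name `cmake_configure_args`, the `cmake-build` directory and the `extra_definitions` parameter are assumptions made for the example, and how cmake locates the PyTorch package is not covered here.

```python
import shlex
from typing import Dict, List, Optional


def cmake_configure_args(source_dir: str, build_dir: str,
                         extra_definitions: Optional[Dict[str, str]] = None) -> List[str]:
    """Hypothetical helper: build a cmake configure command for a non-static torch build."""
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    # Any additional cache entries the caller wants to set (illustrative only).
    for name, value in (extra_definitions or {}).items():
        args += ["-D", f"{name}={value}"]
    # Flag from the thread: do not try the static torch lib.
    args += ["-D", "TRY_USING_STATIC_TORCH_LIB=0"]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmake_configure_args(".", "cmake-build")))
```

Building the command as a list rather than a single string keeps each argument intact if a path contains spaces.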
If your aim is to have reproducible builds, I would recommend using one of the pre-built dorado releases, since all the dependencies (other than standard host libs) are packaged together. So that build is fixed and will be reproducible.
Is there a reason you're doing custom builds?
@tijyojwad The main reason we're doing custom builds of Dorado and its dependencies is performance: we use compiler options like -march=native so that the binaries obtained are optimized for the CPUs on which they will be used, which can result in significant performance improvements.
In addition, especially for PyTorch, the EasyBuild community puts significant effort into getting the (massive) PyTorch test suite to pass on our custom PyTorch installation, so we're very reluctant to use a different PyTorch.
Thanks for the pointers on the requirement for -DTRY_USING_STATIC_TORCH_LIB=0, that's very helpful.
Can you elaborate why you prefer using static libraries? Does that just make things easier w.r.t. packaging for Dorado?
Yes, static libraries are primarily for reducing the size of the distributed build, since we minimize which torch libraries are packaged. It also reduces the likelihood of interfering with an existing torch installation. And for a given pre-built version of Dorado, all dependencies are fixed (i.e. we don't download any on the fly).
From a user's perspective, I think dorado dependencies should be treated as a black box (whether it depends on torch or not is a dorado implementation detail). Going down the path of making dorado use your own version of torch would be a non-trivial undertaking, and I think it might be better to depend on specific dorado versions (and have a process to validate upgrades) rather than lock down dorado's dependencies.
I'm trying to build dorado 0.3.4 from source using GCC 11.3.0 + CUDA 11.7.0, and I'm hitting the following compilation error:
Is this a known problem, should I use a different CUDA (or GCC) version, or am I overlooking something else?