tracel-ai / burn

Burn is a comprehensive dynamic deep learning framework built in Rust, with extreme flexibility, compute efficiency, and portability as its primary goals.
https://burn.dev
Apache License 2.0

Update tch to 0.16+ #1765

Open syl20bnr opened 4 months ago

syl20bnr commented 4 months ago

Update tch once the upstream fix is released in Pytorch 2.3.1 and tch is updated.

See compilation bug issue: https://github.com/LaurentMazare/tch-rs/issues/870

antimora commented 2 months ago

The upstream issue in PyTorch seems to be fixed (https://github.com/pytorch/pytorch/issues/124009).

antimora commented 2 months ago

@syl20bnr can we try updating it again and see if this is fixed? We should try it before the upcoming release.

syl20bnr commented 2 months ago

Yep, I'll look into it next week.

syl20bnr commented 1 month ago

Looking at it while I am refactoring our CI.

syl20bnr commented 1 month ago

The issue is still happening but with another DLL:

```
INTEL MKL ERROR: The specified module could not be found. mkl_vml_def.1.dll.
Intel MKL FATAL ERROR: cannot load mkl_vml_def.1.dll.
```
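For reference, a load error like this means the dynamic loader could not find the MKL runtime library either next to the libtorch libraries or on `PATH`. A minimal sketch for checking whether the libtorch that `torch-sys` linked against actually bundles any MKL runtime libraries (the `/opt/libtorch` path is hypothetical; on Windows the same check applies to the `.dll` files in libtorch's `lib` directory):

```shell
# Hypothetical libtorch location; set LIBTORCH to wherever the build
# script unpacked or you installed libtorch.
LIBTORCH_LIB="${LIBTORCH:-/opt/libtorch}/lib"

if [ -d "$LIBTORCH_LIB" ]; then
  # List any bundled MKL runtime libraries; if mkl_vml_def is absent,
  # the loader must find it via a separate Intel MKL install instead.
  ls "$LIBTORCH_LIB" | grep -i mkl || echo "no MKL libraries bundled"
else
  echo "libtorch not found at $LIBTORCH_LIB"
fi
```

If the library is bundled but still not found at runtime, the usual fix is adding that `lib` directory to the loader path (`PATH` on Windows, `LD_LIBRARY_PATH` on Linux) before launching the binary.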

oleid commented 1 month ago

It would appear tch 0.17 is now available. It features libtorch 2.4.

FWIW: this is probably not related, but I'm trying to get burn working with libtorch on my Radeon GPU. PyTorch 2.3 and 2.4 work fine, yet burn appears not to work when changing tch from 0.15 to 0.17.

```
Running benches/custom_gelu.rs (target/benchmarks/release/deps/custom_gelu-82b2276b553d5723)
thread 'main' panicked at /home/oleid/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tch-0.17.0/src/wrappers/tensor_generated.rs:8361:40:
called `Result::unwrap()` on an `Err` value: Torch("Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, SparseCsrMeta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].\n
[...]
```
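A panic like this typically means the libtorch that `torch-sys` linked against has no GPU dispatch compiled in, i.e. the crate fell back to the default CPU-only libtorch download rather than a ROCm build. A sketch of pointing tch-rs at a ROCm-enabled libtorch instead, using the `LIBTORCH` / `LIBTORCH_USE_PYTORCH` environment variables documented in the tch-rs README (the install path below is hypothetical):

```shell
# Hypothetical install location of a ROCm-enabled libtorch.
export LIBTORCH=/opt/libtorch-rocm
export LD_LIBRARY_PATH="$LIBTORCH/lib:${LD_LIBRARY_PATH:-}"

# Alternatively, reuse the libtorch shipped with a working Python
# torch+ROCm install:
#   export LIBTORCH_USE_PYTORCH=1

# Then force torch-sys to re-link (run inside the burn checkout):
#   cargo clean -p torch-sys && cargo build --release

echo "linking against libtorch at: $LIBTORCH"
```

Whether burn's `tch` backend then selects the GPU still depends on that libtorch exposing the HIP backend, so this is only a first diagnostic step, not a guaranteed fix.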