nixified-ai / flake

A Nix flake for many AI projects

invokeai terminated by signal SIGSEGV #49

Closed: muni-corn closed this issue 8 months ago

muni-corn commented 8 months ago

hi! i'm trying to get invokeai running on my setup but i'm running into an address boundary error.

2023-10-11 07:49:52.269848: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/nix/store/6nyknk2dj5kxial6ymksbpgqhcmw2x7c-python3.10-pytorch-lightning-1.9.0/lib/python3.10/site-packages/pytorch_lightning/utilities/distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
* Initializing, be patient...
>> Initialization file /home/muni/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.1.post2
>> InvokeAI runtime directory is "/home/muni/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> xformers not installed
>> Initializing NSFW checker
fish: Job 1, 'nix run github:nixified-ai/flak…' terminated by signal SIGSEGV (Address boundary error)

i have an AMD RX 7600 GPU (gfx1102). let me know what other information i can provide to help!
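One hedged way to narrow this down (not part of the original report): gfx1102 is an RDNA3 chip that may not be officially supported by the ROCm build shipped with torch 1.13.1, and a commonly suggested workaround is to override the reported GPU ISA with the `HSA_OVERRIDE_GFX_VERSION` environment variable before torch initializes HIP. A minimal probe sketch, assuming the nixified-ai environment exposes a ROCm-enabled torch; whether the override helps on this exact setup is an assumption worth testing:

```python
# Hypothetical probe script -- not from the original report.
# HSA_OVERRIDE_GFX_VERSION must be set before the HIP runtime initializes,
# so it is exported here before torch is imported. Treating gfx1102 as
# gfx1100 ("11.0.0") is a commonly suggested RDNA3 workaround, not a fix
# confirmed for this setup.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch

print("torch:", torch.__version__, "| HIP:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    # This call already forces HIP initialization, so a crash here would
    # point at the runtime itself rather than at InvokeAI.
    print("device 0:", torch.cuda.get_device_name(0))
```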

muni-corn commented 8 months ago

sorry; this may be unrelated to nix. i was able to get a backtrace and it seems related to AMD and HIP:

#0  0x00007fffa5417085 in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#1  0x00007fffa541ae47 in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#2  0x00007fffa5427ce9 in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#3  0x00007fffa53a6009 in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#4  0x00007fffa53a61a0 in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#5  0x00007fffa528c6fe in ?? () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#6  0x00007fffa5313941 in hipMemcpyWithStream () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#7  0x00007fffa6f632ba in c10::hip::memcpy_and_sync(void*, void*, long, hipMemcpyKind, ihipStream_t*) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#8  0x00007fffa6f4f749 in at::native::copy_kernel_cuda(at::TensorIterator&, bool) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
#9  0x00007fffcddcd6ea in at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007fffcddcea61 in at::native::copy_(at::Tensor&, at::Tensor const&, bool) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007fffce8f7896 in at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007fffce0b5919 in at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) ()
   from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007fffcec0497a in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>), &at::(anonymous namespace)::(anonymous namespace)::wrapper___to_copy>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat> > >, at::Tensor (at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007fffce50572d in at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) () from /nix/store/n3p8ykaypcx80g1c8pifd9jfvbkr1hbz-python3.10-torch-1.13.1/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
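The frames above run through at::_ops::_to_copy and copy_kernel_cuda into hipMemcpyWithStream, i.e. an ordinary tensor copy between host and GPU memory. A minimal sketch (hypothetical, not posted in the thread) that exercises the same path can help confirm whether the crash reproduces with bare torch 1.13.1 on this GPU, independent of InvokeAI:

```python
# Minimal reproduction sketch for the backtrace above (hypothetical).
# On ROCm, torch exposes the GPU through the "cuda" device name, and
# .to()/.cpu() copies end up in hipMemcpyWithStream, the frame where
# the SIGSEGV occurs.
import torch

assert torch.cuda.is_available(), "no HIP device visible to torch"

host = torch.randn(1024)   # CPU tensor
dev = host.to("cuda")      # host -> device copy (_to_copy -> copy_)
back = dev.cpu()           # device -> host copy via hipMemcpyWithStream
print("copies completed, max abs diff:", (host - back).abs().max().item())
```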
MatthewCroughan commented 8 months ago

Yeah, the AMD stuff is very buggy, and there's not much we can do about that: we only build that code, we don't write it. If there are any patches we could apply, let me know and we can do that.