jamiemoller opened 5 months ago
ps. love the work @mudler
It should be noted that:
1 - the documentation for ROCm for some reason indicates make BUILD_TYPE=hipblas GPU_TARGETS=gfx1030 ... but there is no such build arg
2 - stablediffusion is the hardest thing to get working in any environment I've tested; I have yet to actually get it to build on Arch, Debian, or openSUSE
3 - the following Dockerfile is the smoothest build I've had so far:
FROM archlinux
# Install deps
# ncnn not required as stablediffusion build is broken
RUN pacman -Syu --noconfirm
RUN pacman -S --noconfirm base-devel git rocm-hip-sdk rocm-opencl-sdk opencv clblast grpc go ffmpeg ncnn
# Configure Lib links
ENV CGO_CFLAGS="-I/usr/include/opencv4" \
CGO_CXXFLAGS="-I/usr/include/opencv4" \
CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link"
# Configure Build settings
ARG BUILD_TYPE="hipblas"
ARG GPU_TARGETS="gfx906" # selected for Radeon VII
ARG GO_TAGS="tts" # stablediffusion is broken
# Build
RUN git clone https://github.com/go-skynet/LocalAI
WORKDIR /LocalAI
RUN make BUILD_TYPE=${BUILD_TYPE} GPU_TARGETS=${GPU_TARGETS} GO_TAGS=${GO_TAGS} build
# Clean up
RUN pacman -Scc --noconfirm
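Assuming the Dockerfile above is saved in the current directory, a build and run invocation might look like the following sketch (the image tag localai-rocm is arbitrary; the device passthrough flags are the usual ones for exposing AMD GPUs to a container):

```shell
# Build, overriding the GPU target for your card (gfx906 = Radeon VII).
docker build \
  --build-arg GPU_TARGETS=gfx906 \
  --build-arg GO_TAGS=tts \
  -t localai-rocm .

# Run with the ROCm devices passed through to the container.
docker run -it \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -p 8080:8080 localai-rocm
```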
It should also be noted that while I do see models load onto the card whenever there is an API call, and computations are performed that push the card to 200W of consumption, the API call never returns and the apparent inference never terminates.
Presently it is very hard to get a Docker container to build with the ROCm backend; some elements seem to fail independently during the build process. Other related projects have functional Docker implementations that work with ROCm out of the box (e.g. llama.cpp). I would like to work on this myself, but between the speed at which things change in this project and the amount of free time I have, I am left only to ask for this.
I don't have an AMD card to test, so this card is up-for-grabs.
Things are moving fast, right, but building-wise this is a good time window: there are no plans to change that code area in the short term.
If there are already good, stable methods for building a Docker implementation with ROCm underneath, it would be very much appreciated if they could be better documented. Arch helps nobody who wants to run on a more enterprisey OS like RHEL or SLES.
A good starting point would be this section: https://github.com/mudler/LocalAI/blob/9c2d2649796907006568925d96916437f5845aac/Dockerfile#L159 — we could pull in ROCm dependencies there if the appropriate flag is passed.
@jamiemoller you could use https://github.com/wuxxin/aur-packages/blob/main/localai-git/PKGBUILD as a starting point; it's a (feature-limited) Arch Linux package of LocalAI for CPU, CUDA and ROCm. There are binaries available via arch4edu. See https://github.com/mudler/LocalAI/issues/1437
Please do work on that. I've been trying to put any load on the AMD GPU for a week now. Building from source on Ubuntu with clBlast fails in so many ways it's not even funny.
I have a feeling it will be better to start from here (or something similar) for AMD builds now that 2.8 is on Ubuntu 22.04.
Made some progress on https://github.com/mudler/LocalAI/pull/1595 (thanks to @fenfir for starting this up), but I don't have an AMD video card; however, CI seems to pass and container images are being built just fine.
I will merge as soon as the v2.8.2 images are out - @jamiemoller @Expro could you give the images a shot as soon as they are on master?
Sure, I will take them for a spin. Thanks for working on that.
hipblas images are pushed by now:
quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core
Unfortunately, not working as intended. GPU was detected, but nothing was offloaded:
(repeated prefix `4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr` trimmed from each line for readability)

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon (TM) Pro VII, compute capability 9.0, VMM: no
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /build/models/c0c3c83d0ec33ffe925657a56b06771b (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = phi2
llama_model_loader: - kv   1: general.name str = Phi2
llama_model_loader: - kv   2: phi2.context_length u32 = 2048
llama_model_loader: - kv   3: phi2.embedding_length u32 = 2560
llama_model_loader: - kv   4: phi2.feed_forward_length u32 = 10240
llama_model_loader: - kv   5: phi2.block_count u32 = 32
llama_model_loader: - kv   6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv   7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv   8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv   9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv  10: general.file_type u32 = 7
llama_model_loader: - kv  11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv  12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e", ...
llama_model_loader: - kv  16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv  17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv  18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - kv  19: general.quantization_version u32 = 2
llama_model_loader: - type  f32: 195 tensors
llama_model_loader: - type q8_0: 130 tensors
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 2.75 GiB (8.51 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: ROCm_Host buffer size = 2819.28 MiB
.............................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm_Host KV buffer size = 160.00 MiB
llama_new_context_with_model: KV self size = 160.00 MiB, K (f16): 80.00 MiB, V (f16): 80.00 MiB
llama_new_context_with_model: ROCm_Host input buffer size = 6.01 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 115.50 MiB
llama_new_context_with_model: graph splits (measure): 1
Available slots:
 -> Slot 0 - max context: 512
all slots are idle and system prompt is empty, clear the KV cache
4:14PM INF [llama-cpp] Loads OK
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
CUDA error: shared object initialization failed
current device: 0, in function ggml_cuda_op_mul_mat at /build/backend/cpp/llama/llama.cpp/ggml-cuda.cu:9462
hipGetLastError()
GGML_ASSERT: /build/backend/cpp/llama/llama.cpp/ggml-cuda.cu:241: !"CUDA error"
Tested with the integrated phi-2 model with gpu_layers specified:

name: phi-2
context_size: 2048
f16: true
gpu_layers: 90
mmap: true
trimsuffix:
usage: |
  To use this model, interact with the API (in another terminal) with curl for instance:
  curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "phi-2",
    "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
  }'
The ROCm docker image does appear to load the model; however, there is a gRPC error that causes the call to terminate before inference. I am moving to 22.04 with ROCm 6.0.0 on the host to make sure there are no version compatibility issues.
Note: the new vulkan implementation of llama.cpp seems to work flawlessly
I'm trying to work on the hipblas version, but I am confused about where the Dockerfiles are located that are used to generate the latest images such as "quay.io/go-skynet/local-ai:master-hipblas". One thing I noticed is that the latest hipblas images are still using ROCm v6.0.0 while v6.0.3 is now out. But I have been unable to locate a Dockerfile in the git repo that installs any version of ROCm, so it would appear the Dockerfile being used is hosted elsewhere?
I would appreciate it if someone could point me to the latest Dockerfile being used to generate the hipblas images. Thank you.
Newer does not equal better. That said, x.x.Y releases are usually hotfixes that only apply to some very specific edge cases; can you clarify any issues you have with 6.0.0 that are resolved in 6.0.3?
I think I just discovered the cause of my issue... I am running my Radeon VII for this workload, which is a gfx906 device. Presently I find only GPU_TARGETS ?= gfx900,gfx90a,gfx1030,gfx1031,gfx1100 in the Makefile. Regarding this, gfx900 is not supported for ROCm v5.x or v6.0.0. I have yet to test whether a tailored build including gfx906 will work, but this may be a good candidate for inclusion in the next hipblas build.
For reference, under 6.0.0 the following LLVM targets are currently supported: gfx942, gfx90a, gfx908, gfx906, gfx1100, gfx1030. I would note for clarity that the gfx906 target is deprecated for the Instinct MI50 but not for the Radeon Pro VII or the Radeon VII. Add to this that the Instinct MI25 is the only gfx900 card and is noted as no longer supported; while I do think we should keep gfx900 in place for as long as possible, it may impact future builds.
I may not have time to test an amendment to the GPU_TARGETS for the next few weeks (I only have about 2 hrs free today, and after building my GPU into a single-node k8s cluster I need to configure a local container registry before I can test any custom builds :( ). @fenfir might you be able to test this?
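If it does work, the change being requested amounts to a one-line edit to the default target list quoted above (a sketch; the exact location of this line in the Makefile may differ):

```make
# Add gfx906 (Radeon VII / Radeon Pro VII) to the default HIP offload targets.
GPU_TARGETS ?= gfx900,gfx906,gfx90a,gfx1030,gfx1031,gfx1100
```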
OK, so FYI: the current master-hipblas-ffmpeg-core image with GPU_TARGETS=gfx906 does not build:
[ 0%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[ 2%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
[ 2%] Building CXX object CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o
clang++: error: invalid target ID 'gfx903'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
gmake[4]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o] Error 1
gmake[4]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
gmake[3]: *** [CMakeFiles/Makefile2:842: CMakeFiles/ggml.dir/all] Error 2
gmake[3]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
gmake[2]: *** [Makefile:146: all] Error 2
gmake[2]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
make[1]: *** [Makefile:75: grpc-server] Error 2
make[1]: Leaving directory '/build/backend/cpp/llama'
make: *** [Makefile:517: backend/cpp/llama/grpc-server] Error 2
EDIT: 'waaaaaaiiiiit a second'... I think I made a mistake.
EDIT2: yep, that was my mistake (note the gfx903 typo in the error above). Setting the environment var GPU_TARGETS=gfx906 worked fine; now I just need to get my model and context right <3 @mudler @fenfir <3 Can we please get gfx906 added to the default targets?
@Expro take a look at my previous posts; maybe they will help you solve this. Ping me if you like, maybe I can help.
@mudler before i spend the time, are there any immediate plans for expanded k8s docs or AMD specific docs?
Hey @jtwolfe, thanks for deep diving into this. I don't have an AMD card to test things out, so I refrained from writing documentation that I couldn't test. Any help in that area is greatly appreciated.
Ack. I'll do my best to get some of our AMD brethren to test some more edge cases so we can give more details on modern cards, but I will send up a PR for docs when I get time.
i hope you're using containers \winkyface
It appears that the AMD advice regarding 'downwards compatibility' is correct, i.e. I am currently running 6.0.2 on my server, the container works on 6.0.0, and I have yet to have any issues.
If you wish to keep your server driver up to date: as long as the major version is the same between the host and the container, and the host minor version is greater than or equal to the container's, you should not have any problems.
e.g. (yes, I know 6.1.0 does not exist):
host  | container | result
5.4.0 | 6.0.0     | fail
6.0.0 | 5.4.0     | fail
6.0.0 | 6.0.0     | success
6.1.0 | 6.0.1     | success
6.0.1 | 6.1.0     | fail
Really, there should not be an issue in either direction with minor version updates; however, there is potential for lower-level operations to be accidentally invalidated by whatever program makes the calls. That said, I would still recommend keeping to the AMD standard.
For compatibility's sake I would recommend we keep the container's ROCm version at 6.0.0 until there is a breaking change that stops this backwards compatibility.
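The rule of thumb above can be sketched as a small shell helper (purely illustrative of the stated policy — same major required, host minor >= container minor — not anything LocalAI ships):

```shell
# rocm_compatible HOST_VERSION CONTAINER_VERSION
# exits 0 (compatible) iff major versions match and host minor >= container minor
rocm_compatible() {
    h_major=${1%%.*}; h_rest=${1#*.}; h_minor=${h_rest%%.*}
    c_major=${2%%.*}; c_rest=${2#*.}; c_minor=${c_rest%%.*}
    [ "$h_major" -eq "$c_major" ] && [ "$h_minor" -ge "$c_minor" ]
}

# Examples from the table above:
rocm_compatible 6.0.0 6.0.0 && echo "6.0.0 host / 6.0.0 container: ok"
rocm_compatible 6.1.0 6.0.1 && echo "6.1.0 host / 6.0.1 container: ok"
rocm_compatible 6.0.1 6.1.0 || echo "6.0.1 host / 6.1.0 container: fail"
```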
@derzahla I would not recommend building it from scratch. Grab the hipblas image and pass it the REBUILD=true var.
Also, if you have issues after the rebuild, check the LLVM target for your card and pass in GPU_TARGETS=gfx$WHATEVER. Find the LLVM target for your GPU at https://llvm.org/docs/AMDGPUUsage.html#processors, then check compatibility with ROCm at https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.0.0/reference/system-requirements.html
This should work. I'm lucky to have a card that's directly referenced on the ROCm supported GPU list, but I expect any chip associated with the LLVM target should work (i.e. gfx1030 includes the RX 6800, RX 6800 XT and RX 6900 XT), although according to AMD, "If a GPU is not listed on this table, it's not officially supported by AMD."
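Putting that advice together, a run command might look like this sketch (the gfx target here is an example; REBUILD, BUILD_TYPE and GPU_TARGETS are the variables discussed in this thread):

```shell
# Rebuild the bundled backends inside the prebuilt hipblas image at startup,
# targeting your card's LLVM arch (replace gfx1030 with your target).
docker run -it \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -e REBUILD=true \
  -e BUILD_TYPE=hipblas \
  -e GPU_TARGETS=gfx1030 \
  -p 8080:8080 \
  quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core
```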
There still do not seem to be any release notes out for 6.0.3, but since I have a gfx1103, which isn't officially supported up through 6.0.2, I was hoping maybe support was added in 6.0.3.
However, I have had success with ollama by setting HSA_OVERRIDE_GFX_VERSION=11.0.2 (on ROCm 6.0.2 & 6.0.3, at least).
I initially tried setting REBUILD=true and it didn't help; that's why I was trying to find the actual Dockerfile used to generate the hipblas registry containers. I can try running with REBUILD=true again and post details of the results.
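For reference, passing that override into the container would look something like this (the image tag is an example, and whether the override helps LocalAI the way it does ollama is untested):

```shell
# Spoof the reported GFX version so ROCm treats the unsupported gfx1103 iGPU
# as a gfx1102-class part. Only the env var differs from a normal run.
docker run -it \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.2 \
  -p 8080:8080 \
  quay.io/go-skynet/local-ai:master-hipblas
```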
Hmmmm. https://www.reddit.com/r/ROCm/comments/1b36sjj/support_for_gfx1103/ — there is a note here indicating that, if compiled for gfx1100, there may be a path, but from what I see the gfx1103 is an integrated graphics solution/mGPU (is that the case for you?).
If it is, I'm inclined to think this may be a harder problem than you'd like; as I understand it there are architectural changes regarding memory management for AMD APUs that may preclude it from being easily compilable with ROCm.
Have you had a look at vLLM with ROCm? https://docs.vllm.ai/en/latest/getting_started/amd-installation.html
You may have some success with a single inference tool (beware: I have had it eat >70GB of memory during the docker build for the ROCm-supporting image).
Personally, I would love to see an implementation of LocalAI with Vulkan; however, this is all dependent on upstream project support, and I expect there may be a considerable amount of 'hackery' and 'overhead'-related losses that could make this a considerable time sink for developers :(
PS. If this is a mobile GPU, I would ask what the cost/benefit looks like: while it would be good for people without access to performant machines, I expect a better solution would be to find an eGPU chassis on eBay and fill it with a cheap RX 6600/RX 7600 or the like.
PPS. I have used LM Studio on my Legion Go with its Z1, and while it did work 'sometimes' (memory allocation issues, I think), I did not get any better performance than doing straight CPU inference on one of my 7950X systems (~12±5 tokens/s).
@jamiemoller Interestingly, the LLM function seems to work if I recompile for gfx1100 as you mentioned and change HSA_OVERRIDE_GFX_VERSION to 11.0.0. I wonder if gfx1102 and HSA_OVERRIDE_GFX_VERSION=11.0.2 would work with ROCm upgraded to >= 6.0.2.
Yes, my gfx1103 is an iGPU, but it's not mobile: I have a Ryzen 8600G in an ATX case, so I can upgrade to a more powerful GPU easily enough, but I wanted to push the limits of this iGPU first and see if it would be sufficient.
I have not tried vLLM, but thanks for making me aware of it. ollama works very nicely for LLM functionality. One of the things I was looking forward to with LocalAI is AI art integration with Stable Diffusion and TinyDream. Stable Diffusion still pukes on the rebuilt container with:
7:46PM DBG GRPC(stablediffusion_assets-127.0.0.1:35289): stderr /tmp/localai/backend_data/backend-assets/grpc/stablediffusion: error while loading shared libraries: libomp.so: cannot open shared object file: No such file or directory
So again, it would be nice if someone could point me to the Dockerfiles used to build the hipblas images so I could modify them for my needs.
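When chasing a missing shared library like the libomp.so error above, something along these lines can help from a shell inside the container (the binary path is taken from the error message; the /opt/rocm/llvm/lib location is an assumption about where ROCm ships its libomp):

```shell
# List unresolved shared-library dependencies of the stablediffusion backend.
ldd /tmp/localai/backend_data/backend-assets/grpc/stablediffusion | grep "not found"

# See whether any libomp is present in the image at all.
find / -name "libomp.so*" 2>/dev/null

# If it exists under a non-standard prefix, point the loader at it:
export LD_LIBRARY_PATH=/opt/rocm/llvm/lib:$LD_LIBRARY_PATH
```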
@derzahla Last question first: GitHub Actions workflows, in the repo. I'm more of a GitLab CI guy myself, but it looks pretty simple; just check out https://github.com/mudler/LocalAI/tree/master/.github/workflows — if you take a look at image.yml, image_build.yml, image-pr.yml and release.yaml, you will find all the details regarding overrides for the build process.
Good to know that there is a workaround for 'hotfix'ed' target versions. Strange, though, that you're having the SD issue; I'm currently looking into image gen myself but haven't had any luck so far. From memory, most image-gen implementations use ROCm 5.x and a custom version of a Python library (pytorch) that emulates CUDA enablement.
I'm working my way through the feature list now to test for docs. So far I've tested working:
textgen (GPU)
tts (GPU) - I think piper hit like 5% of my GPU for about 2.5s to generate the first 20% of the turbo encabulator talk
stt (CPU) - whisper is fast on anything
vision (GPU)
embeddings - was doing something funny because of transformers
diffusion - \shrug - still investigating
edit: for some reason diffusers-rocm.yml does not note the --extra-index-url as per the PyTorch docs (https://pytorch.org/get-started/locally/); unsure if this has any impact, as /rocm6.0/* forwards to /* on the same index URL.
edit2: i have found and replicated your limomp.so issue I'm having a hard time whats calling it tho also no the easy 'just install the library' solution doesn't seem to work atm i think there another dependency somewhere thats expecting it as a prereq
2024-04-13T15:55:25.964632811+10:00 5:55AM DBG GRPC(stablediffusion_assets-127.0.0.1:41555): stderr /tmp/localai/backend_data/backend-assets/grpc/stablediffusion: error while loading shared libraries: libomp.so: cannot open shared object file: No such file or directory
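For anyone else chasing this, a quick way to check whether the loader can actually resolve libomp.so, and whether ROCm bundles its own copy, is the sketch below. The /opt/rocm/llvm/lib path is an assumption based on where ROCm builds usually ship their LLVM runtime, so adjust it to your install.

```shell
#!/bin/sh
# Diagnostic sketch: can the dynamic linker resolve libomp.so, and does
# ROCm bundle its own copy? (/opt/rocm/llvm/lib is an assumed path.)
if ldconfig -p 2>/dev/null | grep -q 'libomp\.so'; then
    echo "libomp.so is in the linker cache"
elif [ -e /opt/rocm/llvm/lib/libomp.so ]; then
    echo "found ROCm's bundled libomp; exporting LD_LIBRARY_PATH"
    export LD_LIBRARY_PATH="/opt/rocm/llvm/lib:${LD_LIBRARY_PATH}"
else
    echo "libomp.so not found; install libomp for your distro"
fi
```

If the second branch fires, exporting LD_LIBRARY_PATH before launching the stablediffusion grpc binary may be enough to get past the loader error.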
edit3: so it appears that the libomp.so library issue only occurs for SD in cpu mode (as in the aio/cpu/image-gen.yaml). Using the aio/gpu-8g/image-gen.yaml, a different error appears, which results in a connection error from grpc:
7:13AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:40605): stderr /build/backend/python/diffusers/run.sh: line 13: activate: No such file or directory
7:13AM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:40605): stderr /build/backend/python/diffusers/run.sh: line 19: python: command not found
This specifically refers to
line 13: source activate diffusers
line 19: python $DIR/backend_diffusers.py $@
I have found that /opt/conda is straight up not available.
bingo
So it looks like there's some python stuff missing then; what next?
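Since run.sh falls over the moment /opt/conda is absent, a defensive version of that step might look like the sketch below. The paths are the ones from the log; the fallback message is mine.

```shell
#!/bin/sh
# Guarded version of the failing `source activate diffusers` step.
# /opt/conda is where the extras images normally put conda; if it is
# missing (as on the core images), fail with a useful message instead
# of "activate: No such file or directory".
if [ -f /opt/conda/bin/activate ]; then
    . /opt/conda/bin/activate diffusers
else
    echo "conda env missing: use a non-core image or install the python deps" >&2
fi
```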
I have switched to the non-core image, since if memory serves, core removes some python related things to slim down the image. Now my problem is downloading a 20gb image at 'quay speed'.
edit4: ... yep, the image is so large I have to move my models to another disk :|
@derzahla I think there might be some reduced featureset for igpu (my bet's on something memory adjacent) that is a bit of a sticking point atm in drivers. News was that rocm 5.7? was dropping support for a bunch of cards soon, so I'm really not sure how much compatibility we're going to get with older chips with non-"ai-specific-architecture".
Cheeky solution: if you can make vgpu work on your host system, just find some good tools and run them independently; automatic1111 and oobabooga come to mind ;) split the api with a proxy.
@derzahla I apologize, but I was incorrect; there is still an issue with SD in my testing.
It appears that while testing the aio models I did not realise that the cpu and gpu examples actually use different backends: the functional gpu model makes use of diffusers, while the cpu model makes use of stablediffusion. Presently I trust the diffusers backend more than the stablediffusion one, as it seems sd is just a prebuilt repo that executes entirely separately from the diffusers backend; as such the bug is probably in the upstream repo from @EdVince.
I am inclined to ask @mudler if he is aware of any reasons why this may not be working? (Also, if you're listening @mudler: I seem to recall that in my testing around v2.0 on cpu, localai would jettison unused models if there was not enough memory, then complete loading the model; this is not working for gpu atm :| Any ideas? Like @derzahla noted about SD, could it possibly be the rebuild?)
But either way, the gpu accelerated model using the diffusers backend seems to be working without issue.
It's also worth noting that the intel solution has a different configuration again, so I'm unsure whether that will work either.
I swear my headstone will read still testing
Hi, I have a Radeon VII and was able to get it working on localai. I did have to make some tweaks to get it to build and use gfx906 however...
```yaml
# docker-compose.yaml
image: quay.io/go-skynet/local-ai:v2.12.4-aio-gpu-hipblas
environment:
  - DEBUG=true
  - REBUILD=true
  - BUILD_TYPE=hipblas
  - GPU_TARGETS=gfx906
devices:
  - /dev/dri
  - /dev/kfd
```
Cheers
@bunder2015 when you say 'it', do you mean the container or the 'stablediffusion' backend? Also, would you mind listing any of the aio defined models and whether they offload to gpu? Any details you can confirm with testing would be appreciated.
Also also: I have had issues using the 'cloned-voice' backend. It is currently giving me an error due to a missing opencl library, in the same fashion as the missing libomp.so issue for sd. Any detail would be appreciated.
Also fyi, I am using GO_TAGS="stablediffusion tinydream tts" and DEBUG="true" for my rebuild of the 'non-core', 'non-aio' 'latest' master image.
Sorry for the confusion, I meant that I was able to get the localai container with gpu offloading to work.
I tried the following models: bakllava gpt-4 hermes-2-pro-mistral llava-1.6-mistral mixtral-instruct phi-2 stablediffusion tts (gpt-4 seems to also be an alias to hermes-2-pro-mistral)
To my knowledge, they all offloaded to the gpu. I had issues getting them to offload at first (some error about not being able to find tensiles?)... but I tried ollama's docker container and noticed it had the same devices setup, and offload was working there... so I tried it here and offload started working here as well.
It appears I also set GO_TAGS="stablediffusion tts" in .env... I think I had issues adding tinydream there, although the Dockerfile has all three set. :shrug:
I hope that helps some, let me know if you need more... Cheers
edit: I tried bark tts and unfortunately it's not offloaded... piper seems to be, but it doesn't support japanese unfortunately.
@bunder2015 thanks for the details
I've been adding the f16 and ngpu flags to things to test for 'easy' gpu use, and it's been kinda hit and miss. E.g. vall-e-x for some reason will recognize the ngpu flag but not the fp16 flag, and when I try to use the clone process I get some rhylean audio and no gpu offloading, /shrug. Perhaps since it's a python tool it needs a cuda flag too? Maybe??
I do seem to be able to at least run the clone tool now, thanks to the change to the versioned image rather than master (like you, I'm using v2.12.4, though still not the aio image). Still no luck with the video gen, however; everything loads onto the gpu correctly, but it still wants an opencv python package:
7:04AM DBG GRPC(damo-vilab/text-to-video-ms-1.7b-127.0.0.1:41305): stderr export_to_video requires the OpenCV library but it was not found in your environment. You can install it with pip: `pip
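The fix the stderr line itself suggests is `pip install opencv-python` inside whatever python environment the diffusers backend runs in. A quick probe for whether the current env already has the cv2 bindings:

```shell
# Probe whether the current python env has the cv2 bindings that
# export_to_video needs; if not, install them with `pip install
# opencv-python` as the error message suggests.
python3 - <<'EOF'
import importlib.util
print("cv2 available:", importlib.util.find_spec("cv2") is not None)
EOF
```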
For some reason I've also had a weird issue with the --output flag for curl, where the generated audio files aren't being exported back in full and I just get a 504 error output to a .wav file (knowing my luck I'll figure that one out in another day).
Regarding cloning, I have a feeling it will be easier to just train my own .onnx for piper, but we'll see.
Hi, ignore my (previously deleted) message about not being able to build 2.16.0, I was able to get it to build by removing GO_TAGS from my docker-compose file. But I'm still having issues, now with diffusers/dreamshaper not working, it says I don't have the nvidia drivers loaded... log file
So I added back GO_TAGS, but removed the quotes, because I started seeing stuff like GO_TAGS=""stablediffusion tinydream tts"", which if I'm not mistaken would evaluate to a blank ""... and I'm seeing this kind of thing in all sorts of places during the build phase (sometimes even with unquoted variables in the docker-compose file), e.g. -DAMDGPU_TARGETS=""gfx906"".
Any time a quoted variable gets quoted again, it could just be unsetting itself and leaving cruft in the command line.
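The doubled quotes really do destroy the value once a shell parses them; a minimal demonstration (the tag values are just the ones from this thread):

```shell
#!/bin/sh
# When the shell sees GO_TAGS=""stablediffusion tinydream tts"", the two
# quote pairs collapse to empty strings, the assignment ends at the first
# space, and the rest becomes separate words:
set -- GO_TAGS=""stablediffusion tinydream tts""
echo "words: $#"    # prints: words: 3
echo "first: $1"    # prints: first: GO_TAGS=stablediffusion
# With a single layer of quotes the value survives as one word:
set -- GO_TAGS="stablediffusion tinydream tts"
echo "words: $#"    # prints: words: 1
```

So when a build layer re-quotes an already-quoted variable, downstream tools either see a truncated value or literal quote characters, which matches the `-DAMDGPU_TARGETS=""gfx906""` symptom.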
I don't think half of this stuff is building right because of it, and I don't know how much I should be taking out of the Makefile because I can't see some of the build phase. I can see slivers of build output in docker compose, up until it stops showing the build output in the api-builder and api-stage6 phases.
NGL I'm not a docker expert, but it would be really helpful if I could see the entire output. @mudler sorry for the ping, but do you have any ideas? Thanks
edit: While I'm asking about Makefile stuff, I don't think a lot of the build phase honours BUILD_PARALLELISM.
edit: I'm gonna start pulling all the images between 2.15.0 and 2.16.0 and see what broke it.
Okay, I think I narrowed it down... sha-e676809-hipblas-ffmpeg works, sha-cf513ef-hipblas-ffmpeg doesn't. That gives us:
cf513ef Update openai-functions.md
9e8b344 Update openai-functions.md
88d0aa1 docs: update function docs
9b09eb0 build: do not specify a BUILD_ID by default (#2284)
4db41b7 models(gallery): add aloe (#2283)
28a421c (origin/build_tag) feat: migrate python backends from conda to uv (#2215)
My bets are on the conda to uv change, but I'm not sure how to debug or fix it at the moment.
I've been trying to add verbose flags to various Makefiles, but it looks like docker compose is ignoring everything that isn't in .env, Dockerfile or docker-compose.yaml... I can't seem to get it to stop adding -s to make. :shrug:
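For what it's worth, the -s often rides in via the MAKEFLAGS environment variable rather than an edit to the Makefile itself, so clearing that variable for the invocation can restore the output. A toy demonstration (this is a stand-in Makefile, not LocalAI's):

```shell
#!/bin/sh
# Toy demo: MAKEFLAGS=-s silences recipe echoing, and clearing the
# variable restores it. (Stand-in Makefile, not LocalAI's.)
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'all:\n\techo hello\n' > Makefile
echo "--- with MAKEFLAGS=-s ---"
MAKEFLAGS=-s make      # prints only: hello
echo "--- with MAKEFLAGS cleared ---"
MAKEFLAGS= make        # echoes the recipe line, then hello
```

Whether that helps inside the docker build depends on where the -s is injected, but it narrows down whether the silencing comes from the environment or from the build scripts themselves.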
@bunder2015 can you please give me some exact replication instructions? What exactly can I do to replicate your issue? If this happens when you run a docker-compose, can you please provide the contents of that docker-compose file?
Hi, thanks for the reply... I think this should be sufficient to replicate...
```shell
git clone https://github.com/mudler/localai
cd localai
git checkout -b v2.16.0 e0187c2a1a4cde837398ada217d0ad161b7976d6
```
```yaml
version: '3.6'

services:
  api:
    # See https://localai.io/basics/getting_started/#container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    image: quay.io/go-skynet/local-ai:v2.16.0-hipblas-ffmpeg
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - IMAGE_TYPE=extras
        - BASE_IMAGE=ubuntu:22.04
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      - MODELS_PATH=/models
      - DEBUG=true
      - REBUILD=true
      - BUILD_TYPE=hipblas
      - GPU_TARGETS=gfx906
      - GO_TAGS=stablediffusion tinydream tts
      - BUILD_PARALLELISM=16
      - LOCALAI_THREADS=16
      - LOCALAI_UPLOAD_LIMIT=500
    devices:
      - /dev/dri
      - /dev/kfd
    volumes:
      - ./models:/models:cached
      - ./images/:/tmp/generated/images/
    #command:
    # Here we can specify a list of models to run (see quickstart https://localai.io/basics/getting_started/#running-models )
    # or an URL pointing to a YAML configuration file, for example:
    # - https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
    #- phi-2
```
docker compose up
Wait for the container to build, install dreamshaper, and try to use it to generate an image.
I'm using a Radeon VII, but if I understand the commit diff correctly, any AMD GPU should suffice with the right GPU_TARGETS value.
If you want me to do some further testing, please let me know. Cheers
@cryptk @mudler I gave sha-ba984c7-hipblas-ffmpeg a try, but unfortunately I'm still getting the 'no nvidia driver' error. Let me know when you want me to try again. :pray: Cheers
The official docker AIO hipblas image isn't working either; it seems to completely fail with the grpc server and loop through all backends...
Good morning @Hideman85, how did you set up localai? I would recommend using the docker compose method... Without the devices block, it won't offload anything to the AMD GPU.
That said, I haven't had any issues with text models. Let us know how things turn out. Cheers
Same thing happening for me; trying gpt4 or llama3 7B, neither works with hipblas.
Hi, I noticed your log says "product unknown"; did you set GPU_TARGETS? I would also try enabling DEBUG and REBUILD...
The only other thing that comes to mind is that you also have an Nvidia card in addition to the AMD card, I have a 980 kicking around, but I don't have the PCI-E bandwidth to install it into my threadripper.
I wish I could be of more help. Hopefully someone here might know what's up. Cheers
I just did a quick search for gfx1103, this appears to be a Radeon 780M IGPU, which might not be supported by ROCm... https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-8929948
@bunder2015 I see in the logs you have cuda set to true in the model; can you try disabling it?
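For reference, the flag in question lives in the model's YAML config. A sketch of what flipping it might look like; the field names follow LocalAI's model config format, but the specific values here are illustrative, not a verified working config:

```yaml
# Hypothetical model config sketch: `cuda` is the flag being toggled;
# the other fields are illustrative.
name: dreamshaper
backend: diffusers
parameters:
  model: DreamShaper_8_pruned.safetensors
cuda: false   # was true; trying false on the hipblas build
f16: true
```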
@bunder2015 When you suggested trying docker compose, I used the exact same compose file you shared above, where it rebuilds everything; but even so, it does not appear to work.
What I do not understand is why the officially built All-In-One images are failing as well. I tried both cublas and hipblas and neither works; only cublas falls back to cpu, which does work, but then I cannot make use of any hardware acceleration.
Hi @mudler, sha-ba984c7-hipblas-ffmpeg with cuda set to false loads the model into GPU memory, but all the work is being done on the CPU and is really slow... I've been waiting 30+ minutes and diffusers/backend.py is just spinning. I don't think it's going to finish, and it's not throwing any errors anywhere. log
gotcha - the fact that you manage to offload to the GPU's ram tells me it can correctly use the card, but I think the CUDA flag explicitly forces cuda.
However, I can't find any docs around diffusers and hipblas directly - What I can tell is that we used to pick up torch from a different pip index here when this feature was first introduced: https://github.com/mudler/LocalAI/commit/fb0a4c5d9a1fa425bb1c61e354faf26efa41154a#diff-01623ead8ec22d05e4d7a70d687c15ea27485959956bc6e864ffb1d8e374afb9R29 , while now we take it from a different url https://github.com/mudler/LocalAI/blob/master/backend/python/diffusers/requirements-hipblas.txt#L1
any chance you can try building a container image with this index https://github.com/mudler/LocalAI/commit/fb0a4c5d9a1fa425bb1c61e354faf26efa41154a#diff-01623ead8ec22d05e4d7a70d687c15ea27485959956bc6e864ffb1d8e374afb9R29 ?
I can't seem to find any images for fb0a4c5 on quay, but it looks like that commit belongs to v2.9.0... I can try that release if you would like, but v2.15.0 also works with cuda set to true... I think I originally started using localai around 2.12.x.
@bunder2015 what I mean is to build an image from current master branch manually, and swapping the index URL out. Sadly without an AMD card around there is going to be a little bit of back and forth:
I'd like to work out whether it's a problem of getting the dependencies from the correct repositories; try swapping https://github.com/mudler/LocalAI/blob/e9c28a1ed7eef43ac5266029de5d9b3033c0103c/backend/python/diffusers/requirements-hipblas.txt#L1 to
--pre --extra-index-url https://download.pytorch.org/whl/nightly/
instead
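A sketch of that swap as a one-liner before a rebuild. A stand-in temp file is used here so the example is self-contained; in a real checkout, point REQ at backend/python/diffusers/requirements-hipblas.txt. GNU sed is assumed for the one-line `c` form.

```shell
#!/bin/sh
# Replace the first line (the torch index URL) of the hipblas
# requirements with the nightly PyTorch index suggested above.
# Stand-in file; in the repo the target is
# backend/python/diffusers/requirements-hipblas.txt
REQ=$(mktemp)
printf -- '--extra-index-url https://download.pytorch.org/whl/rocm6.0\ntorch\n' > "$REQ"
sed -i '1c --pre --extra-index-url https://download.pytorch.org/whl/nightly/' "$REQ"
head -n 1 "$REQ"   # prints the new index line
```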
Sadly without an AMD card around there is going to be a little bit of back and forth
That's okay, I don't mind...
what I mean is to build an image from current master branch manually, and swapping the index URL out.
Oh, I see what you mean now... I gave it 5 minutes, but it doesn't look like the model got loaded into memory. As a sanity check I even pruned all my docker images and tried both urls again with cuda false on 100% fresh ba984c7 builds, and got the same thing; I must have loaded another model by mistake in my earlier testing. The old url with cuda true also gave me the "no nvidia driver" error.
Sorry for the confusion, and the delay (it takes a while to build from scratch).
If there are already good 'stable' methods for building a docker implementation with rocm underneath, it would be very appreciated if this could be better documented; 'arch' helps nobody that wants to run on a more enterprisey os like rhel or sles.
Presently I have defaulted back to using textgen as it has a mostly functional api, but its featureset is kinda woeful (still better than running llama.cpp directly imo).