Open SpaceCowboy850 opened 9 months ago
Sorry for the late reply.
Yes, I do plan to support CUDA, but due to personal circumstances it may not be implemented for months. If you have a GPU, I'd suggest using a more mature repo in the meantime.
Hey @xyzhang626, do you have any resources/pointers/tips on how CUDA is implemented in ggml? Unless I'm missing something, there's basically zero documentation.
I've adapted this code to support a slightly different architecture for my needs, but I can't quite figure out how to begin with CUDA.
Any help would be appreciated. If I succeed I could also do a PR into this repo.
Sorry for the late reply @grantbey
Yes, the lack of documentation is one of the biggest challenges for anyone building on ggml. It's really annoying. I think the best (or only) way is to refer to a more mature repo built with ggml, e.g. chatglm.cpp
Thanks @xyzhang626! That's sort of what I've been doing. I'll take a look at the example you gave, hopefully it's easier to follow than the ones I've seen elsewhere.
(edit: realised I replied from a different account oops)
Hey @grantbey @grantnebula, maybe you should look at this, which forks this repo, optimizes the code a lot, and supports CUDA!
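For anyone going down the CUDA route with a ggml-based project: CUDA support is usually gated behind a CMake flag at build time. A minimal sketch, assuming the repo follows the common ggml convention (the exact flag name varies between ggml versions, so check that repo's README):

```shell
# Build-config sketch; flag names are assumptions, verify against the repo's README.
cmake -B build -DGGML_CUDA=ON       # older ggml trees use -DGGML_CUBLAS=ON or -DLLAMA_CUBLAS=ON
cmake --build build --config Release
```

If the flag is accepted, CMake will look for the CUDA toolkit during configuration, so make sure `nvcc` is on your PATH.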
Do you have any plans to support the other backends that LlamaCPP supports so that this can be accelerated?