xyzhang626 / embeddings.cpp

ggml implementation of embedding models including SentenceTransformer and BGE
MIT License

CUDA? #2

Open SpaceCowboy850 opened 9 months ago

SpaceCowboy850 commented 9 months ago

Do you have any plans to support the other backends that LlamaCPP supports so that this can be accelerated?

xyzhang626 commented 9 months ago

Sorry for the late reply.

Yes, I do plan to support CUDA, but due to personal reasons it might not be implemented for a few months. If you have a GPU, I'd suggest using a more mature repo in the meantime.

grantbey commented 9 months ago

Hey @xyzhang626, do you have any resources/pointers/tips on how CUDA support is implemented in ggml? Unless I'm missing something, there's basically zero documentation.

I've adapted this code to support a slightly different architecture for my needs, but I can't quite figure out how to begin with CUDA.

Any help would be appreciated. If I succeed I could also do a PR into this repo.

xyzhang626 commented 8 months ago

Sorry for the late reply @grantbey

Yes, the lack of documentation is one of the biggest challenges for people who want to build something on top of ggml. It's really annoying. I think the best way (or the only way) is to refer to a more mature repo built with ggml, e.g. chatglm.cpp.
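
For a rough starting point, here is a minimal sketch (not code from this repo) of how ggml-based projects such as llama.cpp or chatglm.cpp typically select a CUDA backend at runtime. It assumes the ggml-backend API; the build flag is `GGML_USE_CUBLAS` in older ggml trees and `GGML_USE_CUDA` in newer ones, so check the revision you build against.

```cpp
// Illustrative only: runtime backend selection in the style of llama.cpp/chatglm.cpp.
#include "ggml.h"
#include "ggml-backend.h"
#ifdef GGML_USE_CUBLAS
#include "ggml-cuda.h"
#endif

#include <cstdio>

// Return a CUDA backend when ggml was built with CUDA support, otherwise a CPU backend.
static ggml_backend_t init_backend(void) {
#ifdef GGML_USE_CUBLAS
    ggml_backend_t backend = ggml_backend_cuda_init(0 /* device id */);
    if (backend != NULL) {
        return backend;
    }
    fprintf(stderr, "%s: CUDA backend init failed, falling back to CPU\n", __func__);
#endif
    return ggml_backend_cpu_init();
}
```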

grantnebula commented 8 months ago

Thanks @xyzhang626! That's sort of what I've been doing. I'll take a look at the example you gave; hopefully it's easier to follow than the ones I've seen elsewhere.

(edit: realised I replied from a different account oops)

xyzhang626 commented 8 months ago

Hey @grantbey @grantnebula, maybe you should look at this, which forks this repo, optimizes the code a lot, and supports CUDA!
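
For anyone landing here later, below is a rough, illustrative sketch of what offloading a ggml graph to a non-CPU backend involves, assuming a recent ggml with the ggml-backend/ggml-alloc APIs. `init_backend()` is the hypothetical helper from the earlier sketch, and older ggml revisions expose `ggml_allocr_*` instead of `ggml_gallocr_*`, so adapt to your version.

```cpp
// Illustrative only: build a tiny graph and compute it on whatever backend was chosen
// (CPU or CUDA). Tensor data lives in the backend's buffer, not in the ggml_context.
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

static void run_tiny_graph(ggml_backend_t backend) {
    // context holds only tensor/graph metadata; no_alloc = true keeps data out of it
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // allocate all graph tensors in the backend's buffer (host RAM or VRAM)
    ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(allocr, gf);

    // upload inputs to the backend buffer, run the graph, read the result back
    const float a_data[4] = {1, 2, 3, 4};
    const float b_data[4] = {10, 20, 30, 40};
    ggml_backend_tensor_set(a, a_data, 0, sizeof(a_data));
    ggml_backend_tensor_set(b, b_data, 0, sizeof(b_data));

    ggml_backend_graph_compute(backend, gf);

    float out[4];
    ggml_backend_tensor_get(c, out, 0, sizeof(out));

    ggml_gallocr_free(allocr);
    ggml_free(ctx);
}
```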