nlpodyssey / verbaflow

Neural Language Model for Go
BSD 2-Clause "Simplified" License

Slow prompt response time #1

Closed by anthonycorletti 1 year ago

anthonycorletti commented 1 year ago

After following the instructions in the README to download, convert, and format the model, running a prompt takes an incredibly long time, often a few minutes per word. For context, I'm using a 2021 M1 Pro. Should it be this slow?

matteo-grella commented 1 year ago

The library is optimized to run on x86-64 CPUs. If you want to run it on Apple Silicon, you can use the GOARCH=amd64 environment variable.

I'm using a 2021 M1 Pro as well, and it should definitely run much faster than that. Use GOARCH=amd64 both for building and running.

You need to convert the model again; there is a flag you can manually enable in cmd/main.go to force converting an already converted model:

func convert(modelDir string) error {
    return rwkvlm.ConvertPickledModelToRWKVLM[float32](&rwkvlm.ConverterConfig{
        ModelDir:         modelDir,
        OverwriteIfExist: false, // <-- set this to true
    })
}

Let me know if that works!

anthonycorletti commented 1 year ago

Hey thanks @matteo-grella. So I did set GOARCH to amd64 like you noted in the README, but it's still very slow. I've also tried GOARCH=arm64 but that was also just as slow. I'm currently using go version go1.19.5 darwin/arm64.

matteo-grella commented 1 year ago

We use the same Go version and the same architecture so something must have gone wrong with the environment variable, in my opinion.

Try this if you haven't already:

GOARCH=amd64 go build -o verbaflow cmd/main.go
GOARCH=amd64 ./verbaflow convert models/nlpodyssey/RWKV-4-Pile-3B-Instruct
GOARCH=amd64 ./verbaflow inference models/nlpodyssey/RWKV-4-Pile-3B-Instruct

Also remember my comment above about forcing overwriting the converted model.

anthonycorletti commented 1 year ago

Hey Matteo, confirming that I have tried this and it's still very slow to generate text during inference.

matteo-grella commented 1 year ago

RAM and num of CPUs?

Does it use the swap during inference?

matteo-grella commented 1 year ago

@anthonycorletti Do you have time to give it a try again? Now the default model is smaller and we've optimized a few things here and there.

anthonycorletti commented 1 year ago

hey giving it a go now

first bump in the road

$ GOARCH=amd64 go build -o verbaflow ./cmd/verbaflow
go: downloading github.com/nlpodyssey/spago v1.0.2-0.20230202124145-3cffe41f485c
go: downloading github.com/urfave/cli/v2 v2.24.3
go: downloading google.golang.org/grpc v1.33.2
go: downloading github.com/nlpodyssey/spago/embeddings/store/diskstore v0.0.0-20230202124145-3cffe41f485c
go: downloading github.com/nlpodyssey/gopickle v0.2.0
go: downloading github.com/nlpodyssey/rwkv v0.0.0-20230212203924-6a6eeeabd546
go: downloading github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673
go: downloading google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013
go: downloading github.com/cpuguy83/go-md2man v1.0.10
go: downloading golang.org/x/text v0.6.0
go: downloading github.com/russross/blackfriday v1.5.2
# github.com/nlpodyssey/verbaflow/rwkvlm
rwkvlm/converter.go:615:27: undefined: strings.CutPrefix
note: module requires Go 1.20

using go version go1.19.5 darwin/arm64
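
For context on the error above: strings.CutPrefix was added to the standard library in Go 1.20, so the build cannot succeed with go1.19.5. Upgrading Go is the straightforward fix. Purely as an illustration of what the missing function does, here is a minimal sketch of an equivalent that compiles on older releases; cutPrefix is a hypothetical helper, and the parameter name in main is made up for the example rather than taken from converter.go.

```go
package main

import (
	"fmt"
	"strings"
)

// cutPrefix mirrors strings.CutPrefix (Go 1.20+) using only functions that
// exist in earlier Go releases: it returns s without the leading prefix and
// reports whether the prefix was present. (Hypothetical helper.)
func cutPrefix(s, prefix string) (after string, found bool) {
	if !strings.HasPrefix(s, prefix) {
		return s, false
	}
	return s[len(prefix):], true
}

func main() {
	// "blocks.0.att.key.weight" is an invented name used only for illustration.
	rest, ok := cutPrefix("blocks.0.att.key.weight", "blocks.")
	fmt.Println(rest, ok)
}
```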

anthonycorletti commented 1 year ago

nice! after using go 1.20 things are looking faster!

server

$ ./verbaflow -log-level trace -model-dir models/nlpodyssey/RWKV-4-Pile-1B5-Instruct inference --address :50051
4:40PM DBG Starting inference server for model in dir: models/nlpodyssey/RWKV-4-Pile-1B5-Instruct
4:40PM DBG Loading model...
4:41PM DBG Server listening on :50051
echo '\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:' | go run ./examples/prompttester --dconfig ./examples/prompttester/config.yaml

4:42PM DBG Received request from%!(EXTRA <nil>)
4:42PM TRC Decoding...
4:42PM TRC Tokenizing prompt: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
4:42PM TRC Preprocessing 46 token IDs: [187 50 27 25487 27 380 21368 310 16122 13 697 31239 11123 12060 7419 751 7437 273 23147 439 1761 8101 275 253 31433 273 253 7967 14332 15 6758 2593 273 247 11547 651 436 3929 2779 3176 275 32 187 187 34 27]
4:42PM TRC Encoding sequence of 46 tokens...
4:42PM TRC Preprocessing took 7.53509s
4:42PM TRC Generating...
4:42PM TRC Applying topP control topP=0.800000011920929
4:42PM TRC using multinomial sampling
4:42PM TRC Reached end token (0)
4:42PM TRC [1.46] Generated token IDs: [6875 285 10784 0]
4:42PM DBG Done.
4:42PM TRC Inference time: 10.41 seconds
4:43PM DBG Received request from%!(EXTRA <nil>)
4:43PM TRC Decoding...
4:43PM TRC Tokenizing prompt: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
4:43PM TRC Preprocessing 46 token IDs: [187 50 27 25487 27 380 21368 310 16122 13 697 31239 11123 12060 7419 751 7437 273 23147 439 1761 8101 275 253 31433 273 253 7967 14332 15 6758 2593 273 247 11547 651 436 3929 2779 3176 275 32 187 187 34 27]
4:43PM TRC Encoding sequence of 46 tokens...
4:43PM TRC Preprocessing took 3.656375125s
4:43PM TRC Generating...
4:43PM TRC Applying topP control topP=0.800000011920929
4:43PM TRC using multinomial sampling
4:43PM TRC Reached end token (0)
4:43PM TRC [1.98] Generated token IDs: [5859 0]
4:43PM DBG Done.
4:43PM TRC Inference time: 4.04 seconds

client

$ echo '\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:' | go run ./examples/prompttester --dconfig ./examples/prompttester/config.yaml
4:42PM INF Decoding options:
 {MaxLen:200 MinLen:0 StopSequencesIDs:[[187 23433 27] [187 50 708 329] [187 50 27] [187 34 27]] EndTokenID:0 SkipEndTokenID:true Temp:1 TopK:0 TopP:0.8 UseSampling:true}

4:42PM TRC Input: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
4:42PM TRC Building prompt from template: "{{.Text}}"
4:42PM TRC Input fields: {Text:
Q: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?

A: Question: TargetLanguage:}
4:42PM TRC Final prompt: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
 Science and Technology4:42PM DBG Done.
🔌 ⌛️ Anthonys-MBP in ~/verbaflow on main
$ echo '\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:' | go run ./examples/prompttester --dconfig ./examples/prompttester/config.yaml
4:43PM INF Decoding options:
 {MaxLen:200 MinLen:0 StopSequencesIDs:[[187 23433 27] [187 50 708 329] [187 50 27] [187 34 27]] EndTokenID:0 SkipEndTokenID:true Temp:1 TopK:0 TopP:0.8 UseSampling:true}

4:43PM TRC Input: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
4:43PM TRC Building prompt from template: "{{.Text}}"
4:43PM TRC Input fields: {Text:
Q: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?

A: Question: TargetLanguage:}
4:43PM TRC Final prompt: "\nQ: Briefly: The Universe is expanding, its constituent galaxies flying apart like pieces of cosmic shrapnel in the aftermath of the Big Bang. Which section of a newspaper would this article likely appear in?\n\nA:"
 science4:43PM DBG Done.

i think this issue can be closed out with a celebratory 🎉 thanks @matteo-grella!

matteo-grella commented 1 year ago

Your feedback is greatly appreciated! Now that it works for you, there's nothing stopping you from sharing it again! ;)