huggingface / candle

Minimalist ML framework for Rust

RWKV runs without stopping #1755

Closed: danielclough closed this issue 8 months ago

danielclough commented 9 months ago

On Ubuntu 22.04 using CUDA. Repo is in sync with main.

cargo run --features cuda --example rwkv --release -- --prompt "The smallest prime is "
    Finished release [optimized] target(s) in 0.48s
     Running `target/release/examples/rwkv --prompt 'The smallest prime is '`
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 523.207µs
loaded the model in 5.201251761s
The smallest prime is ϕ(2) = 2.
The smallest composite is ϕ(3) = 3.
The smallest perfect number is ϕ(5) = 5.
The smallest perfect square is ϕ(4) = 4.
The smallest perfect cube is ϕ(6) = 6.
The smallest perfect hexagon is ϕ(7) = 7.
The smallest perfect octagon is ϕ(8) = 8.
The smallest perfect decagon is ϕ(9) = 9.
The smallest perfect dodecagon is ϕ(10) = 10.
The smallest perfect dodecahedron is ϕ(11) = 11.
The smallest perfect icosahedron is ϕ(12) = 12.
The smallest perfect icosidodecahedron is ϕ(13) = 13.
The smallest perfect icosidodecahedron is ϕ(14) = 14.
The smallest perfect icosidodecahedron is ϕ(15) = 15.
The smallest perfect icosidodecahedron is ϕ(16) = 16.
The smallest perfect icosidodecahedron is ϕ(17) = 17.
The smallest perfect icosidodecahedron is ϕ(18) = 18.
The smallest perfect icosidodecahedron is ϕ(19) = 19.
...

I spent a few seconds looking into whether I could fix it and found these docs, which mention stopping criteria: https://huggingface.co/docs/transformers/model_doc/rwkv

I can probably spend more time on this later, but @LaurentMazare can probably fix it real quick. :superhero:

I'm excited about RWKV! Thanks again!

LaurentMazare commented 9 months ago

For this example, I think running for that long (and continuing) is actually expected, since we reach neither an end-of-stream token nor the two consecutive 187 tokens mentioned in the document you're referring to. So far I haven't found a prompt that triggers such an end of stream, so it's unclear to me whether this should be implemented.
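
To make the stopping rule concrete, here is a minimal sketch, not the code used in the candle example. The names `should_stop`, `generate`, and the `next_token` closure are hypothetical, and the end-of-stream id is left as a parameter because the thread does not give one. Only the value 187 comes from the discussion above.

```rust
/// Token id that, when it appears twice in a row, ends generation.
/// The value 187 is taken from the thread; everything else is an assumption.
const STOP_TOKEN: u32 = 187;

/// Returns true when generation should stop: either the last token is the
/// (optional, caller-supplied) end-of-stream token, or the sequence ends
/// with two consecutive STOP_TOKEN values.
fn should_stop(tokens: &[u32], eos_token: Option<u32>) -> bool {
    let Some(&last) = tokens.last() else {
        return false; // nothing generated yet
    };
    if eos_token == Some(last) {
        return true;
    }
    tokens.ends_with(&[STOP_TOKEN, STOP_TOKEN])
}

/// Drives a hypothetical `next_token` sampler until `should_stop` fires or
/// `max_new` tokens have been produced. The cap guarantees termination even
/// when, as in the run above, no stop condition is ever reached.
fn generate<F>(prompt: &[u32], eos_token: Option<u32>, max_new: usize, mut next_token: F) -> Vec<u32>
where
    F: FnMut(&[u32]) -> u32,
{
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new {
        let token = next_token(&tokens);
        tokens.push(token);
        if should_stop(&tokens, eos_token) {
            break;
        }
    }
    tokens
}
```

With a sampler that never emits either stop condition, only the `max_new` cap ends the loop, which matches the behaviour reported in this issue.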

danielclough commented 8 months ago

Perhaps the example README should indicate that the example will run for many more lines than what is currently shown as a response, so people don't think that their response is buggy?

LaurentMazare commented 8 months ago

Closing as the semantics should now be similar to the Python ones.