Adopting upstream sampling API refactors

llama.cpp landed a breaking refactor of its sampling API: instead of exposing stateless sampling functions to call inside the generation loop, it now provides a stateful sampler chain that is configured up front and manipulated from there.
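To make the shape of that change concrete, here is a toy sketch of the "configure a chain once, then sample from it every step" pattern in plain Rust. Everything in it (`Candidate`, `Sampler`, `TopK`, `RepeatPenalty`, `SamplerChain`) is a hypothetical name made up for illustration. It is not llama.cpp's or llama-cpp-rs's actual API, and the final pick is greedy to keep the example deterministic.

```rust
// Toy model only: none of these types come from llama.cpp or llama-cpp-rs.

/// One candidate token and its logit (hypothetical stand-in for a token data entry).
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    id: u32,
    logit: f32,
}

/// A sampler stage that may carry state between generation steps.
trait Sampler {
    /// Filter or reweight the candidates in place.
    fn apply(&mut self, candidates: &mut Vec<Candidate>);
    /// Observe the token that was finally chosen (default: ignore it).
    fn accept(&mut self, _token: u32) {}
}

/// Keeps only the `k` highest-logit candidates.
struct TopK {
    k: usize,
}

impl Sampler for TopK {
    fn apply(&mut self, candidates: &mut Vec<Candidate>) {
        candidates.sort_by(|a, b| b.logit.total_cmp(&a.logit));
        candidates.truncate(self.k);
    }
}

/// Subtracts a fixed penalty from the logit of tokens that were already emitted.
struct RepeatPenalty {
    penalty: f32,
    seen: Vec<u32>,
}

impl Sampler for RepeatPenalty {
    fn apply(&mut self, candidates: &mut Vec<Candidate>) {
        for c in candidates.iter_mut() {
            if self.seen.contains(&c.id) {
                c.logit -= self.penalty;
            }
        }
    }

    fn accept(&mut self, token: u32) {
        self.seen.push(token);
    }
}

/// A chain that is configured once and then reused on every generation step.
struct SamplerChain {
    stages: Vec<Box<dyn Sampler>>,
}

impl SamplerChain {
    fn new() -> Self {
        Self { stages: Vec::new() }
    }

    /// Append a stage; stages run in the order they were added.
    fn add(mut self, stage: impl Sampler + 'static) -> Self {
        self.stages.push(Box::new(stage));
        self
    }

    /// Run every stage, pick the highest-logit survivor, and feed the
    /// choice back so stateful stages can update themselves.
    fn sample(&mut self, mut candidates: Vec<Candidate>) -> Option<u32> {
        for stage in self.stages.iter_mut() {
            stage.apply(&mut candidates);
        }
        let best = candidates
            .iter()
            .max_by(|a, b| a.logit.total_cmp(&b.logit))?
            .id;
        for stage in self.stages.iter_mut() {
            stage.accept(best);
        }
        Some(best)
    }
}
```

The point of the sketch is that the generation loop only ever calls `chain.sample(...)`. Ordering, parameters, and any per-stage state (here, the tokens `RepeatPenalty` has seen) live inside the chain rather than being threaded through the loop by the caller, which is what the old stateless functions required.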

I took a stab at spiking out how integrating the new API into llama-cpp-rs could look, but it's far from releasable. I stopped once I got the simple example running successfully.

Would love your perspective on approaches to unifying the existing sampling interfaces exposed by the crate, what to do about the llama token data array sampling functions, and how to handle grammar sampling, as well as feedback on the Rust seams over llama.cpp's latest sampling APIs.

Once we're on the same page, I can flesh out the last missing pieces, like sampler timings and any higher-level abstractions over sampling parameters. It seems the library no longer exposes any sampling parameter structs directly :/

I have these changes up as a branch on my fork, but if you'd prefer, I'm more than happy to collaborate on a proper draft pull request.

utilityai / llama-cpp-rs

Adopting upstream sampling API refactors #548