Hi MLX team,
I would like to request a feature or example implementation of a beam search decoder for one of the text generation examples. The current examples only cover greedy and top-p sampling. I have written a naive beam search implementation, but it runs on the CPU and is slow because it relies on many for loops. It would be helpful if someone from your team could provide a reference implementation that uses MLX operations and makes efficient use of the GPU or vectorized CPU kernels.
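To make the request more concrete, here is a minimal sketch of the kind of loop-free beam expansion step I have in mind. It is written in NumPy rather than MLX, so it only illustrates the shape of the computation; the function name `beam_search_step` and its arguments are hypothetical, and I have not checked which MLX operations would map onto these calls. It assumes the model has already produced next-token log-probabilities for every live beam.

```python
import numpy as np


def beam_search_step(beam_scores, log_probs, beam_width):
    """One beam search expansion without Python loops (illustrative sketch).

    beam_scores: shape (num_beams,), cumulative log-probability of each live beam.
    log_probs:   shape (num_beams, vocab_size), next-token log-probabilities.
    Returns (new_scores, parent_beams, next_tokens), each of length
    min(beam_width, num_beams * vocab_size), sorted by descending score.
    """
    num_beams, vocab_size = log_probs.shape

    # Score every (beam, token) continuation with one broadcasted add,
    # then flatten so a single top-k covers all beams at once.
    candidates = (beam_scores[:, None] + log_probs).reshape(-1)

    k = min(beam_width, candidates.size)
    # argpartition picks the k best candidates without a full sort;
    # only those k entries are then sorted by score.
    top = np.argpartition(-candidates, k - 1)[:k]
    top = top[np.argsort(-candidates[top])]

    # Recover which beam each winner came from and which token it appends.
    parent_beams, next_tokens = np.divmod(top, vocab_size)
    return candidates[top], parent_beams, next_tokens
```

A full decoder would call this once per generated token, reorder the per-beam state (token history and any cache) using `parent_beams`, and handle end-of-sequence tokens and length normalization, none of which is shown here.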
I am happy to collaborate on this if I can get some guidance from your team.