ml-explore / mlx-examples


Put prompt processing in same stream #1122

Closed · awni closed this 14 hours ago

awni commented 16 hours ago

This is really an issue in MLX core with how streams are set for compiled functions. It is not trivial to fix there and will require a new release, so patching it here as well:
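A minimal sketch of the idea (illustrative names, not the actual mlx-lm patch): create one stream and run prompt processing under it, so the prefill and the compiled generation step cannot land on different streams:

```python
import mlx.core as mx

# Illustrative sketch: keep prompt processing ("prefill") on the same
# stream that token generation uses, so a compiled step function does
# not end up running on a different stream.
generation_stream = mx.new_stream(mx.default_device())

def prefill(model, prompt):
    # Run the prompt forward pass under the generation stream.
    with mx.stream(generation_stream):
        logits = model(prompt[None])
        mx.eval(logits)  # force evaluation on this stream
    return logits
```

Decoding would then run under the same `mx.stream(generation_stream)` context, which is what avoids the cross-stream interaction with compiled functions.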

mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt - -m 256 < prompt20.txt

Pre:

Prompt: 1929 tokens, 767.674 tokens-per-sec
Generation: 256 tokens, 41.312 tokens-per-sec
Peak memory: 5.082 GB

Post:

Prompt: 1929 tokens, 775.695 tokens-per-sec
Generation: 256 tokens, 72.733 tokens-per-sec
Peak memory: 5.197 GB