xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding
Apache License 2.0

Torch doesn't work on mac. #50

Open OohBen opened 1 month ago

OohBen commented 1 month ago

When trying to use this on a mac I get this error (M3 Max):

➜ entropix git:(main) ✗ PYTHONPATH=. poetry run python entropix/torch_main.py
Using device: mps
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You're absolutely right. I need to delve deeper into my actual thought processes, including the uncertainties, associations, and even potential biases that arise as I consider the query. My previous responses, while informative, didn't truly capture the nuanced, sometimes messy nature of cognition. I'll strive to provide a more authentic representation of my internal dialogue, including moments of doubt, tangential thoughts, and the process of refining ideas. This should result in a more genuine demonstration of LLM chain of thought, reflection, and self-correction.

Which number is larger, 9.9 or 9.11?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

To
Traceback (most recent call last):
  File "/Users/xxx/shitbox/entropix/entropix/torch_main.py", line 130, in <module>
    tyro.cli(main)
  File "/Users/xxx/Library/Caches/pypoetry/virtualenvs/entropix-OBaG1pBS-py3.12/lib/python3.12/site-packages/tyro/_cli.py", line 229, in cli
    return run_with_args_from_cli()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/shitbox/entropix/entropix/torch_main.py", line 127, in main
    generate(xfmr_weights, model_params, raw_tokens1)
  File "/Users/xxx/shitbox/entropix/entropix/torch_main.py", line 119, in generate
    logits, kvcache, scores, stats = xfmr(xfmr_weights, model_params, next_token, cur_pos, freqs_cis[cur_pos:cur_pos+1], kvcache)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/shitbox/entropix/entropix/torch_model.py", line 75, in xfmr
    h_attn, kvcache, scores = attention(norm_x, xfmr_weights.layer_weights[i], model_params, cur_pos, i, freqs_cis, kvcache, attn_mask=attn_mask)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/shitbox/entropix/entropix/torch_model.py", line 50, in attention
    scores = torch.matmul(xq, keys)
             ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Placeholder storage has not been allocated on MPS device!

Arrabonae commented 1 month ago

I think #41 and #48 will fix your issue. If you want a hot fix, you can manually apply the changes suggested in #41 to your local repo.
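
The contents of #41 and #48 are not quoted in this thread, so the following is only a general debugging aid and not the fix from those PRs. This RuntimeError typically shows up when one operand of an op (here `torch.matmul(xq, keys)`) is still on the CPU while the other is on MPS. A minimal sketch for spotting the stray tensor; `device_type` and `find_device_mismatches` are hypothetical helper names, and the code only relies on each object exposing a `.device` attribute:

```python
from typing import Any, Mapping


def device_type(tensor: Any) -> str:
    """Return 'cpu', 'mps', 'cuda', ... for anything exposing a `.device`."""
    dev = tensor.device
    # torch.device has a `.type` attribute; otherwise parse "mps:0"-style strings.
    return getattr(dev, "type", str(dev).split(":")[0])


def find_device_mismatches(tensors: Mapping[str, Any], expected: str) -> dict[str, str]:
    """Map each tensor name to its device type when it differs from `expected`."""
    return {
        name: device_type(t)
        for name, t in tensors.items()
        if device_type(t) != expected
    }
```

Calling `find_device_mismatches({"xq": xq, "keys": keys}, "mps")` just before the failing `torch.matmul` should name any tensor that was left on the CPU, which can then be moved with `.to(device)`.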

samefarrar commented 1 month ago

If you're interested, I have an MLX fork that runs on my M2 MacBook out of the box at ~48 tok/s. I have implemented a server, SSE, and a frontend for changing system prompts etc. It's still WIP, but caught up with the main branch (frog branch incoming).

HenkPoley commented 1 month ago

@samefarrar When mlx_download_weights.py:20 catches an HTTPError exception with a 403 Forbidden, maybe remove the weights/Llama-3.2-1B-Instruct/ directory. Otherwise it will not download the weights on the next run, after you have requested and been granted access.
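
A rough sketch of that cleanup, not the fork's actual code: `download_weights` and `fetch` are hypothetical names, and `urllib.error.HTTPError` stands in for whichever HTTPError class the script really catches.

```python
import shutil
import urllib.error
from pathlib import Path
from typing import Callable


def download_weights(fetch: Callable[[Path], None], out_dir: Path) -> bool:
    """Run `fetch(out_dir)`; on a 403, delete out_dir so a later retry starts clean.

    `fetch` is a hypothetical stand-in for whatever actually downloads the files.
    Returns True on success, False if access was forbidden.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        fetch(out_dir)
    except urllib.error.HTTPError as err:
        if err.code != 403:
            raise  # only the "no access yet" case is handled here
        # Access not granted yet: remove the empty or partial directory so its
        # presence is not mistaken for a finished download on the next run.
        shutil.rmtree(out_dir, ignore_errors=True)
        return False
    return True
```

With this shape, a rerun after access has been granted finds no leftover weights/Llama-3.2-1B-Instruct/ directory and downloads normally.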

HenkPoley commented 1 month ago

@samefarrar Can you add the `if` before the `elif` that you forgot to commit? It lives around here:

https://github.com/samefarrar/entropix_mlx/blob/d6ffe6a91656ad347b78c49ee0a267f298354487/mlx_sampler.py#L109-L137

The patch probably looks like: https://github.com/samefarrar/entropix_mlx/pull/17