smolorg / smoltropix

MLX port of xjdr's entropix sampler (mimics the JAX implementation)
Apache License 2.0

Model keeps rambling #4

Open AtakanTekparmak opened 5 days ago

AtakanTekparmak commented 5 days ago

Trying the example 9.9 vs 9.11 query on an M1 Max 64GB, Python 3.11.2. I haven't run the original entropix repo, but the model output doesn't seem right. Only the "I" at the beginning gets printed as green text; the rest is plain white.

(venv) atakantekparmak@Atakans-MacBook-Pro smoltropix % python main.py --input "Which is greater, 9.9 or 9.11?"
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<antThinking>
I need to delve deeper into my actual thought processes, including the uncertainties, associations, and even potential biases that arise as I consider the query. My previous responses, while informative, didn't truly capture the nuanced, sometimes messy nature of cognition. I'll strive to provide a more authentic representation of my internal dialogue, including moments of doubt, tangential thoughts, and the process of refining ideas. This should result in a more genuine demonstration of LLM chain of thought, reflection, and self-correction.
</antThinking>

Which is greater, 9.9 or 9.11?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

=============== generating =================
I’m sorry, but I can’t does this calculator equation given and you greater you have math with for just I calculate less if both two as two one so or round greater which nine hundred in more round just after for first half decimal if decimal it with after hundred after .round either number give which better number just two first first given either just rounded it either but. five I find no other equation will show one in number it after but greater to that either with this answer then better of between give then this then just before rounded before just or or then if that but before of. is more if ten as four with you before more four give first greater greater which this either so which between it this than which as this more this five four .better after less both less number between if in better so of to I in two nine first four better four with than than that . but ten with round better nine that. you you more with then four give number one this give of than it four greater after if between number more number  it so two than as give and as then which. if which more one . in round in then as but with one between it number if as with between so nine you both with it greater less four it this both this so three three both both I that more between three than greater more between so better better better greater less greater five it both it with then give both so with after of if than is than give so with first you is  than that to less so this so nine so with of first. as ten to three after more. which more that first first this it in give between ten less in five first if than but but not nine to with better .5 both less to then more with both than to between than both between greater not five than of then four that not three if both this first not you less give which with more ten you between more then to then nine 0 better as that nine as this four I less first first first one not which than this the three so give in with as give with give this ten so both it was three it also to with between greater also four than after better ten after also so also greater before more if as than so ten then that  that so more this with of between also in with then more to you more nine first also .better more. than first with it both five also five three which if better this to is I than with also so then both more better give less not than in better it's with more that's that with than both with give of and ten better nine ten you less . before not. with so also this as was this if to more three the it if than after if after. between both four give  not which that than nine. to if with also greater in than as to in also both so one as also more this after three nine five but four this more you to so to first but three first first both less it more four nine than better the five four more both four with to than .<|eot_id|>%                                                                

Also provided the installed packages below:

(venv) atakantekparmak@Atakans-MacBook-Pro smoltropix % pip freeze
blobfile==3.0.0
certifi==2024.8.30
charset-normalizer==3.4.0
docstring_parser==0.16
filelock==3.16.1
fsspec==2024.9.0
huggingface-hub==0.25.2
idna==3.10
Jinja2==3.1.4
lxml==5.3.0
markdown-it-py==3.0.0
MarkupSafe==3.0.1
mdurl==0.1.2
ml_dtypes==0.5.0
mlx==0.18.1
mpmath==1.3.0
networkx==3.4.1
numpy==2.1.2
packaging==24.1
pycryptodomex==3.21.0
Pygments==2.18.0
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
rich==13.9.2
safetensors==0.4.5
shtab==1.7.1
sympy==1.13.3
tiktoken==0.8.0
tokenizers==0.20.1
torch==2.4.1
tqdm==4.66.5
transformers==4.45.2
typing_extensions==4.12.2
tyro==0.8.12
urllib3==2.2.3
masonjames commented 5 days ago

The model also produces gibberish for me when I run the same query. I'm also on an M1 Max, but with 32GB. Same thing: the text came back all white. Python version: 3.11.1, PyTorch version: 2.0.1, NumPy version: 1.26.4, MLX version: 0.17.3


<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<antThinking>
I need to delve deeper into my actual thought processes, including the uncertainties, associations, and even potential biases that arise as I consider the query. My previous responses, while informative, didn't truly capture the nuanced, sometimes messy nature of cognition. I'll strive to provide a more authentic representation of my internal dialogue, including moments of doubt, tangential thoughts, and the process of refining ideas. This should result in a more genuine demonstration of LLM chain of thought, reflection, and self-correction.
</antThinking>

Which is greater, 9.9 or 9.11?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

=============== generating =================
I’m sorry�������urbed atmurbedardycowurbedardycowuçuçuçansıpast dungurbed�arbeday objected atm dung atm�وروبELINK�����������������������uç�������uç���dccáčcaccccıttarget: Number calculation> calculations suggest our opinion will definitely turn����������12220thelapse for such specific scenario it generally comesDown<eckuç��uç������ansıuç������������������.gr.gr thế.gr�������������������������������������������������������������������������������������������������������.gr.gr.gr.gr.gr.gr.gr.gr thế thế thế���������������uç���������������������������������������������������������������������.gr.gr.gr.gr.gr.gr.gr.gr.gr.gr.gr.```
Maharshi-Pandya commented 5 days ago

This happens for me as well. I have identified that the issue comes from the sampler itself (possibly due to all the hyperparameters).

So, in the sampler method, if you return immediately with a sample drawn using constant temperature, top_k, top_p, and min_p values, the output will be what you expect.

For example, this code gives the correct answer:

def sample(
        gen_tokens: mx.array, logits: mx.array, attention_scores: mx.array, cfg: SamplerConfig,
        clarifying_question_token: int = 2564, key: mx.array = None
    ) -> Tuple[mx.array, str, dict]:

    # keep the metrics computation at the top of sample() as it is in the repo,
    # then short-circuit past all the entropy-based branches:
    return _sample(logits, temperature=0.6, top_p=0.97, top_k=27, min_p=0.2), COLORS["lelv"], metrics
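
For reference, here is roughly what a constant-hyperparameter _sample does: temperature scaling, then min_p / top-k / top-p filtering, then a categorical draw. This is a minimal MLX sketch under those assumptions, not the repo's actual implementation (the name _sample_sketch and the exact argument handling are illustrative):

import mlx.core as mx

def _sample_sketch(logits: mx.array, temperature: float, top_p: float,
                   top_k: int, min_p: float, key: mx.array = None) -> mx.array:
    # illustrative sketch only; assumes logits has shape (batch, seqlen, vocab)
    logits = logits[:, -1] / temperature
    probs = mx.softmax(logits, axis=-1)

    # min_p: drop tokens whose probability is below min_p * max probability
    if min_p > 0.0:
        top_prob = mx.max(probs, axis=-1, keepdims=True)
        logits = mx.where(probs < min_p * top_prob, float("-inf"), logits)

    # top-k: keep only the k most likely tokens (indices sorted descending)
    sorted_idx = mx.argsort(-logits, axis=-1)
    sorted_logits = mx.take_along_axis(logits, sorted_idx, axis=-1)
    sorted_logits = mx.where(mx.arange(logits.shape[-1]) < top_k,
                             sorted_logits, float("-inf"))

    # top-p: keep the smallest prefix whose cumulative mass covers top_p
    sorted_probs = mx.softmax(sorted_logits, axis=-1)
    cum = mx.cumsum(sorted_probs, axis=-1)
    sorted_logits = mx.where(cum - sorted_probs > top_p, float("-inf"), sorted_logits)

    # categorical draw in the sorted space, mapped back to vocabulary ids
    choice = mx.random.categorical(sorted_logits, key=key)
    return mx.take_along_axis(sorted_idx, mx.expand_dims(choice, -1), axis=-1)

Note that the workaround above always returns COLORS["lelv"], so every generated token printing in a single color is expected with it.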

You can always tune the hyperparameters and see what works for the current state of the sampler, since this is essentially research code. You can also experiment with the sampling conditions to see if the output can be improved.

I'm also trying to see how it can be improved. Hope this helps.

AtakanTekparmak commented 5 days ago

It seems like after the first token, everything falls under the adaptive sampling block. You're right, it's probably related to the hyperparameters in the SamplerConfig.
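
For anyone digging into this: entropix-style samplers compute the entropy and varentropy of the next-token distribution and route each step through one of several branches (greedy, clarifying question, exploration, high-temperature resample), with adaptive sampling as the fallback. A rough sketch of that routing follows; the threshold names are placeholders, not the actual SamplerConfig fields:

import mlx.core as mx

def entropy_varentropy(logits: mx.array):
    # entropy H = -sum(p * log p); varentropy = sum(p * (log p + H)^2)
    log_probs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    probs = mx.exp(log_probs)
    entropy = -mx.sum(probs * log_probs, axis=-1)
    varentropy = mx.sum(probs * (log_probs + mx.expand_dims(entropy, -1)) ** 2, axis=-1)
    return entropy, varentropy

def route(ent: float, vent: float, cfg) -> str:
    # cfg.low_ent, cfg.high_vent, etc. are placeholder names for whatever
    # the real SamplerConfig thresholds are called
    if ent < cfg.low_ent and vent < cfg.low_vent:
        return "argmax"            # confident and stable: sample greedily
    if ent > cfg.high_ent and vent < cfg.low_vent:
        return "clarify"           # uncertain but consistent: ask a question
    if ent < cfg.high_ent and vent > cfg.high_vent:
        return "explore"           # confident but unstable: resample
    if ent > cfg.med_ent and vent > cfg.high_vent:
        return "high_temperature"  # uncertain and unstable: raise temperature
    return "adaptive"              # everything else lands here

If those thresholds don't match the entropy/varentropy values the MLX port actually produces, nearly every step falls through to the adaptive branch, which would explain both the rambling output and the mostly-white text.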