
smolorg / smoltropix

MLX port of xjdr's entropix sampler (mimics the JAX implementation)

Apache License 2.0


Attempt to fix rambling #5

Status: open. crabstickxbt opened this issue 1 month ago

crabstickxbt commented 1 month ago

I just found that this mask replaces every 0 in the attention scores cache (the positions past the current token) with a large negative value (DEFAULT_MAX_VALUE). This blows interaction_strength = mx.mean(mx.abs(attention_scores), axis=(1, 2, 3)) up to inf.

Another note: this mask is not present in the main entropix repo, so I think it's safe to remove it.
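To illustrate the overflow, here is a minimal sketch that uses NumPy float32 arrays as a stand-in for MLX arrays. It assumes the mask value is -0.7 times the float32 maximum (the DEFAULT_MASK_VALUE used in the JAX entropix code); the actual DEFAULT_MAX_VALUE in smoltropix may differ. mask_zeros is a hypothetical reproduction of the mask, not the repo's code. A float32 mean sums the absolute values before dividing, so as soon as two masked entries are added together the running sum exceeds the float32 range and the result becomes inf. Presumably MLX's float32 reduction behaves the same way.

```python
import numpy as np

# ASSUMPTION: mask value of -0.7 * float32 max, as in the JAX entropix code.
# smoltropix's DEFAULT_MAX_VALUE may be defined differently.
MASK_VALUE = np.float32(-0.7 * np.finfo(np.float32).max)


def interaction_strength(attention_scores):
    # Same reduction as in the issue: mean |score| over all axes except batch.
    return np.mean(np.abs(attention_scores), axis=(1, 2, 3))


def mask_zeros(scores):
    # Hypothetical reproduction of the problematic mask: every exact 0.0
    # (cache slots past the current token) becomes the large negative value.
    return np.where(scores == 0.0, MASK_VALUE, scores)


# Toy scores shaped (batch, heads, query_len, cache_len): 3 filled cache
# slots per head, the other 5 still zero because they lie past the current token.
scores = np.zeros((1, 2, 1, 8), dtype=np.float32)
scores[..., :3] = np.array([0.5, -1.25, 2.0], dtype=np.float32)

clean = interaction_strength(scores)
with np.errstate(over="ignore"):  # silence NumPy's overflow warning in the sum
    masked = interaction_strength(mask_zeros(scores))

print(clean)   # [0.46875]
print(masked)  # [inf]
```

Without the mask the same scores give a finite interaction strength, which is consistent with simply dropping the mask as the main entropix repo does.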