huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Mamba dependencies #112

Closed: staghado closed this issue 4 months ago

staghado commented 6 months ago

Why is flash-attn a dependency in Mamba?

staghado commented 5 months ago

Obviously this is not needed.

xrsrke commented 5 months ago

hello hello. you're right. we'll remove it. thanks