Closed pengzhangzhi closed 10 months ago
My concerns are mainly around the efficiency in training.
In this case you should definitely be using the official implementation, as (1) it is heavily optimized, and (2) it has proper initialization.
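For reference, the official implementation (state-spaces/mamba) ships a `Mamba` block backed by fused CUDA kernels. A minimal usage sketch, adapted from the official repo's README (the batch/length/dim values here are just illustrative):

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)
assert y.shape == x.shape
```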
thanks!!!
Hi, thanks for the great work! I would like to know whether mamba-minimal has GPU op optimizations; it seems to me that it doesn't. I want to train a large-scale Mamba and am currently deciding between the original mamba and mamba-minimal.
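For context on the "GPU op optimization" question: mamba-minimal implements the selective scan as a plain sequential PyTorch loop rather than a fused kernel, so every timestep issues its own small GPU ops. A rough sketch of that pattern (shapes and variable names are illustrative, not mamba-minimal's exact code):

```python
import torch

# Illustrative shapes: batch b, sequence length l, inner dim d, state size n.
b, l, d, n = 2, 64, 32, 16
deltaA = torch.rand(b, l, d, n)     # discretized state transition per timestep
deltaB_u = torch.randn(b, l, d, n)  # discretized input contribution per timestep
C = torch.randn(b, l, n)            # per-timestep output projection

# Sequential scan: each step depends on the previous hidden state,
# so the loop cannot be parallelized across the sequence as written.
h = torch.zeros(b, d, n)
ys = []
for i in range(l):
    h = deltaA[:, i] * h + deltaB_u[:, i]
    ys.append(torch.einsum("bdn,bn->bd", h, C[:, i]))
y = torch.stack(ys, dim=1)  # (b, l, d)
```

The official repo replaces this loop with a hardware-aware fused scan kernel, which is the main source of the training-speed gap between the two implementations.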