Closed BurkeHulk closed 3 months ago
Tested the attention forward function with dummy inputs and verified that the outputs of `flash_attn_func` match the reference PyTorch attention computation.