William-HYWu opened 1 month ago
The experimental results can shift by about 2% in accuracy between runs.
Hi, are you using our BERT training codebase?
Hi, no, I'm just using the hydra block.
```python
import torch
from .hydra import Hydra  # relative import: assumes this file lives inside the Hydra package

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Hydra(
    d_model=dim,             # Model dimension d_model
    d_state=64,              # SSM state expansion factor
    d_conv=7,                # Local non-causal convolution width
    expand=2,                # Block expansion factor
    use_mem_eff_path=False,  # Nightly release. Thanks to Alston Lo
).to("cuda")
y = model(x)
assert y.shape == x.shape
```
Then your codebase probably needs extra settings beyond seeding alone, such as setting `torch.backends.cudnn.deterministic = True`.
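For reference, a minimal seed-fixing sketch along these lines (assuming a recent PyTorch; `seed_everything` is an illustrative helper here, not part of the Hydra codebase):

```python
import os
import random

import torch

def seed_everything(seed: int = 0) -> None:
    # Seed the Python and PyTorch RNGs (torch.manual_seed also seeds
    # every visible CUDA device).
    random.seed(seed)
    torch.manual_seed(seed)
    # Make cuDNN pick deterministic kernels and disable autotuning,
    # which can otherwise select different algorithms run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Needed for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Warn (or error, without warn_only) when an op has no
    # deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)

# Re-seeding should make random draws repeat exactly.
seed_everything(0)
a = torch.randn(8)
seed_everything(0)
b = torch.randn(8)
assert torch.equal(a, b)
```

Note that `torch.use_deterministic_algorithms` only covers ops PyTorch itself dispatches; custom CUDA/Triton kernels, like those in Mamba-style models, are outside its reach.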
Thanks for your reply. I think the problem lies in the atomic adds in Mamba's kernels, as mentioned in https://github.com/state-spaces/mamba/issues/137.
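For context, the non-determinism from atomic adds comes from floating-point addition not being associative: when parallel threads accumulate into the same buffer, the summation order, and hence the intermediate rounding, can change from run to run even with all seeds fixed. A minimal illustration with plain Python floats:

```python
# The same four numbers summed in two different orders give two
# different float results, because the intermediate rounding differs.
vals = [1e16, 1.0, 1.0, -1e16]

# Order 1: each 1.0 is absorbed into 1e16 (below its precision).
a = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # -> 0.0

# Order 2: the two 1.0s are added together first, so they survive.
b = ((vals[1] + vals[2]) + vals[0]) + vals[3]  # -> 2.0

assert a != b
```

This is why fixing seeds is not enough once a kernel accumulates gradients or states via `atomicAdd` in an unspecified thread order.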
Dear authors, thanks for your wonderful work! I'm trying to run Hydra on some datasets, but the results vary greatly across identically configured experiments. I've already fixed the random seeds with the following code. Did I miss anything that would lead to different performance under the same configuration?