goombalab / hydra

Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"

Results vary greatly across experiments #12

Open William-HYWu opened 1 month ago

William-HYWu commented 1 month ago

Dear Authors, thanks for your wonderful work! I'm trying to run Hydra on some datasets, but the results vary greatly across identically configured experiments. I've already fixed the random seeds with the code below. Did I miss anything that could lead to different results with the same configuration?


import random
import numpy as np
import torch

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
William-HYWu commented 1 month ago

The experimental results can shift by about 2% in accuracy across runs with the same configuration.

sukjunhwang commented 1 month ago

Hi, are you using our BERT training codebase?

William-HYWu commented 1 month ago

> Hi, are you using our BERT training codebase?

Hi, no, I'm just using the Hydra block:


import torch
from .hydra import Hydra

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Hydra(
    d_model=dim, # Model dimension d_model
    d_state=64,  # SSM state expansion factor
    d_conv=7,    # Local non-causal convolution width
    expand=2,    # Block expansion factor
    use_mem_eff_path=False,    # Nightly release. Thanks to Alston Lo
).to("cuda")
y = model(x)
assert y.shape == x.shape
sukjunhwang commented 1 month ago

Then, the codebase probably needs extra settings for fixing seeds, such as configuring torch.backends.cudnn.deterministic
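
In case it helps, here is a minimal sketch of the extra settings that are usually needed on top of seeding to make PyTorch runs reproducible. The seed_everything helper name is just illustrative, and CUBLAS_WORKSPACE_CONFIG should ideally be set before any CUDA work happens:


import os
import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Seed Python, NumPy, and all PyTorch RNGs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN pick deterministic kernels and disable autotuning,
    # which can otherwise select different algorithms between runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Warn on ops that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


Note that these flags only cover PyTorch's built-in ops; custom CUDA kernels are not affected by them.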

William-HYWu commented 1 month ago

> Then, the codebase probably needs extra settings for fixing seeds, such as configuring torch.backends.cudnn.deterministic

Thanks for your reply. I think the problem lies in the atomic adds in Mamba's CUDA kernels, as discussed in https://github.com/state-spaces/mamba/issues/137
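
For anyone else debugging this: a quick way to check whether the variation comes from the kernels rather than from seeding is to run the same forward/backward twice on identical inputs and weights and compare the gradients. A rough sketch (same relative import and Hydra arguments as in my snippet above; collect_grads is just an illustrative helper):


import torch
from .hydra import Hydra

torch.manual_seed(0)
batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim, device="cuda")
model = Hydra(d_model=dim, d_state=64, d_conv=7, expand=2,
              use_mem_eff_path=False).to("cuda")

def collect_grads():
    # Identical input and weights on every call, so any difference
    # between two calls can only come from non-deterministic kernels
    # (e.g. atomic adds in the backward pass).
    model.zero_grad(set_to_none=True)
    model(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

g1, g2 = collect_grads(), collect_grads()
print("max |grad diff|:", (g1 - g2).abs().max().item())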