Implements a pre-layernorm self-attention layer that can be enabled for the attention-based model by setting `use_pre_layernorm=True` in the config file.
According to this paper, pre-layernorm transformers are less sensitive to hyperparameters and therefore require less hyperparameter optimization (HPO).
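For reference, a minimal sketch of what a pre-layernorm self-attention block looks like, assuming a PyTorch model; the class and argument names below (`PreLNSelfAttention`, `d_model`, `n_heads`) are illustrative only and do not correspond to the actual identifiers in this repo.

```python
import torch
import torch.nn as nn


class PreLNSelfAttention(nn.Module):
    """Self-attention block that applies LayerNorm *before* attention
    (pre-LN) rather than after the residual addition (post-LN)."""

    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN: normalize the input, attend, then add the residual.
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + self.dropout(attn_out)
```

The only difference from the post-LN variant is where the normalization sits: inside the residual branch instead of after the residual sum, which is what keeps gradients better behaved and reduces sensitivity to learning-rate and warmup settings.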
The test breaks because the dataset has already been generated at v2.1.0 while the training config is still at v2.0.0; it will pass once the updated datasets with the additional stats are copied over.