This allows for overriding default block configs for certain layers. This should contain two sub configs: order and overrides. order specifies the order of different kinds of layers (default refers to a layer that does not apply any overrides). For each kind of layer, specify the overrides in the overrides config. For example, to specify this model (https://research.character.ai/optimizing-inference/) , the following config will be needed:
model:
...
(usual model configs)
...
block_overrides:
order:
- name: default
- order:
- name: sliding_window_layer
- name: sliding_window_layer_reuse
- name: sliding_window_layer
- repeat: 2
name: sliding_window_layer_reuse
- name: reuse_kv_layer
repeat: 2
overrides:
sliding_window_layer:
attn_config:
sliding_window_size: 1024
sliding_window_layer_reuse:
attn_config:
sliding_window_size: 1024
reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
reuse_kv_layer:
attn_config:
reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
Also prints the following log summarizing the network:
This allows for overriding default block configs for certain layers. This should contain two sub configs: order and overrides. order specifies the order of different kinds of layers (default refers to a layer that does not apply any overrides). For each kind of layer, specify the overrides in the overrides config. For example, to specify this model (https://research.character.ai/optimizing-inference/) , the following config will be needed:
Also prints the following log summarizing the network:
Note that the table above prints the absolute layer index for
reuse_kv_layer_idx