Per the new models:
There are two new models that sparsify either the whole encoder or just the non-attention layers.
There's a class decorator register_bert_model for Bert models that automatically creates companion BertConfig, BertForMaskedLM, and BertForSequenceClassification classes and then registers them with the Transformers library so they can be loaded through AutoConfig, AutoModelForMaskedLM, and AutoModelForSequenceClassification. More on this below.
Per the checkpoints:
I modified our previously trained checkpoint in bert-steps_100k-sparsity_0.8 to use one of those new models so we can load it more easily. The updated model is in bert-steps_100k-sparsity_0.8_updated.
I exported this checkpoint to /mnt/efs/results/pretrained-models/transformers-local/static_sparse_non_attention_bert_100k.
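For reference, here's a minimal sketch of loading that export, assuming the directory is a standard Transformers checkpoint and that the sparse model classes have already been imported so register_bert_model has registered them with the Auto classes:
from transformers import AutoModelForMaskedLM

# Export path from above.
path = "/mnt/efs/results/pretrained-models/transformers-local/static_sparse_non_attention_bert_100k"

# The registered config/model classes are resolved automatically via the checkpoint's model_type.
model = AutoModelForMaskedLM.from_pretrained(path)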
Per the configs:
The config finetuning_sparse_bert_100k_glue fine-tunes that model and is listed in the updated README under sparse_v1_100k.
Currently, I'm running static_sparse_encoder_bert_100k, which sparsifies all layers, including attention. I'll update those results soon.
Example with register_bert_model:
@register_bert_model
class SparseBertModel(BertModel):

    @dataclass
    class ConfigKWargs:
        # Keyword arguments to configure sparsity.
        sparsity: float = 0.9

    # Define __init__, etc.
    ...
This will automatically create new classes called SparseBertConfig, SparseBertForMaskedLM, and SparseBertForSequenceClassification. Notice that the naming is automatic and is derived from the name of your original class: if you define DynamicSparseBertModel, you'd get DynamicSparseBertConfig and so on.
As soon as you define the class, it's ready to autoload. For instance, you could do:
config = AutoConfig.for_model(model_type="sparse_bert", sparsity=0.5)
model = AutoModelForMaskedLM.from_config(config)
type(model)
>>> SparseBertForMaskedLM
Notice how the model_type "sparse_bert" has also been formatted automatically from the class name; in the other example, you'd use model_type="dynamic_sparse_bert". The config is also already equipped to accept the argument sparsity, which can be accessed by your model. Thus, you can run
config.sparsity
>>> 0.5
This comes from the ConfigKWargs defined above. You can add whatever arguments you want to that dataclass. With this, we can modify our experiment configs and configure our models as desired.
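As a sketch of adding more options (the prune_interval field below is purely illustrative, not an existing argument), any field added to ConfigKWargs shows up on the generated config the same way:
@register_bert_model
class DynamicSparseBertModel(BertModel):

    @dataclass
    class ConfigKWargs:
        # Keyword arguments to configure sparsity.
        sparsity: float = 0.9
        # Hypothetical extra argument, included only to illustrate the pattern.
        prune_interval: int = 100

    # Define __init__, etc.
    ...

config = AutoConfig.for_model(model_type="dynamic_sparse_bert", sparsity=0.5, prune_interval=50)
config.prune_interval
>>> 50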