Nice! BTW, how many ConformerEncoderLayers are used? Is that similar to 218.452 divided by sum(8.39 + 17.49 + 4.4 + 5.4 + 2.39 + 4.31)?
12 ConformerEncoderLayers.
The model parameters I use are:

```python
from icefall.utils import AttributeDict

params = AttributeDict(
    {
        # parameters for conformer
        "feature_dim": 80,
        "subsampling_factor": 4,
        "encoder_dim": 512,
        "nhead": 8,
        "dim_feedforward": 2048,
        "num_encoder_layers": 12,
        # parameters for decoder
        "decoder_dim": 512,
        "context_size": 2,
        "vocab_size": 500,
        "blank_id": 0,
        # parameters for joiner
        "joiner_dim": 512,
    }
)
```
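As a quick sanity check of the model size these parameters produce, you can count the parameters directly (a minimal sketch; it assumes `get_transducer_model` from the recipe's `train.py` builds the transducer from these params):

```python
from train import get_transducer_model  # assumption: the recipe's train.py exposes this helper

model = get_transducer_model(params)

# Total number of parameters across encoder, decoder, and joiner.
num_params = sum(p.numel() for p in model.parameters())
print(f"Number of model parameters: {num_params}")
```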
You can check the recorded details of each module in https://github.com/yaozengwei/model_profiling/blob/master/conformer.py. The last column shows the number of calls; the record covers 20 iterations.
I use the PyTorch profiler (https://pytorch.org/docs/stable/profiler.html) to compare the relative contributions of different modules of the Conformer model to the training time. Code and more details can be found at https://github.com/yaozengwei/model_profiling.
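For reference, the overall measurement pattern looks roughly like this (a minimal sketch, not the exact script; the tiny placeholder model and random input are stand-ins so the snippet stays runnable):

```python
import torch
from torch import profiler

# Hypothetical stand-ins: the real script profiles the Conformer encoder on
# real batches; a small linear layer and random input keep this sketch runnable.
model = torch.nn.Linear(80, 512)
x = torch.randn(20, 100, 80)

activities = [profiler.ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(profiler.ProfilerActivity.CUDA)

with profiler.profile(activities=activities, record_shapes=True) as prof:
    for _ in range(20):  # the record is with 20 iterations
        model(x)

# Aggregated per-label/per-op stats; "# of Calls" is the last column.
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```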
Experiments are based on the recipe `egs/librispeech/ASR/pruned_transducer_stateless2`. Specifically, I wrap the code for each module I want to record in a separate labelled context manager using `profiler.record_function("label")`. The labels I use include:
The following picture shows the profiling result.
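As an illustration of the wrapping pattern described above (a minimal sketch; the label names and the module body here are hypothetical, not the actual labels from `conformer.py`):

```python
import torch
from torch import nn, profiler


class ConformerEncoderLayer(nn.Module):
    """Simplified stand-in for the real layer; only the profiling pattern matters."""

    def __init__(self, d_model: int = 512) -> None:
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=8)
        self.feed_forward = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each labelled region shows up as its own row in the profiler table.
        with profiler.record_function("self_attn"):
            x = x + self.self_attn(x, x, x)[0]
        with profiler.record_function("feed_forward"):
            x = x + self.feed_forward(x)
        return x
```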