wassname closed this 5 years ago.
The result looks like this:
===========================================================================================================
Kernel Shape \
Layer
0_bert.embeddings.Embedding_word_embeddings [1024, 28996]
1_bert.embeddings.Embedding_position_embeddings [1024, 512]
2_bert.embeddings.Embedding_token_type_embeddings [1024, 2]
3_bert.embeddings.FusedLayerNorm_LayerNorm [1024]
4_bert.embeddings.Dropout_dropout -
5_bert.encoder.layer.0.attention.self.Linear_query [1024, 1024]
6_bert.encoder.layer.0.attention.self.Linear_key [1024, 1024]
7_bert.encoder.layer.0.attention.self.Linear_value [1024, 1024]
8_bert.encoder.layer.0.attention.self.Dropout_d... -
9_bert.encoder.layer.0.attention.output.Linear_... [1024, 1024]
10_bert.encoder.layer.0.attention.output.Dropou... -
11_bert.encoder.layer.0.attention.output.FusedL... [1024]
12_bert.encoder.layer.0.intermediate.Linear_dense [1024, 4096]
13_bert.encoder.layer.0.output.Linear_dense [4096, 1024]
14_bert.encoder.layer.0.output.Dropout_dropout -
15_bert.encoder.layer.0.output.FusedLayerNorm_L... [1024]
16_bert.encoder.layer.1.attention.self.Linear_q... [1024, 1024]
17_bert.encoder.layer.1.attention.self.Linear_key [1024, 1024]
18_bert.encoder.layer.1.attention.self.Linear_v... [1024, 1024]
19_bert.encoder.layer.1.attention.self.Dropout_... -
20_bert.encoder.layer.1.attention.output.Linear... [1024, 1024]
21_bert.encoder.layer.1.attention.output.Dropou... -
22_bert.encoder.layer.1.attention.output.FusedL... [1024]
23_bert.encoder.layer.1.intermediate.Linear_dense [1024, 4096]
24_bert.encoder.layer.1.output.Linear_dense [4096, 1024]
25_bert.encoder.layer.1.output.Dropout_dropout -
26_bert.encoder.layer.1.output.FusedLayerNorm_L... [1024]
27_bert.encoder.layer.2.attention.self.Linear_q... [1024, 1024]
28_bert.encoder.layer.2.attention.self.Linear_key [1024, 1024]
29_bert.encoder.layer.2.attention.self.Linear_v... [1024, 1024]
30_bert.encoder.layer.2.attention.self.Dropout_... -
31_bert.encoder.layer.2.attention.output.Linear... [1024, 1024]
32_bert.encoder.layer.2.attention.output.Dropou... -
33_bert.encoder.layer.2.attention.output.FusedL... [1024]
34_bert.encoder.layer.2.intermediate.Linear_dense [1024, 4096]
35_bert.encoder.layer.2.output.Linear_dense [4096, 1024]
36_bert.encoder.layer.2.output.Dropout_dropout -
37_bert.encoder.layer.2.output.FusedLayerNorm_L... [1024]
38_bert.encoder.layer.3.attention.self.Linear_q... [1024, 1024]
39_bert.encoder.layer.3.attention.self.Linear_key [1024, 1024]
40_bert.encoder.layer.3.attention.self.Linear_v... [1024, 1024]
41_bert.encoder.layer.3.attention.self.Dropout_... -
42_bert.encoder.layer.3.attention.output.Linear... [1024, 1024]
43_bert.encoder.layer.3.attention.output.Dropou... -
44_bert.encoder.layer.3.attention.output.FusedL... [1024]
45_bert.encoder.layer.3.intermediate.Linear_dense [1024, 4096]
46_bert.encoder.layer.3.output.Linear_dense [4096, 1024]
47_bert.encoder.layer.3.output.Dropout_dropout -
48_bert.encoder.layer.3.output.FusedLayerNorm_L... [1024]
49_bert.encoder.layer.4.attention.self.Linear_q... [1024, 1024]
50_bert.encoder.layer.4.attention.self.Linear_key [1024, 1024]
51_bert.encoder.layer.4.attention.self.Linear_v... [1024, 1024]
52_bert.encoder.layer.4.attention.self.Dropout_... -
53_bert.encoder.layer.4.attention.output.Linear... [1024, 1024]
54_bert.encoder.layer.4.attention.output.Dropou... -
55_bert.encoder.layer.4.attention.output.FusedL... [1024]
56_bert.encoder.layer.4.intermediate.Linear_dense [1024, 4096]
57_bert.encoder.layer.4.output.Linear_dense [4096, 1024]
58_bert.encoder.layer.4.output.Dropout_dropout -
59_bert.encoder.layer.4.output.FusedLayerNorm_L... [1024]
60_bert.encoder.layer.5.attention.self.Linear_q... [1024, 1024]
61_bert.encoder.layer.5.attention.self.Linear_key [1024, 1024]
62_bert.encoder.layer.5.attention.self.Linear_v... [1024, 1024]
63_bert.encoder.layer.5.attention.self.Dropout_... -
64_bert.encoder.layer.5.attention.output.Linear... [1024, 1024]
65_bert.encoder.layer.5.attention.output.Dropou... -
66_bert.encoder.layer.5.attention.output.FusedL... [1024]
67_bert.encoder.layer.5.intermediate.Linear_dense [1024, 4096]
68_bert.encoder.layer.5.output.Linear_dense [4096, 1024]
69_bert.encoder.layer.5.output.Dropout_dropout -
70_bert.encoder.layer.5.output.FusedLayerNorm_L... [1024]
71_bert.encoder.layer.6.attention.self.Linear_q... [1024, 1024]
72_bert.encoder.layer.6.attention.self.Linear_key [1024, 1024]
73_bert.encoder.layer.6.attention.self.Linear_v... [1024, 1024]
74_bert.encoder.layer.6.attention.self.Dropout_... -
75_bert.encoder.layer.6.attention.output.Linear... [1024, 1024]
76_bert.encoder.layer.6.attention.output.Dropou... -
77_bert.encoder.layer.6.attention.output.FusedL... [1024]
78_bert.encoder.layer.6.intermediate.Linear_dense [1024, 4096]
79_bert.encoder.layer.6.output.Linear_dense [4096, 1024]
80_bert.encoder.layer.6.output.Dropout_dropout -
81_bert.encoder.layer.6.output.FusedLayerNorm_L... [1024]
82_bert.encoder.layer.7.attention.self.Linear_q... [1024, 1024]
83_bert.encoder.layer.7.attention.self.Linear_key [1024, 1024]
84_bert.encoder.layer.7.attention.self.Linear_v... [1024, 1024]
85_bert.encoder.layer.7.attention.self.Dropout_... -
86_bert.encoder.layer.7.attention.output.Linear... [1024, 1024]
87_bert.encoder.layer.7.attention.output.Dropou... -
88_bert.encoder.layer.7.attention.output.FusedL... [1024]
89_bert.encoder.layer.7.intermediate.Linear_dense [1024, 4096]
90_bert.encoder.layer.7.output.Linear_dense [4096, 1024]
91_bert.encoder.layer.7.output.Dropout_dropout -
92_bert.encoder.layer.7.output.FusedLayerNorm_L... [1024]
93_bert.encoder.layer.8.attention.self.Linear_q... [1024, 1024]
94_bert.encoder.layer.8.attention.self.Linear_key [1024, 1024]
95_bert.encoder.layer.8.attention.self.Linear_v... [1024, 1024]
96_bert.encoder.layer.8.attention.self.Dropout_... -
97_bert.encoder.layer.8.attention.output.Linear... [1024, 1024]
98_bert.encoder.layer.8.attention.output.Dropou... -
99_bert.encoder.layer.8.attention.output.FusedL... [1024]
100_bert.encoder.layer.8.intermediate.Linear_dense [1024, 4096]
101_bert.encoder.layer.8.output.Linear_dense [4096, 1024]
102_bert.encoder.layer.8.output.Dropout_dropout -
103_bert.encoder.layer.8.output.FusedLayerNorm_... [1024]
104_bert.encoder.layer.9.attention.self.Linear_... [1024, 1024]
105_bert.encoder.layer.9.attention.self.Linear_key [1024, 1024]
106_bert.encoder.layer.9.attention.self.Linear_... [1024, 1024]
107_bert.encoder.layer.9.attention.self.Dropout... -
108_bert.encoder.layer.9.attention.output.Linea... [1024, 1024]
109_bert.encoder.layer.9.attention.output.Dropo... -
110_bert.encoder.layer.9.attention.output.Fused... [1024]
111_bert.encoder.layer.9.intermediate.Linear_dense [1024, 4096]
112_bert.encoder.layer.9.output.Linear_dense [4096, 1024]
113_bert.encoder.layer.9.output.Dropout_dropout -
114_bert.encoder.layer.9.output.FusedLayerNorm_... [1024]
115_bert.encoder.layer.10.attention.self.Linear... [1024, 1024]
116_bert.encoder.layer.10.attention.self.Linear... [1024, 1024]
117_bert.encoder.layer.10.attention.self.Linear... [1024, 1024]
118_bert.encoder.layer.10.attention.self.Dropou... -
119_bert.encoder.layer.10.attention.output.Line... [1024, 1024]
120_bert.encoder.layer.10.attention.output.Drop... -
121_bert.encoder.layer.10.attention.output.Fuse... [1024]
122_bert.encoder.layer.10.intermediate.Linear_d... [1024, 4096]
123_bert.encoder.layer.10.output.Linear_dense [4096, 1024]
124_bert.encoder.layer.10.output.Dropout_dropout -
125_bert.encoder.layer.10.output.FusedLayerNorm... [1024]
126_bert.encoder.layer.11.attention.self.Linear... [1024, 1024]
127_bert.encoder.layer.11.attention.self.Linear... [1024, 1024]
128_bert.encoder.layer.11.attention.self.Linear... [1024, 1024]
129_bert.encoder.layer.11.attention.self.Dropou... -
130_bert.encoder.layer.11.attention.output.Line... [1024, 1024]
131_bert.encoder.layer.11.attention.output.Drop... -
132_bert.encoder.layer.11.attention.output.Fuse... [1024]
133_bert.encoder.layer.11.intermediate.Linear_d... [1024, 4096]
134_bert.encoder.layer.11.output.Linear_dense [4096, 1024]
135_bert.encoder.layer.11.output.Dropout_dropout -
136_bert.encoder.layer.11.output.FusedLayerNorm... [1024]
137_bert.encoder.layer.12.attention.self.Linear... [1024, 1024]
138_bert.encoder.layer.12.attention.self.Linear... [1024, 1024]
139_bert.encoder.layer.12.attention.self.Linear... [1024, 1024]
140_bert.encoder.layer.12.attention.self.Dropou... -
141_bert.encoder.layer.12.attention.output.Line... [1024, 1024]
142_bert.encoder.layer.12.attention.output.Drop... -
143_bert.encoder.layer.12.attention.output.Fuse... [1024]
144_bert.encoder.layer.12.intermediate.Linear_d... [1024, 4096]
145_bert.encoder.layer.12.output.Linear_dense [4096, 1024]
146_bert.encoder.layer.12.output.Dropout_dropout -
147_bert.encoder.layer.12.output.FusedLayerNorm... [1024]
148_bert.encoder.layer.13.attention.self.Linear... [1024, 1024]
149_bert.encoder.layer.13.attention.self.Linear... [1024, 1024]
150_bert.encoder.layer.13.attention.self.Linear... [1024, 1024]
151_bert.encoder.layer.13.attention.self.Dropou... -
152_bert.encoder.layer.13.attention.output.Line... [1024, 1024]
153_bert.encoder.layer.13.attention.output.Drop... -
154_bert.encoder.layer.13.attention.output.Fuse... [1024]
155_bert.encoder.layer.13.intermediate.Linear_d... [1024, 4096]
156_bert.encoder.layer.13.output.Linear_dense [4096, 1024]
157_bert.encoder.layer.13.output.Dropout_dropout -
158_bert.encoder.layer.13.output.FusedLayerNorm... [1024]
159_bert.encoder.layer.14.attention.self.Linear... [1024, 1024]
160_bert.encoder.layer.14.attention.self.Linear... [1024, 1024]
161_bert.encoder.layer.14.attention.self.Linear... [1024, 1024]
162_bert.encoder.layer.14.attention.self.Dropou... -
163_bert.encoder.layer.14.attention.output.Line... [1024, 1024]
164_bert.encoder.layer.14.attention.output.Drop... -
165_bert.encoder.layer.14.attention.output.Fuse... [1024]
166_bert.encoder.layer.14.intermediate.Linear_d... [1024, 4096]
167_bert.encoder.layer.14.output.Linear_dense [4096, 1024]
168_bert.encoder.layer.14.output.Dropout_dropout -
169_bert.encoder.layer.14.output.FusedLayerNorm... [1024]
170_bert.encoder.layer.15.attention.self.Linear... [1024, 1024]
171_bert.encoder.layer.15.attention.self.Linear... [1024, 1024]
172_bert.encoder.layer.15.attention.self.Linear... [1024, 1024]
173_bert.encoder.layer.15.attention.self.Dropou... -
174_bert.encoder.layer.15.attention.output.Line... [1024, 1024]
175_bert.encoder.layer.15.attention.output.Drop... -
176_bert.encoder.layer.15.attention.output.Fuse... [1024]
177_bert.encoder.layer.15.intermediate.Linear_d... [1024, 4096]
178_bert.encoder.layer.15.output.Linear_dense [4096, 1024]
179_bert.encoder.layer.15.output.Dropout_dropout -
180_bert.encoder.layer.15.output.FusedLayerNorm... [1024]
181_bert.encoder.layer.16.attention.self.Linear... [1024, 1024]
182_bert.encoder.layer.16.attention.self.Linear... [1024, 1024]
183_bert.encoder.layer.16.attention.self.Linear... [1024, 1024]
184_bert.encoder.layer.16.attention.self.Dropou... -
185_bert.encoder.layer.16.attention.output.Line... [1024, 1024]
186_bert.encoder.layer.16.attention.output.Drop... -
187_bert.encoder.layer.16.attention.output.Fuse... [1024]
188_bert.encoder.layer.16.intermediate.Linear_d... [1024, 4096]
189_bert.encoder.layer.16.output.Linear_dense [4096, 1024]
190_bert.encoder.layer.16.output.Dropout_dropout -
191_bert.encoder.layer.16.output.FusedLayerNorm... [1024]
192_bert.encoder.layer.17.attention.self.Linear... [1024, 1024]
193_bert.encoder.layer.17.attention.self.Linear... [1024, 1024]
194_bert.encoder.layer.17.attention.self.Linear... [1024, 1024]
195_bert.encoder.layer.17.attention.self.Dropou... -
196_bert.encoder.layer.17.attention.output.Line... [1024, 1024]
197_bert.encoder.layer.17.attention.output.Drop... -
198_bert.encoder.layer.17.attention.output.Fuse... [1024]
199_bert.encoder.layer.17.intermediate.Linear_d... [1024, 4096]
200_bert.encoder.layer.17.output.Linear_dense [4096, 1024]
201_bert.encoder.layer.17.output.Dropout_dropout -
202_bert.encoder.layer.17.output.FusedLayerNorm... [1024]
203_bert.encoder.layer.18.attention.self.Linear... [1024, 1024]
204_bert.encoder.layer.18.attention.self.Linear... [1024, 1024]
205_bert.encoder.layer.18.attention.self.Linear... [1024, 1024]
206_bert.encoder.layer.18.attention.self.Dropou... -
207_bert.encoder.layer.18.attention.output.Line... [1024, 1024]
208_bert.encoder.layer.18.attention.output.Drop... -
209_bert.encoder.layer.18.attention.output.Fuse... [1024]
210_bert.encoder.layer.18.intermediate.Linear_d... [1024, 4096]
211_bert.encoder.layer.18.output.Linear_dense [4096, 1024]
212_bert.encoder.layer.18.output.Dropout_dropout -
213_bert.encoder.layer.18.output.FusedLayerNorm... [1024]
214_bert.encoder.layer.19.attention.self.Linear... [1024, 1024]
215_bert.encoder.layer.19.attention.self.Linear... [1024, 1024]
216_bert.encoder.layer.19.attention.self.Linear... [1024, 1024]
217_bert.encoder.layer.19.attention.self.Dropou... -
218_bert.encoder.layer.19.attention.output.Line... [1024, 1024]
219_bert.encoder.layer.19.attention.output.Drop... -
220_bert.encoder.layer.19.attention.output.Fuse... [1024]
221_bert.encoder.layer.19.intermediate.Linear_d... [1024, 4096]
222_bert.encoder.layer.19.output.Linear_dense [4096, 1024]
223_bert.encoder.layer.19.output.Dropout_dropout -
224_bert.encoder.layer.19.output.FusedLayerNorm... [1024]
225_bert.encoder.layer.20.attention.self.Linear... [1024, 1024]
226_bert.encoder.layer.20.attention.self.Linear... [1024, 1024]
227_bert.encoder.layer.20.attention.self.Linear... [1024, 1024]
228_bert.encoder.layer.20.attention.self.Dropou... -
229_bert.encoder.layer.20.attention.output.Line... [1024, 1024]
230_bert.encoder.layer.20.attention.output.Drop... -
231_bert.encoder.layer.20.attention.output.Fuse... [1024]
232_bert.encoder.layer.20.intermediate.Linear_d... [1024, 4096]
233_bert.encoder.layer.20.output.Linear_dense [4096, 1024]
234_bert.encoder.layer.20.output.Dropout_dropout -
235_bert.encoder.layer.20.output.FusedLayerNorm... [1024]
236_bert.encoder.layer.21.attention.self.Linear... [1024, 1024]
237_bert.encoder.layer.21.attention.self.Linear... [1024, 1024]
238_bert.encoder.layer.21.attention.self.Linear... [1024, 1024]
239_bert.encoder.layer.21.attention.self.Dropou... -
240_bert.encoder.layer.21.attention.output.Line... [1024, 1024]
241_bert.encoder.layer.21.attention.output.Drop... -
242_bert.encoder.layer.21.attention.output.Fuse... [1024]
243_bert.encoder.layer.21.intermediate.Linear_d... [1024, 4096]
244_bert.encoder.layer.21.output.Linear_dense [4096, 1024]
245_bert.encoder.layer.21.output.Dropout_dropout -
246_bert.encoder.layer.21.output.FusedLayerNorm... [1024]
247_bert.encoder.layer.22.attention.self.Linear... [1024, 1024]
248_bert.encoder.layer.22.attention.self.Linear... [1024, 1024]
249_bert.encoder.layer.22.attention.self.Linear... [1024, 1024]
250_bert.encoder.layer.22.attention.self.Dropou... -
251_bert.encoder.layer.22.attention.output.Line... [1024, 1024]
252_bert.encoder.layer.22.attention.output.Drop... -
253_bert.encoder.layer.22.attention.output.Fuse... [1024]
254_bert.encoder.layer.22.intermediate.Linear_d... [1024, 4096]
255_bert.encoder.layer.22.output.Linear_dense [4096, 1024]
256_bert.encoder.layer.22.output.Dropout_dropout -
257_bert.encoder.layer.22.output.FusedLayerNorm... [1024]
258_bert.encoder.layer.23.attention.self.Linear... [1024, 1024]
259_bert.encoder.layer.23.attention.self.Linear... [1024, 1024]
260_bert.encoder.layer.23.attention.self.Linear... [1024, 1024]
261_bert.encoder.layer.23.attention.self.Dropou... -
262_bert.encoder.layer.23.attention.output.Line... [1024, 1024]
263_bert.encoder.layer.23.attention.output.Drop... -
264_bert.encoder.layer.23.attention.output.Fuse... [1024]
265_bert.encoder.layer.23.intermediate.Linear_d... [1024, 4096]
266_bert.encoder.layer.23.output.Linear_dense [4096, 1024]
267_bert.encoder.layer.23.output.Dropout_dropout -
268_bert.encoder.layer.23.output.FusedLayerNorm... [1024]
269_bert.pooler.Linear_dense [1024, 1024]
270_bert.pooler.Tanh_activation -
271_dropout -
272_classifier [1024, 2]
Output Shape \
Layer
0_bert.embeddings.Embedding_word_embeddings [16, 256, 1024]
1_bert.embeddings.Embedding_position_embeddings [16, 256, 1024]
2_bert.embeddings.Embedding_token_type_embeddings [16, 256, 1024]
3_bert.embeddings.FusedLayerNorm_LayerNorm [16, 256, 1024]
4_bert.embeddings.Dropout_dropout [16, 256, 1024]
5_bert.encoder.layer.0.attention.self.Linear_query [16, 256, 1024]
6_bert.encoder.layer.0.attention.self.Linear_key [16, 256, 1024]
7_bert.encoder.layer.0.attention.self.Linear_value [16, 256, 1024]
8_bert.encoder.layer.0.attention.self.Dropout_d... [16, 16, 256, 256]
9_bert.encoder.layer.0.attention.output.Linear_... [16, 256, 1024]
10_bert.encoder.layer.0.attention.output.Dropou... [16, 256, 1024]
11_bert.encoder.layer.0.attention.output.FusedL... [16, 256, 1024]
12_bert.encoder.layer.0.intermediate.Linear_dense [16, 256, 4096]
13_bert.encoder.layer.0.output.Linear_dense [16, 256, 1024]
14_bert.encoder.layer.0.output.Dropout_dropout [16, 256, 1024]
15_bert.encoder.layer.0.output.FusedLayerNorm_L... [16, 256, 1024]
16_bert.encoder.layer.1.attention.self.Linear_q... [16, 256, 1024]
17_bert.encoder.layer.1.attention.self.Linear_key [16, 256, 1024]
18_bert.encoder.layer.1.attention.self.Linear_v... [16, 256, 1024]
19_bert.encoder.layer.1.attention.self.Dropout_... [16, 16, 256, 256]
20_bert.encoder.layer.1.attention.output.Linear... [16, 256, 1024]
21_bert.encoder.layer.1.attention.output.Dropou... [16, 256, 1024]
22_bert.encoder.layer.1.attention.output.FusedL... [16, 256, 1024]
23_bert.encoder.layer.1.intermediate.Linear_dense [16, 256, 4096]
24_bert.encoder.layer.1.output.Linear_dense [16, 256, 1024]
25_bert.encoder.layer.1.output.Dropout_dropout [16, 256, 1024]
26_bert.encoder.layer.1.output.FusedLayerNorm_L... [16, 256, 1024]
27_bert.encoder.layer.2.attention.self.Linear_q... [16, 256, 1024]
28_bert.encoder.layer.2.attention.self.Linear_key [16, 256, 1024]
29_bert.encoder.layer.2.attention.self.Linear_v... [16, 256, 1024]
30_bert.encoder.layer.2.attention.self.Dropout_... [16, 16, 256, 256]
31_bert.encoder.layer.2.attention.output.Linear... [16, 256, 1024]
32_bert.encoder.layer.2.attention.output.Dropou... [16, 256, 1024]
33_bert.encoder.layer.2.attention.output.FusedL... [16, 256, 1024]
34_bert.encoder.layer.2.intermediate.Linear_dense [16, 256, 4096]
35_bert.encoder.layer.2.output.Linear_dense [16, 256, 1024]
36_bert.encoder.layer.2.output.Dropout_dropout [16, 256, 1024]
37_bert.encoder.layer.2.output.FusedLayerNorm_L... [16, 256, 1024]
38_bert.encoder.layer.3.attention.self.Linear_q... [16, 256, 1024]
39_bert.encoder.layer.3.attention.self.Linear_key [16, 256, 1024]
40_bert.encoder.layer.3.attention.self.Linear_v... [16, 256, 1024]
41_bert.encoder.layer.3.attention.self.Dropout_... [16, 16, 256, 256]
42_bert.encoder.layer.3.attention.output.Linear... [16, 256, 1024]
43_bert.encoder.layer.3.attention.output.Dropou... [16, 256, 1024]
44_bert.encoder.layer.3.attention.output.FusedL... [16, 256, 1024]
45_bert.encoder.layer.3.intermediate.Linear_dense [16, 256, 4096]
46_bert.encoder.layer.3.output.Linear_dense [16, 256, 1024]
47_bert.encoder.layer.3.output.Dropout_dropout [16, 256, 1024]
48_bert.encoder.layer.3.output.FusedLayerNorm_L... [16, 256, 1024]
49_bert.encoder.layer.4.attention.self.Linear_q... [16, 256, 1024]
50_bert.encoder.layer.4.attention.self.Linear_key [16, 256, 1024]
51_bert.encoder.layer.4.attention.self.Linear_v... [16, 256, 1024]
52_bert.encoder.layer.4.attention.self.Dropout_... [16, 16, 256, 256]
53_bert.encoder.layer.4.attention.output.Linear... [16, 256, 1024]
54_bert.encoder.layer.4.attention.output.Dropou... [16, 256, 1024]
55_bert.encoder.layer.4.attention.output.FusedL... [16, 256, 1024]
56_bert.encoder.layer.4.intermediate.Linear_dense [16, 256, 4096]
57_bert.encoder.layer.4.output.Linear_dense [16, 256, 1024]
58_bert.encoder.layer.4.output.Dropout_dropout [16, 256, 1024]
59_bert.encoder.layer.4.output.FusedLayerNorm_L... [16, 256, 1024]
60_bert.encoder.layer.5.attention.self.Linear_q... [16, 256, 1024]
61_bert.encoder.layer.5.attention.self.Linear_key [16, 256, 1024]
62_bert.encoder.layer.5.attention.self.Linear_v... [16, 256, 1024]
63_bert.encoder.layer.5.attention.self.Dropout_... [16, 16, 256, 256]
64_bert.encoder.layer.5.attention.output.Linear... [16, 256, 1024]
65_bert.encoder.layer.5.attention.output.Dropou... [16, 256, 1024]
66_bert.encoder.layer.5.attention.output.FusedL... [16, 256, 1024]
67_bert.encoder.layer.5.intermediate.Linear_dense [16, 256, 4096]
68_bert.encoder.layer.5.output.Linear_dense [16, 256, 1024]
69_bert.encoder.layer.5.output.Dropout_dropout [16, 256, 1024]
70_bert.encoder.layer.5.output.FusedLayerNorm_L... [16, 256, 1024]
71_bert.encoder.layer.6.attention.self.Linear_q... [16, 256, 1024]
72_bert.encoder.layer.6.attention.self.Linear_key [16, 256, 1024]
73_bert.encoder.layer.6.attention.self.Linear_v... [16, 256, 1024]
74_bert.encoder.layer.6.attention.self.Dropout_... [16, 16, 256, 256]
75_bert.encoder.layer.6.attention.output.Linear... [16, 256, 1024]
76_bert.encoder.layer.6.attention.output.Dropou... [16, 256, 1024]
77_bert.encoder.layer.6.attention.output.FusedL... [16, 256, 1024]
78_bert.encoder.layer.6.intermediate.Linear_dense [16, 256, 4096]
79_bert.encoder.layer.6.output.Linear_dense [16, 256, 1024]
80_bert.encoder.layer.6.output.Dropout_dropout [16, 256, 1024]
81_bert.encoder.layer.6.output.FusedLayerNorm_L... [16, 256, 1024]
82_bert.encoder.layer.7.attention.self.Linear_q... [16, 256, 1024]
83_bert.encoder.layer.7.attention.self.Linear_key [16, 256, 1024]
84_bert.encoder.layer.7.attention.self.Linear_v... [16, 256, 1024]
85_bert.encoder.layer.7.attention.self.Dropout_... [16, 16, 256, 256]
86_bert.encoder.layer.7.attention.output.Linear... [16, 256, 1024]
87_bert.encoder.layer.7.attention.output.Dropou... [16, 256, 1024]
88_bert.encoder.layer.7.attention.output.FusedL... [16, 256, 1024]
89_bert.encoder.layer.7.intermediate.Linear_dense [16, 256, 4096]
90_bert.encoder.layer.7.output.Linear_dense [16, 256, 1024]
91_bert.encoder.layer.7.output.Dropout_dropout [16, 256, 1024]
92_bert.encoder.layer.7.output.FusedLayerNorm_L... [16, 256, 1024]
93_bert.encoder.layer.8.attention.self.Linear_q... [16, 256, 1024]
94_bert.encoder.layer.8.attention.self.Linear_key [16, 256, 1024]
95_bert.encoder.layer.8.attention.self.Linear_v... [16, 256, 1024]
96_bert.encoder.layer.8.attention.self.Dropout_... [16, 16, 256, 256]
97_bert.encoder.layer.8.attention.output.Linear... [16, 256, 1024]
98_bert.encoder.layer.8.attention.output.Dropou... [16, 256, 1024]
99_bert.encoder.layer.8.attention.output.FusedL... [16, 256, 1024]
100_bert.encoder.layer.8.intermediate.Linear_dense [16, 256, 4096]
101_bert.encoder.layer.8.output.Linear_dense [16, 256, 1024]
102_bert.encoder.layer.8.output.Dropout_dropout [16, 256, 1024]
103_bert.encoder.layer.8.output.FusedLayerNorm_... [16, 256, 1024]
104_bert.encoder.layer.9.attention.self.Linear_... [16, 256, 1024]
105_bert.encoder.layer.9.attention.self.Linear_key [16, 256, 1024]
106_bert.encoder.layer.9.attention.self.Linear_... [16, 256, 1024]
107_bert.encoder.layer.9.attention.self.Dropout... [16, 16, 256, 256]
108_bert.encoder.layer.9.attention.output.Linea... [16, 256, 1024]
109_bert.encoder.layer.9.attention.output.Dropo... [16, 256, 1024]
110_bert.encoder.layer.9.attention.output.Fused... [16, 256, 1024]
111_bert.encoder.layer.9.intermediate.Linear_dense [16, 256, 4096]
112_bert.encoder.layer.9.output.Linear_dense [16, 256, 1024]
113_bert.encoder.layer.9.output.Dropout_dropout [16, 256, 1024]
114_bert.encoder.layer.9.output.FusedLayerNorm_... [16, 256, 1024]
115_bert.encoder.layer.10.attention.self.Linear... [16, 256, 1024]
116_bert.encoder.layer.10.attention.self.Linear... [16, 256, 1024]
117_bert.encoder.layer.10.attention.self.Linear... [16, 256, 1024]
118_bert.encoder.layer.10.attention.self.Dropou... [16, 16, 256, 256]
119_bert.encoder.layer.10.attention.output.Line... [16, 256, 1024]
120_bert.encoder.layer.10.attention.output.Drop... [16, 256, 1024]
121_bert.encoder.layer.10.attention.output.Fuse... [16, 256, 1024]
122_bert.encoder.layer.10.intermediate.Linear_d... [16, 256, 4096]
123_bert.encoder.layer.10.output.Linear_dense [16, 256, 1024]
124_bert.encoder.layer.10.output.Dropout_dropout [16, 256, 1024]
125_bert.encoder.layer.10.output.FusedLayerNorm... [16, 256, 1024]
126_bert.encoder.layer.11.attention.self.Linear... [16, 256, 1024]
127_bert.encoder.layer.11.attention.self.Linear... [16, 256, 1024]
128_bert.encoder.layer.11.attention.self.Linear... [16, 256, 1024]
129_bert.encoder.layer.11.attention.self.Dropou... [16, 16, 256, 256]
130_bert.encoder.layer.11.attention.output.Line... [16, 256, 1024]
131_bert.encoder.layer.11.attention.output.Drop... [16, 256, 1024]
132_bert.encoder.layer.11.attention.output.Fuse... [16, 256, 1024]
133_bert.encoder.layer.11.intermediate.Linear_d... [16, 256, 4096]
134_bert.encoder.layer.11.output.Linear_dense [16, 256, 1024]
135_bert.encoder.layer.11.output.Dropout_dropout [16, 256, 1024]
136_bert.encoder.layer.11.output.FusedLayerNorm... [16, 256, 1024]
137_bert.encoder.layer.12.attention.self.Linear... [16, 256, 1024]
138_bert.encoder.layer.12.attention.self.Linear... [16, 256, 1024]
139_bert.encoder.layer.12.attention.self.Linear... [16, 256, 1024]
140_bert.encoder.layer.12.attention.self.Dropou... [16, 16, 256, 256]
141_bert.encoder.layer.12.attention.output.Line... [16, 256, 1024]
142_bert.encoder.layer.12.attention.output.Drop... [16, 256, 1024]
143_bert.encoder.layer.12.attention.output.Fuse... [16, 256, 1024]
144_bert.encoder.layer.12.intermediate.Linear_d... [16, 256, 4096]
145_bert.encoder.layer.12.output.Linear_dense [16, 256, 1024]
146_bert.encoder.layer.12.output.Dropout_dropout [16, 256, 1024]
147_bert.encoder.layer.12.output.FusedLayerNorm... [16, 256, 1024]
148_bert.encoder.layer.13.attention.self.Linear... [16, 256, 1024]
149_bert.encoder.layer.13.attention.self.Linear... [16, 256, 1024]
150_bert.encoder.layer.13.attention.self.Linear... [16, 256, 1024]
151_bert.encoder.layer.13.attention.self.Dropou... [16, 16, 256, 256]
152_bert.encoder.layer.13.attention.output.Line... [16, 256, 1024]
153_bert.encoder.layer.13.attention.output.Drop... [16, 256, 1024]
154_bert.encoder.layer.13.attention.output.Fuse... [16, 256, 1024]
155_bert.encoder.layer.13.intermediate.Linear_d... [16, 256, 4096]
156_bert.encoder.layer.13.output.Linear_dense [16, 256, 1024]
157_bert.encoder.layer.13.output.Dropout_dropout [16, 256, 1024]
158_bert.encoder.layer.13.output.FusedLayerNorm... [16, 256, 1024]
159_bert.encoder.layer.14.attention.self.Linear... [16, 256, 1024]
160_bert.encoder.layer.14.attention.self.Linear... [16, 256, 1024]
161_bert.encoder.layer.14.attention.self.Linear... [16, 256, 1024]
162_bert.encoder.layer.14.attention.self.Dropou... [16, 16, 256, 256]
163_bert.encoder.layer.14.attention.output.Line... [16, 256, 1024]
164_bert.encoder.layer.14.attention.output.Drop... [16, 256, 1024]
165_bert.encoder.layer.14.attention.output.Fuse... [16, 256, 1024]
166_bert.encoder.layer.14.intermediate.Linear_d... [16, 256, 4096]
167_bert.encoder.layer.14.output.Linear_dense [16, 256, 1024]
168_bert.encoder.layer.14.output.Dropout_dropout [16, 256, 1024]
169_bert.encoder.layer.14.output.FusedLayerNorm... [16, 256, 1024]
170_bert.encoder.layer.15.attention.self.Linear... [16, 256, 1024]
171_bert.encoder.layer.15.attention.self.Linear... [16, 256, 1024]
172_bert.encoder.layer.15.attention.self.Linear... [16, 256, 1024]
173_bert.encoder.layer.15.attention.self.Dropou... [16, 16, 256, 256]
174_bert.encoder.layer.15.attention.output.Line... [16, 256, 1024]
175_bert.encoder.layer.15.attention.output.Drop... [16, 256, 1024]
176_bert.encoder.layer.15.attention.output.Fuse... [16, 256, 1024]
177_bert.encoder.layer.15.intermediate.Linear_d... [16, 256, 4096]
178_bert.encoder.layer.15.output.Linear_dense [16, 256, 1024]
179_bert.encoder.layer.15.output.Dropout_dropout [16, 256, 1024]
180_bert.encoder.layer.15.output.FusedLayerNorm... [16, 256, 1024]
181_bert.encoder.layer.16.attention.self.Linear... [16, 256, 1024]
182_bert.encoder.layer.16.attention.self.Linear... [16, 256, 1024]
183_bert.encoder.layer.16.attention.self.Linear... [16, 256, 1024]
184_bert.encoder.layer.16.attention.self.Dropou... [16, 16, 256, 256]
185_bert.encoder.layer.16.attention.output.Line... [16, 256, 1024]
186_bert.encoder.layer.16.attention.output.Drop... [16, 256, 1024]
187_bert.encoder.layer.16.attention.output.Fuse... [16, 256, 1024]
188_bert.encoder.layer.16.intermediate.Linear_d... [16, 256, 4096]
189_bert.encoder.layer.16.output.Linear_dense [16, 256, 1024]
190_bert.encoder.layer.16.output.Dropout_dropout [16, 256, 1024]
191_bert.encoder.layer.16.output.FusedLayerNorm... [16, 256, 1024]
192_bert.encoder.layer.17.attention.self.Linear... [16, 256, 1024]
193_bert.encoder.layer.17.attention.self.Linear... [16, 256, 1024]
194_bert.encoder.layer.17.attention.self.Linear... [16, 256, 1024]
195_bert.encoder.layer.17.attention.self.Dropou... [16, 16, 256, 256]
196_bert.encoder.layer.17.attention.output.Line... [16, 256, 1024]
197_bert.encoder.layer.17.attention.output.Drop... [16, 256, 1024]
198_bert.encoder.layer.17.attention.output.Fuse... [16, 256, 1024]
199_bert.encoder.layer.17.intermediate.Linear_d... [16, 256, 4096]
200_bert.encoder.layer.17.output.Linear_dense [16, 256, 1024]
201_bert.encoder.layer.17.output.Dropout_dropout [16, 256, 1024]
202_bert.encoder.layer.17.output.FusedLayerNorm... [16, 256, 1024]
203_bert.encoder.layer.18.attention.self.Linear... [16, 256, 1024]
204_bert.encoder.layer.18.attention.self.Linear... [16, 256, 1024]
205_bert.encoder.layer.18.attention.self.Linear... [16, 256, 1024]
206_bert.encoder.layer.18.attention.self.Dropou... [16, 16, 256, 256]
207_bert.encoder.layer.18.attention.output.Line... [16, 256, 1024]
208_bert.encoder.layer.18.attention.output.Drop... [16, 256, 1024]
209_bert.encoder.layer.18.attention.output.Fuse... [16, 256, 1024]
210_bert.encoder.layer.18.intermediate.Linear_d... [16, 256, 4096]
211_bert.encoder.layer.18.output.Linear_dense [16, 256, 1024]
212_bert.encoder.layer.18.output.Dropout_dropout [16, 256, 1024]
213_bert.encoder.layer.18.output.FusedLayerNorm... [16, 256, 1024]
214_bert.encoder.layer.19.attention.self.Linear... [16, 256, 1024]
215_bert.encoder.layer.19.attention.self.Linear... [16, 256, 1024]
216_bert.encoder.layer.19.attention.self.Linear... [16, 256, 1024]
217_bert.encoder.layer.19.attention.self.Dropou... [16, 16, 256, 256]
218_bert.encoder.layer.19.attention.output.Line... [16, 256, 1024]
219_bert.encoder.layer.19.attention.output.Drop... [16, 256, 1024]
220_bert.encoder.layer.19.attention.output.Fuse... [16, 256, 1024]
221_bert.encoder.layer.19.intermediate.Linear_d... [16, 256, 4096]
222_bert.encoder.layer.19.output.Linear_dense [16, 256, 1024]
223_bert.encoder.layer.19.output.Dropout_dropout [16, 256, 1024]
224_bert.encoder.layer.19.output.FusedLayerNorm... [16, 256, 1024]
225_bert.encoder.layer.20.attention.self.Linear... [16, 256, 1024]
226_bert.encoder.layer.20.attention.self.Linear... [16, 256, 1024]
227_bert.encoder.layer.20.attention.self.Linear... [16, 256, 1024]
228_bert.encoder.layer.20.attention.self.Dropou... [16, 16, 256, 256]
229_bert.encoder.layer.20.attention.output.Line... [16, 256, 1024]
230_bert.encoder.layer.20.attention.output.Drop... [16, 256, 1024]
231_bert.encoder.layer.20.attention.output.Fuse... [16, 256, 1024]
232_bert.encoder.layer.20.intermediate.Linear_d... [16, 256, 4096]
233_bert.encoder.layer.20.output.Linear_dense [16, 256, 1024]
234_bert.encoder.layer.20.output.Dropout_dropout [16, 256, 1024]
235_bert.encoder.layer.20.output.FusedLayerNorm... [16, 256, 1024]
236_bert.encoder.layer.21.attention.self.Linear... [16, 256, 1024]
237_bert.encoder.layer.21.attention.self.Linear... [16, 256, 1024]
238_bert.encoder.layer.21.attention.self.Linear... [16, 256, 1024]
239_bert.encoder.layer.21.attention.self.Dropou... [16, 16, 256, 256]
240_bert.encoder.layer.21.attention.output.Line... [16, 256, 1024]
241_bert.encoder.layer.21.attention.output.Drop... [16, 256, 1024]
242_bert.encoder.layer.21.attention.output.Fuse... [16, 256, 1024]
243_bert.encoder.layer.21.intermediate.Linear_d... [16, 256, 4096]
244_bert.encoder.layer.21.output.Linear_dense [16, 256, 1024]
245_bert.encoder.layer.21.output.Dropout_dropout [16, 256, 1024]
246_bert.encoder.layer.21.output.FusedLayerNorm... [16, 256, 1024]
247_bert.encoder.layer.22.attention.self.Linear... [16, 256, 1024]
248_bert.encoder.layer.22.attention.self.Linear... [16, 256, 1024]
249_bert.encoder.layer.22.attention.self.Linear... [16, 256, 1024]
250_bert.encoder.layer.22.attention.self.Dropou... [16, 16, 256, 256]
251_bert.encoder.layer.22.attention.output.Line... [16, 256, 1024]
252_bert.encoder.layer.22.attention.output.Drop... [16, 256, 1024]
253_bert.encoder.layer.22.attention.output.Fuse... [16, 256, 1024]
254_bert.encoder.layer.22.intermediate.Linear_d... [16, 256, 4096]
255_bert.encoder.layer.22.output.Linear_dense [16, 256, 1024]
256_bert.encoder.layer.22.output.Dropout_dropout [16, 256, 1024]
257_bert.encoder.layer.22.output.FusedLayerNorm... [16, 256, 1024]
258_bert.encoder.layer.23.attention.self.Linear... [16, 256, 1024]
259_bert.encoder.layer.23.attention.self.Linear... [16, 256, 1024]
260_bert.encoder.layer.23.attention.self.Linear... [16, 256, 1024]
261_bert.encoder.layer.23.attention.self.Dropou... [16, 16, 256, 256]
262_bert.encoder.layer.23.attention.output.Line... [16, 256, 1024]
263_bert.encoder.layer.23.attention.output.Drop... [16, 256, 1024]
264_bert.encoder.layer.23.attention.output.Fuse... [16, 256, 1024]
265_bert.encoder.layer.23.intermediate.Linear_d... [16, 256, 4096]
266_bert.encoder.layer.23.output.Linear_dense [16, 256, 1024]
267_bert.encoder.layer.23.output.Dropout_dropout [16, 256, 1024]
268_bert.encoder.layer.23.output.FusedLayerNorm... [16, 256, 1024]
269_bert.pooler.Linear_dense [16, 1024]
270_bert.pooler.Tanh_activation [16, 1024]
271_dropout [16, 1024]
272_classifier [16, 2]
Params Mult-Adds
Layer
0_bert.embeddings.Embedding_word_embeddings - -
1_bert.embeddings.Embedding_position_embeddings - -
2_bert.embeddings.Embedding_token_type_embeddings - -
3_bert.embeddings.FusedLayerNorm_LayerNorm - -
4_bert.embeddings.Dropout_dropout - -
5_bert.encoder.layer.0.attention.self.Linear_query - -
6_bert.encoder.layer.0.attention.self.Linear_key - -
7_bert.encoder.layer.0.attention.self.Linear_value - -
8_bert.encoder.layer.0.attention.self.Dropout_d... - -
9_bert.encoder.layer.0.attention.output.Linear_... - -
10_bert.encoder.layer.0.attention.output.Dropou... - -
11_bert.encoder.layer.0.attention.output.FusedL... - -
12_bert.encoder.layer.0.intermediate.Linear_dense - -
13_bert.encoder.layer.0.output.Linear_dense - -
14_bert.encoder.layer.0.output.Dropout_dropout - -
15_bert.encoder.layer.0.output.FusedLayerNorm_L... - -
16_bert.encoder.layer.1.attention.self.Linear_q... - -
17_bert.encoder.layer.1.attention.self.Linear_key - -
18_bert.encoder.layer.1.attention.self.Linear_v... - -
19_bert.encoder.layer.1.attention.self.Dropout_... - -
20_bert.encoder.layer.1.attention.output.Linear... - -
21_bert.encoder.layer.1.attention.output.Dropou... - -
22_bert.encoder.layer.1.attention.output.FusedL... - -
23_bert.encoder.layer.1.intermediate.Linear_dense - -
24_bert.encoder.layer.1.output.Linear_dense - -
25_bert.encoder.layer.1.output.Dropout_dropout - -
26_bert.encoder.layer.1.output.FusedLayerNorm_L... - -
27_bert.encoder.layer.2.attention.self.Linear_q... - -
28_bert.encoder.layer.2.attention.self.Linear_key - -
29_bert.encoder.layer.2.attention.self.Linear_v... - -
30_bert.encoder.layer.2.attention.self.Dropout_... - -
31_bert.encoder.layer.2.attention.output.Linear... - -
32_bert.encoder.layer.2.attention.output.Dropou... - -
33_bert.encoder.layer.2.attention.output.FusedL... - -
34_bert.encoder.layer.2.intermediate.Linear_dense - -
35_bert.encoder.layer.2.output.Linear_dense - -
36_bert.encoder.layer.2.output.Dropout_dropout - -
37_bert.encoder.layer.2.output.FusedLayerNorm_L... - -
38_bert.encoder.layer.3.attention.self.Linear_q... - -
39_bert.encoder.layer.3.attention.self.Linear_key - -
40_bert.encoder.layer.3.attention.self.Linear_v... - -
41_bert.encoder.layer.3.attention.self.Dropout_... - -
42_bert.encoder.layer.3.attention.output.Linear... - -
43_bert.encoder.layer.3.attention.output.Dropou... - -
44_bert.encoder.layer.3.attention.output.FusedL... - -
45_bert.encoder.layer.3.intermediate.Linear_dense - -
46_bert.encoder.layer.3.output.Linear_dense - -
47_bert.encoder.layer.3.output.Dropout_dropout - -
48_bert.encoder.layer.3.output.FusedLayerNorm_L... - -
49_bert.encoder.layer.4.attention.self.Linear_q... - -
50_bert.encoder.layer.4.attention.self.Linear_key - -
51_bert.encoder.layer.4.attention.self.Linear_v... - -
52_bert.encoder.layer.4.attention.self.Dropout_... - -
53_bert.encoder.layer.4.attention.output.Linear... - -
54_bert.encoder.layer.4.attention.output.Dropou... - -
55_bert.encoder.layer.4.attention.output.FusedL... - -
56_bert.encoder.layer.4.intermediate.Linear_dense - -
57_bert.encoder.layer.4.output.Linear_dense - -
58_bert.encoder.layer.4.output.Dropout_dropout - -
59_bert.encoder.layer.4.output.FusedLayerNorm_L... - -
60_bert.encoder.layer.5.attention.self.Linear_q... - -
61_bert.encoder.layer.5.attention.self.Linear_key - -
62_bert.encoder.layer.5.attention.self.Linear_v... - -
63_bert.encoder.layer.5.attention.self.Dropout_... - -
64_bert.encoder.layer.5.attention.output.Linear... - -
65_bert.encoder.layer.5.attention.output.Dropou... - -
66_bert.encoder.layer.5.attention.output.FusedL... - -
67_bert.encoder.layer.5.intermediate.Linear_dense - -
68_bert.encoder.layer.5.output.Linear_dense - -
69_bert.encoder.layer.5.output.Dropout_dropout - -
70_bert.encoder.layer.5.output.FusedLayerNorm_L... - -
71_bert.encoder.layer.6.attention.self.Linear_q... - -
72_bert.encoder.layer.6.attention.self.Linear_key - -
73_bert.encoder.layer.6.attention.self.Linear_v... - -
74_bert.encoder.layer.6.attention.self.Dropout_... - -
75_bert.encoder.layer.6.attention.output.Linear... - -
76_bert.encoder.layer.6.attention.output.Dropou... - -
77_bert.encoder.layer.6.attention.output.FusedL... - -
78_bert.encoder.layer.6.intermediate.Linear_dense - -
79_bert.encoder.layer.6.output.Linear_dense - -
80_bert.encoder.layer.6.output.Dropout_dropout - -
81_bert.encoder.layer.6.output.FusedLayerNorm_L... - -
82_bert.encoder.layer.7.attention.self.Linear_q... - -
83_bert.encoder.layer.7.attention.self.Linear_key - -
84_bert.encoder.layer.7.attention.self.Linear_v... - -
85_bert.encoder.layer.7.attention.self.Dropout_... - -
86_bert.encoder.layer.7.attention.output.Linear... - -
87_bert.encoder.layer.7.attention.output.Dropou... - -
88_bert.encoder.layer.7.attention.output.FusedL... - -
89_bert.encoder.layer.7.intermediate.Linear_dense - -
90_bert.encoder.layer.7.output.Linear_dense - -
91_bert.encoder.layer.7.output.Dropout_dropout - -
92_bert.encoder.layer.7.output.FusedLayerNorm_L... - -
93_bert.encoder.layer.8.attention.self.Linear_q... - -
94_bert.encoder.layer.8.attention.self.Linear_key - -
95_bert.encoder.layer.8.attention.self.Linear_v... - -
96_bert.encoder.layer.8.attention.self.Dropout_... - -
97_bert.encoder.layer.8.attention.output.Linear... - -
98_bert.encoder.layer.8.attention.output.Dropou... - -
99_bert.encoder.layer.8.attention.output.FusedL... - -
100_bert.encoder.layer.8.intermediate.Linear_dense - -
101_bert.encoder.layer.8.output.Linear_dense - -
102_bert.encoder.layer.8.output.Dropout_dropout - -
103_bert.encoder.layer.8.output.FusedLayerNorm_... - -
104_bert.encoder.layer.9.attention.self.Linear_... - -
105_bert.encoder.layer.9.attention.self.Linear_key - -
106_bert.encoder.layer.9.attention.self.Linear_... - -
107_bert.encoder.layer.9.attention.self.Dropout... - -
108_bert.encoder.layer.9.attention.output.Linea... - -
109_bert.encoder.layer.9.attention.output.Dropo... - -
110_bert.encoder.layer.9.attention.output.Fused... - -
111_bert.encoder.layer.9.intermediate.Linear_dense - -
112_bert.encoder.layer.9.output.Linear_dense - -
113_bert.encoder.layer.9.output.Dropout_dropout - -
114_bert.encoder.layer.9.output.FusedLayerNorm_... - -
115_bert.encoder.layer.10.attention.self.Linear... - -
116_bert.encoder.layer.10.attention.self.Linear... - -
117_bert.encoder.layer.10.attention.self.Linear... - -
118_bert.encoder.layer.10.attention.self.Dropou... - -
119_bert.encoder.layer.10.attention.output.Line... - -
120_bert.encoder.layer.10.attention.output.Drop... - -
121_bert.encoder.layer.10.attention.output.Fuse... - -
122_bert.encoder.layer.10.intermediate.Linear_d... - -
123_bert.encoder.layer.10.output.Linear_dense - -
124_bert.encoder.layer.10.output.Dropout_dropout - -
125_bert.encoder.layer.10.output.FusedLayerNorm... - -
126_bert.encoder.layer.11.attention.self.Linear... - -
127_bert.encoder.layer.11.attention.self.Linear... - -
128_bert.encoder.layer.11.attention.self.Linear... - -
129_bert.encoder.layer.11.attention.self.Dropou... - -
130_bert.encoder.layer.11.attention.output.Line... - -
131_bert.encoder.layer.11.attention.output.Drop... - -
132_bert.encoder.layer.11.attention.output.Fuse... - -
133_bert.encoder.layer.11.intermediate.Linear_d... - -
134_bert.encoder.layer.11.output.Linear_dense - -
135_bert.encoder.layer.11.output.Dropout_dropout - -
136_bert.encoder.layer.11.output.FusedLayerNorm... - -
137_bert.encoder.layer.12.attention.self.Linear... - -
138_bert.encoder.layer.12.attention.self.Linear... - -
139_bert.encoder.layer.12.attention.self.Linear... - -
140_bert.encoder.layer.12.attention.self.Dropou... - -
141_bert.encoder.layer.12.attention.output.Line... - -
142_bert.encoder.layer.12.attention.output.Drop... - -
143_bert.encoder.layer.12.attention.output.Fuse... - -
144_bert.encoder.layer.12.intermediate.Linear_d... - -
145_bert.encoder.layer.12.output.Linear_dense - -
146_bert.encoder.layer.12.output.Dropout_dropout - -
147_bert.encoder.layer.12.output.FusedLayerNorm... - -
148_bert.encoder.layer.13.attention.self.Linear... - -
149_bert.encoder.layer.13.attention.self.Linear... - -
150_bert.encoder.layer.13.attention.self.Linear... - -
151_bert.encoder.layer.13.attention.self.Dropou... - -
152_bert.encoder.layer.13.attention.output.Line... - -
153_bert.encoder.layer.13.attention.output.Drop... - -
154_bert.encoder.layer.13.attention.output.Fuse... - -
155_bert.encoder.layer.13.intermediate.Linear_d... - -
156_bert.encoder.layer.13.output.Linear_dense - -
157_bert.encoder.layer.13.output.Dropout_dropout - -
158_bert.encoder.layer.13.output.FusedLayerNorm... - -
159_bert.encoder.layer.14.attention.self.Linear... - -
160_bert.encoder.layer.14.attention.self.Linear... - -
161_bert.encoder.layer.14.attention.self.Linear... - -
162_bert.encoder.layer.14.attention.self.Dropou... - -
163_bert.encoder.layer.14.attention.output.Line... - -
164_bert.encoder.layer.14.attention.output.Drop... - -
165_bert.encoder.layer.14.attention.output.Fuse... - -
166_bert.encoder.layer.14.intermediate.Linear_d... - -
167_bert.encoder.layer.14.output.Linear_dense - -
168_bert.encoder.layer.14.output.Dropout_dropout - -
169_bert.encoder.layer.14.output.FusedLayerNorm... - -
170_bert.encoder.layer.15.attention.self.Linear... - -
171_bert.encoder.layer.15.attention.self.Linear... - -
172_bert.encoder.layer.15.attention.self.Linear... - -
173_bert.encoder.layer.15.attention.self.Dropou... - -
174_bert.encoder.layer.15.attention.output.Line... - -
175_bert.encoder.layer.15.attention.output.Drop... - -
176_bert.encoder.layer.15.attention.output.Fuse... - -
177_bert.encoder.layer.15.intermediate.Linear_d... - -
178_bert.encoder.layer.15.output.Linear_dense - -
179_bert.encoder.layer.15.output.Dropout_dropout - -
180_bert.encoder.layer.15.output.FusedLayerNorm... - -
181_bert.encoder.layer.16.attention.self.Linear... - -
182_bert.encoder.layer.16.attention.self.Linear... - -
183_bert.encoder.layer.16.attention.self.Linear... - -
184_bert.encoder.layer.16.attention.self.Dropou... - -
185_bert.encoder.layer.16.attention.output.Line... - -
186_bert.encoder.layer.16.attention.output.Drop... - -
187_bert.encoder.layer.16.attention.output.Fuse... - -
188_bert.encoder.layer.16.intermediate.Linear_d... - -
189_bert.encoder.layer.16.output.Linear_dense - -
190_bert.encoder.layer.16.output.Dropout_dropout - -
191_bert.encoder.layer.16.output.FusedLayerNorm... - -
192_bert.encoder.layer.17.attention.self.Linear... - -
193_bert.encoder.layer.17.attention.self.Linear... - -
194_bert.encoder.layer.17.attention.self.Linear... - -
195_bert.encoder.layer.17.attention.self.Dropou... - -
196_bert.encoder.layer.17.attention.output.Line... - -
197_bert.encoder.layer.17.attention.output.Drop... - -
198_bert.encoder.layer.17.attention.output.Fuse... - -
199_bert.encoder.layer.17.intermediate.Linear_d... - -
200_bert.encoder.layer.17.output.Linear_dense - -
201_bert.encoder.layer.17.output.Dropout_dropout - -
202_bert.encoder.layer.17.output.FusedLayerNorm... - -
203_bert.encoder.layer.18.attention.self.Linear... - -
204_bert.encoder.layer.18.attention.self.Linear... - -
205_bert.encoder.layer.18.attention.self.Linear... - -
206_bert.encoder.layer.18.attention.self.Dropou... - -
207_bert.encoder.layer.18.attention.output.Line... - -
208_bert.encoder.layer.18.attention.output.Drop... - -
209_bert.encoder.layer.18.attention.output.Fuse... - -
210_bert.encoder.layer.18.intermediate.Linear_d... - -
211_bert.encoder.layer.18.output.Linear_dense - -
212_bert.encoder.layer.18.output.Dropout_dropout - -
213_bert.encoder.layer.18.output.FusedLayerNorm... - -
214_bert.encoder.layer.19.attention.self.Linear... - -
215_bert.encoder.layer.19.attention.self.Linear... - -
216_bert.encoder.layer.19.attention.self.Linear... - -
217_bert.encoder.layer.19.attention.self.Dropou... - -
218_bert.encoder.layer.19.attention.output.Line... - -
219_bert.encoder.layer.19.attention.output.Drop... - -
220_bert.encoder.layer.19.attention.output.Fuse... - -
221_bert.encoder.layer.19.intermediate.Linear_d... - -
222_bert.encoder.layer.19.output.Linear_dense - -
223_bert.encoder.layer.19.output.Dropout_dropout - -
224_bert.encoder.layer.19.output.FusedLayerNorm... - -
225_bert.encoder.layer.20.attention.self.Linear... 1.0496M 1.048576M
226_bert.encoder.layer.20.attention.self.Linear... 1.0496M 1.048576M
227_bert.encoder.layer.20.attention.self.Linear... 1.0496M 1.048576M
228_bert.encoder.layer.20.attention.self.Dropou... - -
229_bert.encoder.layer.20.attention.output.Line... 1.0496M 1.048576M
230_bert.encoder.layer.20.attention.output.Drop... - -
231_bert.encoder.layer.20.attention.output.Fuse... 2.048k 1.024k
232_bert.encoder.layer.20.intermediate.Linear_d... 4.1984M 4.194304M
233_bert.encoder.layer.20.output.Linear_dense 4.195328M 4.194304M
234_bert.encoder.layer.20.output.Dropout_dropout - -
235_bert.encoder.layer.20.output.FusedLayerNorm... 2.048k 1.024k
236_bert.encoder.layer.21.attention.self.Linear... 1.0496M 1.048576M
237_bert.encoder.layer.21.attention.self.Linear... 1.0496M 1.048576M
238_bert.encoder.layer.21.attention.self.Linear... 1.0496M 1.048576M
239_bert.encoder.layer.21.attention.self.Dropou... - -
240_bert.encoder.layer.21.attention.output.Line... 1.0496M 1.048576M
241_bert.encoder.layer.21.attention.output.Drop... - -
242_bert.encoder.layer.21.attention.output.Fuse... 2.048k 1.024k
243_bert.encoder.layer.21.intermediate.Linear_d... 4.1984M 4.194304M
244_bert.encoder.layer.21.output.Linear_dense 4.195328M 4.194304M
245_bert.encoder.layer.21.output.Dropout_dropout - -
246_bert.encoder.layer.21.output.FusedLayerNorm... 2.048k 1.024k
247_bert.encoder.layer.22.attention.self.Linear... 1.0496M 1.048576M
248_bert.encoder.layer.22.attention.self.Linear... 1.0496M 1.048576M
249_bert.encoder.layer.22.attention.self.Linear... 1.0496M 1.048576M
250_bert.encoder.layer.22.attention.self.Dropou... - -
251_bert.encoder.layer.22.attention.output.Line... 1.0496M 1.048576M
252_bert.encoder.layer.22.attention.output.Drop... - -
253_bert.encoder.layer.22.attention.output.Fuse... 2.048k 1.024k
254_bert.encoder.layer.22.intermediate.Linear_d... 4.1984M 4.194304M
255_bert.encoder.layer.22.output.Linear_dense 4.195328M 4.194304M
256_bert.encoder.layer.22.output.Dropout_dropout - -
257_bert.encoder.layer.22.output.FusedLayerNorm... 2.048k 1.024k
258_bert.encoder.layer.23.attention.self.Linear... 1.0496M 1.048576M
259_bert.encoder.layer.23.attention.self.Linear... 1.0496M 1.048576M
260_bert.encoder.layer.23.attention.self.Linear... 1.0496M 1.048576M
261_bert.encoder.layer.23.attention.self.Dropou... - -
262_bert.encoder.layer.23.attention.output.Line... 1.0496M 1.048576M
263_bert.encoder.layer.23.attention.output.Drop... - -
264_bert.encoder.layer.23.attention.output.Fuse... 2.048k 1.024k
265_bert.encoder.layer.23.intermediate.Linear_d... 4.1984M 4.194304M
266_bert.encoder.layer.23.output.Linear_dense 4.195328M 4.194304M
267_bert.encoder.layer.23.output.Dropout_dropout - -
268_bert.encoder.layer.23.output.FusedLayerNorm... 2.048k 1.024k
269_bert.pooler.Linear_dense 1.0496M 1.048576M
270_bert.pooler.Tanh_activation - -
271_dropout - -
272_classifier 2.05k 2.048k
-----------------------------------------------------------------------------------------------------------
Totals
Total params 333.581314M
Trainable params 51.436546M
Non-trainable params 282.144768M
Mult-Adds 51.390464M
===========================================================================================================
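The Totals block can be checked by hand from the per-layer rows. In this output only encoder layers 20-23, the pooler and the classifier report numbers, and the sketch below (plain arithmetic on the shapes in the table, BERT-large sizes 1024/4096) reproduces Trainable params, Non-trainable params and Mult-Adds exactly. It also suggests that Mult-Adds counts one multiply-add per weight element (bias excluded) and ignores the batch and sequence dimensions of the `[16, 256, ...]` outputs, so frozen layers contribute nothing and the figure is per token, not per batch.

```python
# Trainable rows from the table (encoder layers 20-23, pooler, classifier).
linear_1024 = 1024 * 1024 + 1024   # query/key/value/attn-output: weight + bias -> "1.0496M"
intermediate = 1024 * 4096 + 4096  # intermediate dense -> "4.1984M"
output_dense = 4096 * 1024 + 1024  # output dense -> "4.195328M"
layer_norm = 1024 + 1024           # LayerNorm gamma + beta -> "2.048k"

per_layer = 4 * linear_1024 + intermediate + output_dense + 2 * layer_norm
pooler = linear_1024
classifier = 1024 * 2 + 2          # -> "2.05k"

trainable = 4 * per_layer + pooler + classifier
total = 333_581_314                # "Total params" from the table
non_trainable = total - trainable

# Mult-Adds column: weight elements only (no bias), no batch/sequence factor.
per_layer_macs = 4 * 1024 * 1024 + 1024 * 4096 + 4096 * 1024 + 2 * 1024
mult_adds = 4 * per_layer_macs + 1024 * 1024 + 1024 * 2

# For comparison: one query projection over a [16, 256, 1024] input really does
# batch * seq_len * in_features * out_features multiply-adds.
query_macs_per_batch = 16 * 256 * 1024 * 1024

print(f"{trainable:,} {non_trainable:,} {mult_adds:,}")
# prints "51,436,546 282,144,768 51,390,464"
```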
Yes, I was planning to modify it to show both total and non-trainable params.
Is this the right approach? Perhaps it would be better to show trainable vs. non-trainable parameters, or still use the non-trainable parameters to estimate MACs.
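One way to combine both options is to split the parameter count by `requires_grad` but count multiply-adds from every weight tensor, frozen or not, since freezing a layer does not change the compute it performs. The sketch below is a hypothetical helper, not part of torchsummaryX: it takes `(name, shape, requires_grad)` tuples, keeps the per-token "one multiply-add per weight element" convention seen in the table, and assumes BERT-style naming in which embedding tables contain `_embeddings.` (a lookup does no multiplications, so those are skipped).

```python
from math import prod


def split_counts(params):
    """Return (trainable, non_trainable, mult_adds) for (name, shape, requires_grad) tuples.

    Hypothetical helper. Mult-adds use the per-token convention of the table above:
    one multiply-add per weight element, biases excluded, frozen layers included.
    """
    trainable = non_trainable = mult_adds = 0
    for name, shape, requires_grad in params:
        n = prod(shape)
        if requires_grad:
            trainable += n
        else:
            non_trainable += n
        # Assumption: embedding tables are named "*_embeddings.weight" and are lookups,
        # so they contribute parameters but no multiply-adds.
        if name.endswith("weight") and "_embeddings." not in name:
            mult_adds += n
    return trainable, non_trainable, mult_adds


example = [
    ("embeddings.word_embeddings.weight", (28996, 1024), False),
    ("encoder.layer.0.attention.self.query.weight", (1024, 1024), False),
    ("encoder.layer.0.attention.self.query.bias", (1024,), False),
    ("classifier.weight", (2, 1024), True),
    ("classifier.bias", (2,), True),
]
trainable, non_trainable, mult_adds = split_counts(example)
print(trainable, non_trainable, mult_adds)
```

With a real PyTorch model the input could be built as `[(n, tuple(p.shape), p.requires_grad) for n, p in model.named_parameters()]`. Here the frozen query projection still shows up in `mult_adds`, which is the behaviour the "still use non-trainable parameters to estimate MACs" option asks for.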