nmhkahn / torchsummaryX

torchsummaryX: Improved visualization tool of torchsummary

ignore parameters with no gradient #7

Closed: wassname closed this 5 years ago

wassname commented 5 years ago

Is this the right approach? Perhaps it would be better to show trainable vs. non-trainable parameters, or to still use the non-trainable parameters to estimate MACs.
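A minimal sketch of the second option, assuming plain PyTorch and looking only at `nn.Linear` layers: a forward hook records, per layer, how many parameters are trainable versus frozen, and estimates MACs from the layer's input width and output size regardless of `requires_grad`, because frozen weights are still multiplied during the forward pass. The helper name `linear_summary` and the toy model are hypothetical illustrations, not how torchsummaryX itself computes its columns.

```python
import torch
import torch.nn as nn


def linear_summary(model: nn.Module, x: torch.Tensor) -> list:
    """Hypothetical helper: per-Linear-layer trainable/frozen params and MACs."""
    rows = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            own = list(module.parameters(recurse=False))
            n_params = sum(p.numel() for p in own)
            trainable = sum(p.numel() for p in own if p.requires_grad)
            # Each output element of a Linear layer costs in_features
            # multiply-adds, whether or not the weights are frozen.
            macs = output.numel() * module.in_features
            rows.append({
                "layer": name,
                "params": n_params,
                "trainable": trainable,
                "non_trainable": n_params - trainable,
                "macs": macs,
            })
        return hook

    for name, m in model.named_modules():
        if isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(x)
    finally:
        # Always detach the hooks so the model is left unchanged.
        for h in handles:
            h.remove()
    return rows


# Toy model (hypothetical): freeze the first layer, as one might freeze an encoder.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

rows = linear_summary(model, torch.zeros(3, 4))
for r in rows:
    print(r)
```

With this split, a frozen layer still shows up with its full MAC cost while its parameters are reported as non-trainable rather than being dropped from the summary.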

wassname commented 5 years ago

The result looks like this:

===========================================================================================================
                                                     Kernel Shape  \
Layer                                                               
0_bert.embeddings.Embedding_word_embeddings         [1024, 28996]   
1_bert.embeddings.Embedding_position_embeddings       [1024, 512]   
2_bert.embeddings.Embedding_token_type_embeddings       [1024, 2]   
3_bert.embeddings.FusedLayerNorm_LayerNorm                 [1024]   
4_bert.embeddings.Dropout_dropout                               -   
5_bert.encoder.layer.0.attention.self.Linear_query   [1024, 1024]   
6_bert.encoder.layer.0.attention.self.Linear_key     [1024, 1024]   
7_bert.encoder.layer.0.attention.self.Linear_value   [1024, 1024]   
8_bert.encoder.layer.0.attention.self.Dropout_d...              -   
9_bert.encoder.layer.0.attention.output.Linear_...   [1024, 1024]   
10_bert.encoder.layer.0.attention.output.Dropou...              -   
11_bert.encoder.layer.0.attention.output.FusedL...         [1024]   
12_bert.encoder.layer.0.intermediate.Linear_dense    [1024, 4096]   
13_bert.encoder.layer.0.output.Linear_dense          [4096, 1024]   
14_bert.encoder.layer.0.output.Dropout_dropout                  -   
15_bert.encoder.layer.0.output.FusedLayerNorm_L...         [1024]   
16_bert.encoder.layer.1.attention.self.Linear_q...   [1024, 1024]   
17_bert.encoder.layer.1.attention.self.Linear_key    [1024, 1024]   
18_bert.encoder.layer.1.attention.self.Linear_v...   [1024, 1024]   
19_bert.encoder.layer.1.attention.self.Dropout_...              -   
20_bert.encoder.layer.1.attention.output.Linear...   [1024, 1024]   
21_bert.encoder.layer.1.attention.output.Dropou...              -   
22_bert.encoder.layer.1.attention.output.FusedL...         [1024]   
23_bert.encoder.layer.1.intermediate.Linear_dense    [1024, 4096]   
24_bert.encoder.layer.1.output.Linear_dense          [4096, 1024]   
25_bert.encoder.layer.1.output.Dropout_dropout                  -   
26_bert.encoder.layer.1.output.FusedLayerNorm_L...         [1024]   
27_bert.encoder.layer.2.attention.self.Linear_q...   [1024, 1024]   
28_bert.encoder.layer.2.attention.self.Linear_key    [1024, 1024]   
29_bert.encoder.layer.2.attention.self.Linear_v...   [1024, 1024]   
30_bert.encoder.layer.2.attention.self.Dropout_...              -   
31_bert.encoder.layer.2.attention.output.Linear...   [1024, 1024]   
32_bert.encoder.layer.2.attention.output.Dropou...              -   
33_bert.encoder.layer.2.attention.output.FusedL...         [1024]   
34_bert.encoder.layer.2.intermediate.Linear_dense    [1024, 4096]   
35_bert.encoder.layer.2.output.Linear_dense          [4096, 1024]   
36_bert.encoder.layer.2.output.Dropout_dropout                  -   
37_bert.encoder.layer.2.output.FusedLayerNorm_L...         [1024]   
38_bert.encoder.layer.3.attention.self.Linear_q...   [1024, 1024]   
39_bert.encoder.layer.3.attention.self.Linear_key    [1024, 1024]   
40_bert.encoder.layer.3.attention.self.Linear_v...   [1024, 1024]   
41_bert.encoder.layer.3.attention.self.Dropout_...              -   
42_bert.encoder.layer.3.attention.output.Linear...   [1024, 1024]   
43_bert.encoder.layer.3.attention.output.Dropou...              -   
44_bert.encoder.layer.3.attention.output.FusedL...         [1024]   
45_bert.encoder.layer.3.intermediate.Linear_dense    [1024, 4096]   
46_bert.encoder.layer.3.output.Linear_dense          [4096, 1024]   
47_bert.encoder.layer.3.output.Dropout_dropout                  -   
48_bert.encoder.layer.3.output.FusedLayerNorm_L...         [1024]   
49_bert.encoder.layer.4.attention.self.Linear_q...   [1024, 1024]   
50_bert.encoder.layer.4.attention.self.Linear_key    [1024, 1024]   
51_bert.encoder.layer.4.attention.self.Linear_v...   [1024, 1024]   
52_bert.encoder.layer.4.attention.self.Dropout_...              -   
53_bert.encoder.layer.4.attention.output.Linear...   [1024, 1024]   
54_bert.encoder.layer.4.attention.output.Dropou...              -   
55_bert.encoder.layer.4.attention.output.FusedL...         [1024]   
56_bert.encoder.layer.4.intermediate.Linear_dense    [1024, 4096]   
57_bert.encoder.layer.4.output.Linear_dense          [4096, 1024]   
58_bert.encoder.layer.4.output.Dropout_dropout                  -   
59_bert.encoder.layer.4.output.FusedLayerNorm_L...         [1024]   
60_bert.encoder.layer.5.attention.self.Linear_q...   [1024, 1024]   
61_bert.encoder.layer.5.attention.self.Linear_key    [1024, 1024]   
62_bert.encoder.layer.5.attention.self.Linear_v...   [1024, 1024]   
63_bert.encoder.layer.5.attention.self.Dropout_...              -   
64_bert.encoder.layer.5.attention.output.Linear...   [1024, 1024]   
65_bert.encoder.layer.5.attention.output.Dropou...              -   
66_bert.encoder.layer.5.attention.output.FusedL...         [1024]   
67_bert.encoder.layer.5.intermediate.Linear_dense    [1024, 4096]   
68_bert.encoder.layer.5.output.Linear_dense          [4096, 1024]   
69_bert.encoder.layer.5.output.Dropout_dropout                  -   
70_bert.encoder.layer.5.output.FusedLayerNorm_L...         [1024]   
71_bert.encoder.layer.6.attention.self.Linear_q...   [1024, 1024]   
72_bert.encoder.layer.6.attention.self.Linear_key    [1024, 1024]   
73_bert.encoder.layer.6.attention.self.Linear_v...   [1024, 1024]   
74_bert.encoder.layer.6.attention.self.Dropout_...              -   
75_bert.encoder.layer.6.attention.output.Linear...   [1024, 1024]   
76_bert.encoder.layer.6.attention.output.Dropou...              -   
77_bert.encoder.layer.6.attention.output.FusedL...         [1024]   
78_bert.encoder.layer.6.intermediate.Linear_dense    [1024, 4096]   
79_bert.encoder.layer.6.output.Linear_dense          [4096, 1024]   
80_bert.encoder.layer.6.output.Dropout_dropout                  -   
81_bert.encoder.layer.6.output.FusedLayerNorm_L...         [1024]   
82_bert.encoder.layer.7.attention.self.Linear_q...   [1024, 1024]   
83_bert.encoder.layer.7.attention.self.Linear_key    [1024, 1024]   
84_bert.encoder.layer.7.attention.self.Linear_v...   [1024, 1024]   
85_bert.encoder.layer.7.attention.self.Dropout_...              -   
86_bert.encoder.layer.7.attention.output.Linear...   [1024, 1024]   
87_bert.encoder.layer.7.attention.output.Dropou...              -   
88_bert.encoder.layer.7.attention.output.FusedL...         [1024]   
89_bert.encoder.layer.7.intermediate.Linear_dense    [1024, 4096]   
90_bert.encoder.layer.7.output.Linear_dense          [4096, 1024]   
91_bert.encoder.layer.7.output.Dropout_dropout                  -   
92_bert.encoder.layer.7.output.FusedLayerNorm_L...         [1024]   
93_bert.encoder.layer.8.attention.self.Linear_q...   [1024, 1024]   
94_bert.encoder.layer.8.attention.self.Linear_key    [1024, 1024]   
95_bert.encoder.layer.8.attention.self.Linear_v...   [1024, 1024]   
96_bert.encoder.layer.8.attention.self.Dropout_...              -   
97_bert.encoder.layer.8.attention.output.Linear...   [1024, 1024]   
98_bert.encoder.layer.8.attention.output.Dropou...              -   
99_bert.encoder.layer.8.attention.output.FusedL...         [1024]   
100_bert.encoder.layer.8.intermediate.Linear_dense   [1024, 4096]   
101_bert.encoder.layer.8.output.Linear_dense         [4096, 1024]   
102_bert.encoder.layer.8.output.Dropout_dropout                 -   
103_bert.encoder.layer.8.output.FusedLayerNorm_...         [1024]   
104_bert.encoder.layer.9.attention.self.Linear_...   [1024, 1024]   
105_bert.encoder.layer.9.attention.self.Linear_key   [1024, 1024]   
106_bert.encoder.layer.9.attention.self.Linear_...   [1024, 1024]   
107_bert.encoder.layer.9.attention.self.Dropout...              -   
108_bert.encoder.layer.9.attention.output.Linea...   [1024, 1024]   
109_bert.encoder.layer.9.attention.output.Dropo...              -   
110_bert.encoder.layer.9.attention.output.Fused...         [1024]   
111_bert.encoder.layer.9.intermediate.Linear_dense   [1024, 4096]   
112_bert.encoder.layer.9.output.Linear_dense         [4096, 1024]   
113_bert.encoder.layer.9.output.Dropout_dropout                 -   
114_bert.encoder.layer.9.output.FusedLayerNorm_...         [1024]   
115_bert.encoder.layer.10.attention.self.Linear...   [1024, 1024]   
116_bert.encoder.layer.10.attention.self.Linear...   [1024, 1024]   
117_bert.encoder.layer.10.attention.self.Linear...   [1024, 1024]   
118_bert.encoder.layer.10.attention.self.Dropou...              -   
119_bert.encoder.layer.10.attention.output.Line...   [1024, 1024]   
120_bert.encoder.layer.10.attention.output.Drop...              -   
121_bert.encoder.layer.10.attention.output.Fuse...         [1024]   
122_bert.encoder.layer.10.intermediate.Linear_d...   [1024, 4096]   
123_bert.encoder.layer.10.output.Linear_dense        [4096, 1024]   
124_bert.encoder.layer.10.output.Dropout_dropout                -   
125_bert.encoder.layer.10.output.FusedLayerNorm...         [1024]   
126_bert.encoder.layer.11.attention.self.Linear...   [1024, 1024]   
127_bert.encoder.layer.11.attention.self.Linear...   [1024, 1024]   
128_bert.encoder.layer.11.attention.self.Linear...   [1024, 1024]   
129_bert.encoder.layer.11.attention.self.Dropou...              -   
130_bert.encoder.layer.11.attention.output.Line...   [1024, 1024]   
131_bert.encoder.layer.11.attention.output.Drop...              -   
132_bert.encoder.layer.11.attention.output.Fuse...         [1024]   
133_bert.encoder.layer.11.intermediate.Linear_d...   [1024, 4096]   
134_bert.encoder.layer.11.output.Linear_dense        [4096, 1024]   
135_bert.encoder.layer.11.output.Dropout_dropout                -   
136_bert.encoder.layer.11.output.FusedLayerNorm...         [1024]   
137_bert.encoder.layer.12.attention.self.Linear...   [1024, 1024]   
138_bert.encoder.layer.12.attention.self.Linear...   [1024, 1024]   
139_bert.encoder.layer.12.attention.self.Linear...   [1024, 1024]   
140_bert.encoder.layer.12.attention.self.Dropou...              -   
141_bert.encoder.layer.12.attention.output.Line...   [1024, 1024]   
142_bert.encoder.layer.12.attention.output.Drop...              -   
143_bert.encoder.layer.12.attention.output.Fuse...         [1024]   
144_bert.encoder.layer.12.intermediate.Linear_d...   [1024, 4096]   
145_bert.encoder.layer.12.output.Linear_dense        [4096, 1024]   
146_bert.encoder.layer.12.output.Dropout_dropout                -   
147_bert.encoder.layer.12.output.FusedLayerNorm...         [1024]   
148_bert.encoder.layer.13.attention.self.Linear...   [1024, 1024]   
149_bert.encoder.layer.13.attention.self.Linear...   [1024, 1024]   
150_bert.encoder.layer.13.attention.self.Linear...   [1024, 1024]   
151_bert.encoder.layer.13.attention.self.Dropou...              -   
152_bert.encoder.layer.13.attention.output.Line...   [1024, 1024]   
153_bert.encoder.layer.13.attention.output.Drop...              -   
154_bert.encoder.layer.13.attention.output.Fuse...         [1024]   
155_bert.encoder.layer.13.intermediate.Linear_d...   [1024, 4096]   
156_bert.encoder.layer.13.output.Linear_dense        [4096, 1024]   
157_bert.encoder.layer.13.output.Dropout_dropout                -   
158_bert.encoder.layer.13.output.FusedLayerNorm...         [1024]   
159_bert.encoder.layer.14.attention.self.Linear...   [1024, 1024]   
160_bert.encoder.layer.14.attention.self.Linear...   [1024, 1024]   
161_bert.encoder.layer.14.attention.self.Linear...   [1024, 1024]   
162_bert.encoder.layer.14.attention.self.Dropou...              -   
163_bert.encoder.layer.14.attention.output.Line...   [1024, 1024]   
164_bert.encoder.layer.14.attention.output.Drop...              -   
165_bert.encoder.layer.14.attention.output.Fuse...         [1024]   
166_bert.encoder.layer.14.intermediate.Linear_d...   [1024, 4096]   
167_bert.encoder.layer.14.output.Linear_dense        [4096, 1024]   
168_bert.encoder.layer.14.output.Dropout_dropout                -   
169_bert.encoder.layer.14.output.FusedLayerNorm...         [1024]   
170_bert.encoder.layer.15.attention.self.Linear...   [1024, 1024]   
171_bert.encoder.layer.15.attention.self.Linear...   [1024, 1024]   
172_bert.encoder.layer.15.attention.self.Linear...   [1024, 1024]   
173_bert.encoder.layer.15.attention.self.Dropou...              -   
174_bert.encoder.layer.15.attention.output.Line...   [1024, 1024]   
175_bert.encoder.layer.15.attention.output.Drop...              -   
176_bert.encoder.layer.15.attention.output.Fuse...         [1024]   
177_bert.encoder.layer.15.intermediate.Linear_d...   [1024, 4096]   
178_bert.encoder.layer.15.output.Linear_dense        [4096, 1024]   
179_bert.encoder.layer.15.output.Dropout_dropout                -   
180_bert.encoder.layer.15.output.FusedLayerNorm...         [1024]   
181_bert.encoder.layer.16.attention.self.Linear...   [1024, 1024]   
182_bert.encoder.layer.16.attention.self.Linear...   [1024, 1024]   
183_bert.encoder.layer.16.attention.self.Linear...   [1024, 1024]   
184_bert.encoder.layer.16.attention.self.Dropou...              -   
185_bert.encoder.layer.16.attention.output.Line...   [1024, 1024]   
186_bert.encoder.layer.16.attention.output.Drop...              -   
187_bert.encoder.layer.16.attention.output.Fuse...         [1024]   
188_bert.encoder.layer.16.intermediate.Linear_d...   [1024, 4096]   
189_bert.encoder.layer.16.output.Linear_dense        [4096, 1024]   
190_bert.encoder.layer.16.output.Dropout_dropout                -   
191_bert.encoder.layer.16.output.FusedLayerNorm...         [1024]   
192_bert.encoder.layer.17.attention.self.Linear...   [1024, 1024]   
193_bert.encoder.layer.17.attention.self.Linear...   [1024, 1024]   
194_bert.encoder.layer.17.attention.self.Linear...   [1024, 1024]   
195_bert.encoder.layer.17.attention.self.Dropou...              -   
196_bert.encoder.layer.17.attention.output.Line...   [1024, 1024]   
197_bert.encoder.layer.17.attention.output.Drop...              -   
198_bert.encoder.layer.17.attention.output.Fuse...         [1024]   
199_bert.encoder.layer.17.intermediate.Linear_d...   [1024, 4096]   
200_bert.encoder.layer.17.output.Linear_dense        [4096, 1024]   
201_bert.encoder.layer.17.output.Dropout_dropout                -   
202_bert.encoder.layer.17.output.FusedLayerNorm...         [1024]   
203_bert.encoder.layer.18.attention.self.Linear...   [1024, 1024]   
204_bert.encoder.layer.18.attention.self.Linear...   [1024, 1024]   
205_bert.encoder.layer.18.attention.self.Linear...   [1024, 1024]   
206_bert.encoder.layer.18.attention.self.Dropou...              -   
207_bert.encoder.layer.18.attention.output.Line...   [1024, 1024]   
208_bert.encoder.layer.18.attention.output.Drop...              -   
209_bert.encoder.layer.18.attention.output.Fuse...         [1024]   
210_bert.encoder.layer.18.intermediate.Linear_d...   [1024, 4096]   
211_bert.encoder.layer.18.output.Linear_dense        [4096, 1024]   
212_bert.encoder.layer.18.output.Dropout_dropout                -   
213_bert.encoder.layer.18.output.FusedLayerNorm...         [1024]   
214_bert.encoder.layer.19.attention.self.Linear...   [1024, 1024]   
215_bert.encoder.layer.19.attention.self.Linear...   [1024, 1024]   
216_bert.encoder.layer.19.attention.self.Linear...   [1024, 1024]   
217_bert.encoder.layer.19.attention.self.Dropou...              -   
218_bert.encoder.layer.19.attention.output.Line...   [1024, 1024]   
219_bert.encoder.layer.19.attention.output.Drop...              -   
220_bert.encoder.layer.19.attention.output.Fuse...         [1024]   
221_bert.encoder.layer.19.intermediate.Linear_d...   [1024, 4096]   
222_bert.encoder.layer.19.output.Linear_dense        [4096, 1024]   
223_bert.encoder.layer.19.output.Dropout_dropout                -   
224_bert.encoder.layer.19.output.FusedLayerNorm...         [1024]   
225_bert.encoder.layer.20.attention.self.Linear...   [1024, 1024]   
226_bert.encoder.layer.20.attention.self.Linear...   [1024, 1024]   
227_bert.encoder.layer.20.attention.self.Linear...   [1024, 1024]   
228_bert.encoder.layer.20.attention.self.Dropou...              -   
229_bert.encoder.layer.20.attention.output.Line...   [1024, 1024]   
230_bert.encoder.layer.20.attention.output.Drop...              -   
231_bert.encoder.layer.20.attention.output.Fuse...         [1024]   
232_bert.encoder.layer.20.intermediate.Linear_d...   [1024, 4096]   
233_bert.encoder.layer.20.output.Linear_dense        [4096, 1024]   
234_bert.encoder.layer.20.output.Dropout_dropout                -   
235_bert.encoder.layer.20.output.FusedLayerNorm...         [1024]   
236_bert.encoder.layer.21.attention.self.Linear...   [1024, 1024]   
237_bert.encoder.layer.21.attention.self.Linear...   [1024, 1024]   
238_bert.encoder.layer.21.attention.self.Linear...   [1024, 1024]   
239_bert.encoder.layer.21.attention.self.Dropou...              -   
240_bert.encoder.layer.21.attention.output.Line...   [1024, 1024]   
241_bert.encoder.layer.21.attention.output.Drop...              -   
242_bert.encoder.layer.21.attention.output.Fuse...         [1024]   
243_bert.encoder.layer.21.intermediate.Linear_d...   [1024, 4096]   
244_bert.encoder.layer.21.output.Linear_dense        [4096, 1024]   
245_bert.encoder.layer.21.output.Dropout_dropout                -   
246_bert.encoder.layer.21.output.FusedLayerNorm...         [1024]   
247_bert.encoder.layer.22.attention.self.Linear...   [1024, 1024]   
248_bert.encoder.layer.22.attention.self.Linear...   [1024, 1024]   
249_bert.encoder.layer.22.attention.self.Linear...   [1024, 1024]   
250_bert.encoder.layer.22.attention.self.Dropou...              -   
251_bert.encoder.layer.22.attention.output.Line...   [1024, 1024]   
252_bert.encoder.layer.22.attention.output.Drop...              -   
253_bert.encoder.layer.22.attention.output.Fuse...         [1024]   
254_bert.encoder.layer.22.intermediate.Linear_d...   [1024, 4096]   
255_bert.encoder.layer.22.output.Linear_dense        [4096, 1024]   
256_bert.encoder.layer.22.output.Dropout_dropout                -   
257_bert.encoder.layer.22.output.FusedLayerNorm...         [1024]   
258_bert.encoder.layer.23.attention.self.Linear...   [1024, 1024]   
259_bert.encoder.layer.23.attention.self.Linear...   [1024, 1024]   
260_bert.encoder.layer.23.attention.self.Linear...   [1024, 1024]   
261_bert.encoder.layer.23.attention.self.Dropou...              -   
262_bert.encoder.layer.23.attention.output.Line...   [1024, 1024]   
263_bert.encoder.layer.23.attention.output.Drop...              -   
264_bert.encoder.layer.23.attention.output.Fuse...         [1024]   
265_bert.encoder.layer.23.intermediate.Linear_d...   [1024, 4096]   
266_bert.encoder.layer.23.output.Linear_dense        [4096, 1024]   
267_bert.encoder.layer.23.output.Dropout_dropout                -   
268_bert.encoder.layer.23.output.FusedLayerNorm...         [1024]   
269_bert.pooler.Linear_dense                         [1024, 1024]   
270_bert.pooler.Tanh_activation                                 -   
271_dropout                                                     -   
272_classifier                                          [1024, 2]   

                                                          Output Shape  \
Layer                                                                    
0_bert.embeddings.Embedding_word_embeddings            [16, 256, 1024]   
1_bert.embeddings.Embedding_position_embeddings        [16, 256, 1024]   
2_bert.embeddings.Embedding_token_type_embeddings      [16, 256, 1024]   
3_bert.embeddings.FusedLayerNorm_LayerNorm             [16, 256, 1024]   
4_bert.embeddings.Dropout_dropout                      [16, 256, 1024]   
5_bert.encoder.layer.0.attention.self.Linear_query     [16, 256, 1024]   
6_bert.encoder.layer.0.attention.self.Linear_key       [16, 256, 1024]   
7_bert.encoder.layer.0.attention.self.Linear_value     [16, 256, 1024]   
8_bert.encoder.layer.0.attention.self.Dropout_d...  [16, 16, 256, 256]   
9_bert.encoder.layer.0.attention.output.Linear_...     [16, 256, 1024]   
10_bert.encoder.layer.0.attention.output.Dropou...     [16, 256, 1024]   
11_bert.encoder.layer.0.attention.output.FusedL...     [16, 256, 1024]   
12_bert.encoder.layer.0.intermediate.Linear_dense      [16, 256, 4096]   
13_bert.encoder.layer.0.output.Linear_dense            [16, 256, 1024]   
14_bert.encoder.layer.0.output.Dropout_dropout         [16, 256, 1024]   
15_bert.encoder.layer.0.output.FusedLayerNorm_L...     [16, 256, 1024]   
16_bert.encoder.layer.1.attention.self.Linear_q...     [16, 256, 1024]   
17_bert.encoder.layer.1.attention.self.Linear_key      [16, 256, 1024]   
18_bert.encoder.layer.1.attention.self.Linear_v...     [16, 256, 1024]   
19_bert.encoder.layer.1.attention.self.Dropout_...  [16, 16, 256, 256]   
20_bert.encoder.layer.1.attention.output.Linear...     [16, 256, 1024]   
21_bert.encoder.layer.1.attention.output.Dropou...     [16, 256, 1024]   
22_bert.encoder.layer.1.attention.output.FusedL...     [16, 256, 1024]   
23_bert.encoder.layer.1.intermediate.Linear_dense      [16, 256, 4096]   
24_bert.encoder.layer.1.output.Linear_dense            [16, 256, 1024]   
25_bert.encoder.layer.1.output.Dropout_dropout         [16, 256, 1024]   
26_bert.encoder.layer.1.output.FusedLayerNorm_L...     [16, 256, 1024]   
27_bert.encoder.layer.2.attention.self.Linear_q...     [16, 256, 1024]   
28_bert.encoder.layer.2.attention.self.Linear_key      [16, 256, 1024]   
29_bert.encoder.layer.2.attention.self.Linear_v...     [16, 256, 1024]   
30_bert.encoder.layer.2.attention.self.Dropout_...  [16, 16, 256, 256]   
31_bert.encoder.layer.2.attention.output.Linear...     [16, 256, 1024]   
32_bert.encoder.layer.2.attention.output.Dropou...     [16, 256, 1024]   
33_bert.encoder.layer.2.attention.output.FusedL...     [16, 256, 1024]   
34_bert.encoder.layer.2.intermediate.Linear_dense      [16, 256, 4096]   
35_bert.encoder.layer.2.output.Linear_dense            [16, 256, 1024]   
36_bert.encoder.layer.2.output.Dropout_dropout         [16, 256, 1024]   
37_bert.encoder.layer.2.output.FusedLayerNorm_L...     [16, 256, 1024]   
38_bert.encoder.layer.3.attention.self.Linear_q...     [16, 256, 1024]   
39_bert.encoder.layer.3.attention.self.Linear_key      [16, 256, 1024]   
40_bert.encoder.layer.3.attention.self.Linear_v...     [16, 256, 1024]   
41_bert.encoder.layer.3.attention.self.Dropout_...  [16, 16, 256, 256]   
42_bert.encoder.layer.3.attention.output.Linear...     [16, 256, 1024]   
43_bert.encoder.layer.3.attention.output.Dropou...     [16, 256, 1024]   
44_bert.encoder.layer.3.attention.output.FusedL...     [16, 256, 1024]   
45_bert.encoder.layer.3.intermediate.Linear_dense      [16, 256, 4096]   
46_bert.encoder.layer.3.output.Linear_dense            [16, 256, 1024]   
47_bert.encoder.layer.3.output.Dropout_dropout         [16, 256, 1024]   
48_bert.encoder.layer.3.output.FusedLayerNorm_L...     [16, 256, 1024]   
49_bert.encoder.layer.4.attention.self.Linear_q...     [16, 256, 1024]   
50_bert.encoder.layer.4.attention.self.Linear_key      [16, 256, 1024]   
51_bert.encoder.layer.4.attention.self.Linear_v...     [16, 256, 1024]   
52_bert.encoder.layer.4.attention.self.Dropout_...  [16, 16, 256, 256]   
53_bert.encoder.layer.4.attention.output.Linear...     [16, 256, 1024]   
54_bert.encoder.layer.4.attention.output.Dropou...     [16, 256, 1024]   
55_bert.encoder.layer.4.attention.output.FusedL...     [16, 256, 1024]   
56_bert.encoder.layer.4.intermediate.Linear_dense      [16, 256, 4096]   
57_bert.encoder.layer.4.output.Linear_dense            [16, 256, 1024]   
58_bert.encoder.layer.4.output.Dropout_dropout         [16, 256, 1024]   
59_bert.encoder.layer.4.output.FusedLayerNorm_L...     [16, 256, 1024]   
60_bert.encoder.layer.5.attention.self.Linear_q...     [16, 256, 1024]   
61_bert.encoder.layer.5.attention.self.Linear_key      [16, 256, 1024]   
62_bert.encoder.layer.5.attention.self.Linear_v...     [16, 256, 1024]   
63_bert.encoder.layer.5.attention.self.Dropout_...  [16, 16, 256, 256]   
64_bert.encoder.layer.5.attention.output.Linear...     [16, 256, 1024]   
65_bert.encoder.layer.5.attention.output.Dropou...     [16, 256, 1024]   
66_bert.encoder.layer.5.attention.output.FusedL...     [16, 256, 1024]   
67_bert.encoder.layer.5.intermediate.Linear_dense      [16, 256, 4096]   
68_bert.encoder.layer.5.output.Linear_dense            [16, 256, 1024]   
69_bert.encoder.layer.5.output.Dropout_dropout         [16, 256, 1024]   
70_bert.encoder.layer.5.output.FusedLayerNorm_L...     [16, 256, 1024]   
71_bert.encoder.layer.6.attention.self.Linear_q...     [16, 256, 1024]   
72_bert.encoder.layer.6.attention.self.Linear_key      [16, 256, 1024]   
73_bert.encoder.layer.6.attention.self.Linear_v...     [16, 256, 1024]   
74_bert.encoder.layer.6.attention.self.Dropout_...  [16, 16, 256, 256]   
75_bert.encoder.layer.6.attention.output.Linear...     [16, 256, 1024]   
76_bert.encoder.layer.6.attention.output.Dropou...     [16, 256, 1024]   
77_bert.encoder.layer.6.attention.output.FusedL...     [16, 256, 1024]   
78_bert.encoder.layer.6.intermediate.Linear_dense      [16, 256, 4096]   
79_bert.encoder.layer.6.output.Linear_dense            [16, 256, 1024]   
80_bert.encoder.layer.6.output.Dropout_dropout         [16, 256, 1024]   
81_bert.encoder.layer.6.output.FusedLayerNorm_L...     [16, 256, 1024]   
82_bert.encoder.layer.7.attention.self.Linear_q...     [16, 256, 1024]   
83_bert.encoder.layer.7.attention.self.Linear_key      [16, 256, 1024]   
84_bert.encoder.layer.7.attention.self.Linear_v...     [16, 256, 1024]   
85_bert.encoder.layer.7.attention.self.Dropout_...  [16, 16, 256, 256]   
86_bert.encoder.layer.7.attention.output.Linear...     [16, 256, 1024]   
87_bert.encoder.layer.7.attention.output.Dropou...     [16, 256, 1024]   
88_bert.encoder.layer.7.attention.output.FusedL...     [16, 256, 1024]   
89_bert.encoder.layer.7.intermediate.Linear_dense      [16, 256, 4096]   
90_bert.encoder.layer.7.output.Linear_dense            [16, 256, 1024]   
91_bert.encoder.layer.7.output.Dropout_dropout         [16, 256, 1024]   
92_bert.encoder.layer.7.output.FusedLayerNorm_L...     [16, 256, 1024]   
93_bert.encoder.layer.8.attention.self.Linear_q...     [16, 256, 1024]   
94_bert.encoder.layer.8.attention.self.Linear_key      [16, 256, 1024]   
95_bert.encoder.layer.8.attention.self.Linear_v...     [16, 256, 1024]   
96_bert.encoder.layer.8.attention.self.Dropout_...  [16, 16, 256, 256]   
97_bert.encoder.layer.8.attention.output.Linear...     [16, 256, 1024]   
98_bert.encoder.layer.8.attention.output.Dropou...     [16, 256, 1024]   
99_bert.encoder.layer.8.attention.output.FusedL...     [16, 256, 1024]   
100_bert.encoder.layer.8.intermediate.Linear_dense     [16, 256, 4096]   
101_bert.encoder.layer.8.output.Linear_dense           [16, 256, 1024]   
102_bert.encoder.layer.8.output.Dropout_dropout        [16, 256, 1024]   
103_bert.encoder.layer.8.output.FusedLayerNorm_...     [16, 256, 1024]   
104_bert.encoder.layer.9.attention.self.Linear_...     [16, 256, 1024]   
105_bert.encoder.layer.9.attention.self.Linear_key     [16, 256, 1024]   
106_bert.encoder.layer.9.attention.self.Linear_...     [16, 256, 1024]   
107_bert.encoder.layer.9.attention.self.Dropout...  [16, 16, 256, 256]   
108_bert.encoder.layer.9.attention.output.Linea...     [16, 256, 1024]   
109_bert.encoder.layer.9.attention.output.Dropo...     [16, 256, 1024]   
110_bert.encoder.layer.9.attention.output.Fused...     [16, 256, 1024]   
111_bert.encoder.layer.9.intermediate.Linear_dense     [16, 256, 4096]   
112_bert.encoder.layer.9.output.Linear_dense           [16, 256, 1024]   
113_bert.encoder.layer.9.output.Dropout_dropout        [16, 256, 1024]   
114_bert.encoder.layer.9.output.FusedLayerNorm_...     [16, 256, 1024]   
115_bert.encoder.layer.10.attention.self.Linear...     [16, 256, 1024]   
116_bert.encoder.layer.10.attention.self.Linear...     [16, 256, 1024]   
117_bert.encoder.layer.10.attention.self.Linear...     [16, 256, 1024]   
118_bert.encoder.layer.10.attention.self.Dropou...  [16, 16, 256, 256]   
119_bert.encoder.layer.10.attention.output.Line...     [16, 256, 1024]   
120_bert.encoder.layer.10.attention.output.Drop...     [16, 256, 1024]   
121_bert.encoder.layer.10.attention.output.Fuse...     [16, 256, 1024]   
122_bert.encoder.layer.10.intermediate.Linear_d...     [16, 256, 4096]   
123_bert.encoder.layer.10.output.Linear_dense          [16, 256, 1024]   
124_bert.encoder.layer.10.output.Dropout_dropout       [16, 256, 1024]   
125_bert.encoder.layer.10.output.FusedLayerNorm...     [16, 256, 1024]   
126_bert.encoder.layer.11.attention.self.Linear...     [16, 256, 1024]   
127_bert.encoder.layer.11.attention.self.Linear...     [16, 256, 1024]   
128_bert.encoder.layer.11.attention.self.Linear...     [16, 256, 1024]   
129_bert.encoder.layer.11.attention.self.Dropou...  [16, 16, 256, 256]   
130_bert.encoder.layer.11.attention.output.Line...     [16, 256, 1024]   
131_bert.encoder.layer.11.attention.output.Drop...     [16, 256, 1024]   
132_bert.encoder.layer.11.attention.output.Fuse...     [16, 256, 1024]   
133_bert.encoder.layer.11.intermediate.Linear_d...     [16, 256, 4096]   
134_bert.encoder.layer.11.output.Linear_dense          [16, 256, 1024]   
135_bert.encoder.layer.11.output.Dropout_dropout       [16, 256, 1024]   
136_bert.encoder.layer.11.output.FusedLayerNorm...     [16, 256, 1024]   
137_bert.encoder.layer.12.attention.self.Linear...     [16, 256, 1024]   
138_bert.encoder.layer.12.attention.self.Linear...     [16, 256, 1024]   
139_bert.encoder.layer.12.attention.self.Linear...     [16, 256, 1024]   
140_bert.encoder.layer.12.attention.self.Dropou...  [16, 16, 256, 256]   
141_bert.encoder.layer.12.attention.output.Line...     [16, 256, 1024]   
142_bert.encoder.layer.12.attention.output.Drop...     [16, 256, 1024]   
143_bert.encoder.layer.12.attention.output.Fuse...     [16, 256, 1024]   
144_bert.encoder.layer.12.intermediate.Linear_d...     [16, 256, 4096]   
145_bert.encoder.layer.12.output.Linear_dense          [16, 256, 1024]   
146_bert.encoder.layer.12.output.Dropout_dropout       [16, 256, 1024]   
147_bert.encoder.layer.12.output.FusedLayerNorm...     [16, 256, 1024]   
148_bert.encoder.layer.13.attention.self.Linear...     [16, 256, 1024]   
149_bert.encoder.layer.13.attention.self.Linear...     [16, 256, 1024]   
150_bert.encoder.layer.13.attention.self.Linear...     [16, 256, 1024]   
151_bert.encoder.layer.13.attention.self.Dropou...  [16, 16, 256, 256]   
152_bert.encoder.layer.13.attention.output.Line...     [16, 256, 1024]   
153_bert.encoder.layer.13.attention.output.Drop...     [16, 256, 1024]   
154_bert.encoder.layer.13.attention.output.Fuse...     [16, 256, 1024]   
155_bert.encoder.layer.13.intermediate.Linear_d...     [16, 256, 4096]   
156_bert.encoder.layer.13.output.Linear_dense          [16, 256, 1024]   
157_bert.encoder.layer.13.output.Dropout_dropout       [16, 256, 1024]   
158_bert.encoder.layer.13.output.FusedLayerNorm...     [16, 256, 1024]   
159_bert.encoder.layer.14.attention.self.Linear...     [16, 256, 1024]   
160_bert.encoder.layer.14.attention.self.Linear...     [16, 256, 1024]   
161_bert.encoder.layer.14.attention.self.Linear...     [16, 256, 1024]   
162_bert.encoder.layer.14.attention.self.Dropou...  [16, 16, 256, 256]   
163_bert.encoder.layer.14.attention.output.Line...     [16, 256, 1024]   
164_bert.encoder.layer.14.attention.output.Drop...     [16, 256, 1024]   
165_bert.encoder.layer.14.attention.output.Fuse...     [16, 256, 1024]   
166_bert.encoder.layer.14.intermediate.Linear_d...     [16, 256, 4096]   
167_bert.encoder.layer.14.output.Linear_dense          [16, 256, 1024]   
168_bert.encoder.layer.14.output.Dropout_dropout       [16, 256, 1024]   
169_bert.encoder.layer.14.output.FusedLayerNorm...     [16, 256, 1024]   
170_bert.encoder.layer.15.attention.self.Linear...     [16, 256, 1024]   
171_bert.encoder.layer.15.attention.self.Linear...     [16, 256, 1024]   
172_bert.encoder.layer.15.attention.self.Linear...     [16, 256, 1024]   
173_bert.encoder.layer.15.attention.self.Dropou...  [16, 16, 256, 256]   
174_bert.encoder.layer.15.attention.output.Line...     [16, 256, 1024]   
175_bert.encoder.layer.15.attention.output.Drop...     [16, 256, 1024]   
176_bert.encoder.layer.15.attention.output.Fuse...     [16, 256, 1024]   
177_bert.encoder.layer.15.intermediate.Linear_d...     [16, 256, 4096]   
178_bert.encoder.layer.15.output.Linear_dense          [16, 256, 1024]   
179_bert.encoder.layer.15.output.Dropout_dropout       [16, 256, 1024]   
180_bert.encoder.layer.15.output.FusedLayerNorm...     [16, 256, 1024]   
181_bert.encoder.layer.16.attention.self.Linear...     [16, 256, 1024]   
182_bert.encoder.layer.16.attention.self.Linear...     [16, 256, 1024]   
183_bert.encoder.layer.16.attention.self.Linear...     [16, 256, 1024]   
184_bert.encoder.layer.16.attention.self.Dropou...  [16, 16, 256, 256]   
185_bert.encoder.layer.16.attention.output.Line...     [16, 256, 1024]   
186_bert.encoder.layer.16.attention.output.Drop...     [16, 256, 1024]   
187_bert.encoder.layer.16.attention.output.Fuse...     [16, 256, 1024]   
188_bert.encoder.layer.16.intermediate.Linear_d...     [16, 256, 4096]   
189_bert.encoder.layer.16.output.Linear_dense          [16, 256, 1024]   
190_bert.encoder.layer.16.output.Dropout_dropout       [16, 256, 1024]   
191_bert.encoder.layer.16.output.FusedLayerNorm...     [16, 256, 1024]   
192_bert.encoder.layer.17.attention.self.Linear...     [16, 256, 1024]   
193_bert.encoder.layer.17.attention.self.Linear...     [16, 256, 1024]   
194_bert.encoder.layer.17.attention.self.Linear...     [16, 256, 1024]   
195_bert.encoder.layer.17.attention.self.Dropou...  [16, 16, 256, 256]   
196_bert.encoder.layer.17.attention.output.Line...     [16, 256, 1024]   
197_bert.encoder.layer.17.attention.output.Drop...     [16, 256, 1024]   
198_bert.encoder.layer.17.attention.output.Fuse...     [16, 256, 1024]   
199_bert.encoder.layer.17.intermediate.Linear_d...     [16, 256, 4096]   
200_bert.encoder.layer.17.output.Linear_dense          [16, 256, 1024]   
201_bert.encoder.layer.17.output.Dropout_dropout       [16, 256, 1024]   
202_bert.encoder.layer.17.output.FusedLayerNorm...     [16, 256, 1024]   
203_bert.encoder.layer.18.attention.self.Linear...     [16, 256, 1024]   
204_bert.encoder.layer.18.attention.self.Linear...     [16, 256, 1024]   
205_bert.encoder.layer.18.attention.self.Linear...     [16, 256, 1024]   
206_bert.encoder.layer.18.attention.self.Dropou...  [16, 16, 256, 256]   
207_bert.encoder.layer.18.attention.output.Line...     [16, 256, 1024]   
208_bert.encoder.layer.18.attention.output.Drop...     [16, 256, 1024]   
209_bert.encoder.layer.18.attention.output.Fuse...     [16, 256, 1024]   
210_bert.encoder.layer.18.intermediate.Linear_d...     [16, 256, 4096]   
211_bert.encoder.layer.18.output.Linear_dense          [16, 256, 1024]   
212_bert.encoder.layer.18.output.Dropout_dropout       [16, 256, 1024]   
213_bert.encoder.layer.18.output.FusedLayerNorm...     [16, 256, 1024]   
214_bert.encoder.layer.19.attention.self.Linear...     [16, 256, 1024]   
215_bert.encoder.layer.19.attention.self.Linear...     [16, 256, 1024]   
216_bert.encoder.layer.19.attention.self.Linear...     [16, 256, 1024]   
217_bert.encoder.layer.19.attention.self.Dropou...  [16, 16, 256, 256]   
218_bert.encoder.layer.19.attention.output.Line...     [16, 256, 1024]   
219_bert.encoder.layer.19.attention.output.Drop...     [16, 256, 1024]   
220_bert.encoder.layer.19.attention.output.Fuse...     [16, 256, 1024]   
221_bert.encoder.layer.19.intermediate.Linear_d...     [16, 256, 4096]   
222_bert.encoder.layer.19.output.Linear_dense          [16, 256, 1024]   
223_bert.encoder.layer.19.output.Dropout_dropout       [16, 256, 1024]   
224_bert.encoder.layer.19.output.FusedLayerNorm...     [16, 256, 1024]   
225_bert.encoder.layer.20.attention.self.Linear...     [16, 256, 1024]   
226_bert.encoder.layer.20.attention.self.Linear...     [16, 256, 1024]   
227_bert.encoder.layer.20.attention.self.Linear...     [16, 256, 1024]   
228_bert.encoder.layer.20.attention.self.Dropou...  [16, 16, 256, 256]   
229_bert.encoder.layer.20.attention.output.Line...     [16, 256, 1024]   
230_bert.encoder.layer.20.attention.output.Drop...     [16, 256, 1024]   
231_bert.encoder.layer.20.attention.output.Fuse...     [16, 256, 1024]   
232_bert.encoder.layer.20.intermediate.Linear_d...     [16, 256, 4096]   
233_bert.encoder.layer.20.output.Linear_dense          [16, 256, 1024]   
234_bert.encoder.layer.20.output.Dropout_dropout       [16, 256, 1024]   
235_bert.encoder.layer.20.output.FusedLayerNorm...     [16, 256, 1024]   
236_bert.encoder.layer.21.attention.self.Linear...     [16, 256, 1024]   
237_bert.encoder.layer.21.attention.self.Linear...     [16, 256, 1024]   
238_bert.encoder.layer.21.attention.self.Linear...     [16, 256, 1024]   
239_bert.encoder.layer.21.attention.self.Dropou...  [16, 16, 256, 256]   
240_bert.encoder.layer.21.attention.output.Line...     [16, 256, 1024]   
241_bert.encoder.layer.21.attention.output.Drop...     [16, 256, 1024]   
242_bert.encoder.layer.21.attention.output.Fuse...     [16, 256, 1024]   
243_bert.encoder.layer.21.intermediate.Linear_d...     [16, 256, 4096]   
244_bert.encoder.layer.21.output.Linear_dense          [16, 256, 1024]   
245_bert.encoder.layer.21.output.Dropout_dropout       [16, 256, 1024]   
246_bert.encoder.layer.21.output.FusedLayerNorm...     [16, 256, 1024]   
247_bert.encoder.layer.22.attention.self.Linear...     [16, 256, 1024]   
248_bert.encoder.layer.22.attention.self.Linear...     [16, 256, 1024]   
249_bert.encoder.layer.22.attention.self.Linear...     [16, 256, 1024]   
250_bert.encoder.layer.22.attention.self.Dropou...  [16, 16, 256, 256]   
251_bert.encoder.layer.22.attention.output.Line...     [16, 256, 1024]   
252_bert.encoder.layer.22.attention.output.Drop...     [16, 256, 1024]   
253_bert.encoder.layer.22.attention.output.Fuse...     [16, 256, 1024]   
254_bert.encoder.layer.22.intermediate.Linear_d...     [16, 256, 4096]   
255_bert.encoder.layer.22.output.Linear_dense          [16, 256, 1024]   
256_bert.encoder.layer.22.output.Dropout_dropout       [16, 256, 1024]   
257_bert.encoder.layer.22.output.FusedLayerNorm...     [16, 256, 1024]   
258_bert.encoder.layer.23.attention.self.Linear...     [16, 256, 1024]   
259_bert.encoder.layer.23.attention.self.Linear...     [16, 256, 1024]   
260_bert.encoder.layer.23.attention.self.Linear...     [16, 256, 1024]   
261_bert.encoder.layer.23.attention.self.Dropou...  [16, 16, 256, 256]   
262_bert.encoder.layer.23.attention.output.Line...     [16, 256, 1024]   
263_bert.encoder.layer.23.attention.output.Drop...     [16, 256, 1024]   
264_bert.encoder.layer.23.attention.output.Fuse...     [16, 256, 1024]   
265_bert.encoder.layer.23.intermediate.Linear_d...     [16, 256, 4096]   
266_bert.encoder.layer.23.output.Linear_dense          [16, 256, 1024]   
267_bert.encoder.layer.23.output.Dropout_dropout       [16, 256, 1024]   
268_bert.encoder.layer.23.output.FusedLayerNorm...     [16, 256, 1024]   
269_bert.pooler.Linear_dense                                [16, 1024]   
270_bert.pooler.Tanh_activation                             [16, 1024]   
271_dropout                                                 [16, 1024]   
272_classifier                                                 [16, 2]   

                                                       Params  Mult-Adds  
Layer                                                                     
0_bert.embeddings.Embedding_word_embeddings                 -          -  
1_bert.embeddings.Embedding_position_embeddings             -          -  
2_bert.embeddings.Embedding_token_type_embeddings           -          -  
3_bert.embeddings.FusedLayerNorm_LayerNorm                  -          -  
4_bert.embeddings.Dropout_dropout                           -          -  
5_bert.encoder.layer.0.attention.self.Linear_query          -          -  
6_bert.encoder.layer.0.attention.self.Linear_key            -          -  
7_bert.encoder.layer.0.attention.self.Linear_value          -          -  
8_bert.encoder.layer.0.attention.self.Dropout_d...          -          -  
9_bert.encoder.layer.0.attention.output.Linear_...          -          -  
10_bert.encoder.layer.0.attention.output.Dropou...          -          -  
11_bert.encoder.layer.0.attention.output.FusedL...          -          -  
12_bert.encoder.layer.0.intermediate.Linear_dense           -          -  
13_bert.encoder.layer.0.output.Linear_dense                 -          -  
14_bert.encoder.layer.0.output.Dropout_dropout              -          -  
15_bert.encoder.layer.0.output.FusedLayerNorm_L...          -          -  
16_bert.encoder.layer.1.attention.self.Linear_q...          -          -  
17_bert.encoder.layer.1.attention.self.Linear_key           -          -  
18_bert.encoder.layer.1.attention.self.Linear_v...          -          -  
19_bert.encoder.layer.1.attention.self.Dropout_...          -          -  
20_bert.encoder.layer.1.attention.output.Linear...          -          -  
21_bert.encoder.layer.1.attention.output.Dropou...          -          -  
22_bert.encoder.layer.1.attention.output.FusedL...          -          -  
23_bert.encoder.layer.1.intermediate.Linear_dense           -          -  
24_bert.encoder.layer.1.output.Linear_dense                 -          -  
25_bert.encoder.layer.1.output.Dropout_dropout              -          -  
26_bert.encoder.layer.1.output.FusedLayerNorm_L...          -          -  
27_bert.encoder.layer.2.attention.self.Linear_q...          -          -  
28_bert.encoder.layer.2.attention.self.Linear_key           -          -  
29_bert.encoder.layer.2.attention.self.Linear_v...          -          -  
30_bert.encoder.layer.2.attention.self.Dropout_...          -          -  
31_bert.encoder.layer.2.attention.output.Linear...          -          -  
32_bert.encoder.layer.2.attention.output.Dropou...          -          -  
33_bert.encoder.layer.2.attention.output.FusedL...          -          -  
34_bert.encoder.layer.2.intermediate.Linear_dense           -          -  
35_bert.encoder.layer.2.output.Linear_dense                 -          -  
36_bert.encoder.layer.2.output.Dropout_dropout              -          -  
37_bert.encoder.layer.2.output.FusedLayerNorm_L...          -          -  
38_bert.encoder.layer.3.attention.self.Linear_q...          -          -  
39_bert.encoder.layer.3.attention.self.Linear_key           -          -  
40_bert.encoder.layer.3.attention.self.Linear_v...          -          -  
41_bert.encoder.layer.3.attention.self.Dropout_...          -          -  
42_bert.encoder.layer.3.attention.output.Linear...          -          -  
43_bert.encoder.layer.3.attention.output.Dropou...          -          -  
44_bert.encoder.layer.3.attention.output.FusedL...          -          -  
45_bert.encoder.layer.3.intermediate.Linear_dense           -          -  
46_bert.encoder.layer.3.output.Linear_dense                 -          -  
47_bert.encoder.layer.3.output.Dropout_dropout              -          -  
48_bert.encoder.layer.3.output.FusedLayerNorm_L...          -          -  
49_bert.encoder.layer.4.attention.self.Linear_q...          -          -  
50_bert.encoder.layer.4.attention.self.Linear_key           -          -  
51_bert.encoder.layer.4.attention.self.Linear_v...          -          -  
52_bert.encoder.layer.4.attention.self.Dropout_...          -          -  
53_bert.encoder.layer.4.attention.output.Linear...          -          -  
54_bert.encoder.layer.4.attention.output.Dropou...          -          -  
55_bert.encoder.layer.4.attention.output.FusedL...          -          -  
56_bert.encoder.layer.4.intermediate.Linear_dense           -          -  
57_bert.encoder.layer.4.output.Linear_dense                 -          -  
58_bert.encoder.layer.4.output.Dropout_dropout              -          -  
59_bert.encoder.layer.4.output.FusedLayerNorm_L...          -          -  
60_bert.encoder.layer.5.attention.self.Linear_q...          -          -  
61_bert.encoder.layer.5.attention.self.Linear_key           -          -  
62_bert.encoder.layer.5.attention.self.Linear_v...          -          -  
63_bert.encoder.layer.5.attention.self.Dropout_...          -          -  
64_bert.encoder.layer.5.attention.output.Linear...          -          -  
65_bert.encoder.layer.5.attention.output.Dropou...          -          -  
66_bert.encoder.layer.5.attention.output.FusedL...          -          -  
67_bert.encoder.layer.5.intermediate.Linear_dense           -          -  
68_bert.encoder.layer.5.output.Linear_dense                 -          -  
69_bert.encoder.layer.5.output.Dropout_dropout              -          -  
70_bert.encoder.layer.5.output.FusedLayerNorm_L...          -          -  
71_bert.encoder.layer.6.attention.self.Linear_q...          -          -  
72_bert.encoder.layer.6.attention.self.Linear_key           -          -  
73_bert.encoder.layer.6.attention.self.Linear_v...          -          -  
74_bert.encoder.layer.6.attention.self.Dropout_...          -          -  
75_bert.encoder.layer.6.attention.output.Linear...          -          -  
76_bert.encoder.layer.6.attention.output.Dropou...          -          -  
77_bert.encoder.layer.6.attention.output.FusedL...          -          -  
78_bert.encoder.layer.6.intermediate.Linear_dense           -          -  
79_bert.encoder.layer.6.output.Linear_dense                 -          -  
80_bert.encoder.layer.6.output.Dropout_dropout              -          -  
81_bert.encoder.layer.6.output.FusedLayerNorm_L...          -          -  
82_bert.encoder.layer.7.attention.self.Linear_q...          -          -  
83_bert.encoder.layer.7.attention.self.Linear_key           -          -  
84_bert.encoder.layer.7.attention.self.Linear_v...          -          -  
85_bert.encoder.layer.7.attention.self.Dropout_...          -          -  
86_bert.encoder.layer.7.attention.output.Linear...          -          -  
87_bert.encoder.layer.7.attention.output.Dropou...          -          -  
88_bert.encoder.layer.7.attention.output.FusedL...          -          -  
89_bert.encoder.layer.7.intermediate.Linear_dense           -          -  
90_bert.encoder.layer.7.output.Linear_dense                 -          -  
91_bert.encoder.layer.7.output.Dropout_dropout              -          -  
92_bert.encoder.layer.7.output.FusedLayerNorm_L...          -          -  
93_bert.encoder.layer.8.attention.self.Linear_q...          -          -  
94_bert.encoder.layer.8.attention.self.Linear_key           -          -  
95_bert.encoder.layer.8.attention.self.Linear_v...          -          -  
96_bert.encoder.layer.8.attention.self.Dropout_...          -          -  
97_bert.encoder.layer.8.attention.output.Linear...          -          -  
98_bert.encoder.layer.8.attention.output.Dropou...          -          -  
99_bert.encoder.layer.8.attention.output.FusedL...          -          -  
100_bert.encoder.layer.8.intermediate.Linear_dense          -          -  
101_bert.encoder.layer.8.output.Linear_dense                -          -  
102_bert.encoder.layer.8.output.Dropout_dropout             -          -  
103_bert.encoder.layer.8.output.FusedLayerNorm_...          -          -  
104_bert.encoder.layer.9.attention.self.Linear_...          -          -  
105_bert.encoder.layer.9.attention.self.Linear_key          -          -  
106_bert.encoder.layer.9.attention.self.Linear_...          -          -  
107_bert.encoder.layer.9.attention.self.Dropout...          -          -  
108_bert.encoder.layer.9.attention.output.Linea...          -          -  
109_bert.encoder.layer.9.attention.output.Dropo...          -          -  
110_bert.encoder.layer.9.attention.output.Fused...          -          -  
111_bert.encoder.layer.9.intermediate.Linear_dense          -          -  
112_bert.encoder.layer.9.output.Linear_dense                -          -  
113_bert.encoder.layer.9.output.Dropout_dropout             -          -  
114_bert.encoder.layer.9.output.FusedLayerNorm_...          -          -  
115_bert.encoder.layer.10.attention.self.Linear...          -          -  
116_bert.encoder.layer.10.attention.self.Linear...          -          -  
117_bert.encoder.layer.10.attention.self.Linear...          -          -  
118_bert.encoder.layer.10.attention.self.Dropou...          -          -  
119_bert.encoder.layer.10.attention.output.Line...          -          -  
120_bert.encoder.layer.10.attention.output.Drop...          -          -  
121_bert.encoder.layer.10.attention.output.Fuse...          -          -  
122_bert.encoder.layer.10.intermediate.Linear_d...          -          -  
123_bert.encoder.layer.10.output.Linear_dense               -          -  
124_bert.encoder.layer.10.output.Dropout_dropout            -          -  
125_bert.encoder.layer.10.output.FusedLayerNorm...          -          -  
126_bert.encoder.layer.11.attention.self.Linear...          -          -  
127_bert.encoder.layer.11.attention.self.Linear...          -          -  
128_bert.encoder.layer.11.attention.self.Linear...          -          -  
129_bert.encoder.layer.11.attention.self.Dropou...          -          -  
130_bert.encoder.layer.11.attention.output.Line...          -          -  
131_bert.encoder.layer.11.attention.output.Drop...          -          -  
132_bert.encoder.layer.11.attention.output.Fuse...          -          -  
133_bert.encoder.layer.11.intermediate.Linear_d...          -          -  
134_bert.encoder.layer.11.output.Linear_dense               -          -  
135_bert.encoder.layer.11.output.Dropout_dropout            -          -  
136_bert.encoder.layer.11.output.FusedLayerNorm...          -          -  
137_bert.encoder.layer.12.attention.self.Linear...          -          -  
138_bert.encoder.layer.12.attention.self.Linear...          -          -  
139_bert.encoder.layer.12.attention.self.Linear...          -          -  
140_bert.encoder.layer.12.attention.self.Dropou...          -          -  
141_bert.encoder.layer.12.attention.output.Line...          -          -  
142_bert.encoder.layer.12.attention.output.Drop...          -          -  
143_bert.encoder.layer.12.attention.output.Fuse...          -          -  
144_bert.encoder.layer.12.intermediate.Linear_d...          -          -  
145_bert.encoder.layer.12.output.Linear_dense               -          -  
146_bert.encoder.layer.12.output.Dropout_dropout            -          -  
147_bert.encoder.layer.12.output.FusedLayerNorm...          -          -  
148_bert.encoder.layer.13.attention.self.Linear...          -          -  
149_bert.encoder.layer.13.attention.self.Linear...          -          -  
150_bert.encoder.layer.13.attention.self.Linear...          -          -  
151_bert.encoder.layer.13.attention.self.Dropou...          -          -  
152_bert.encoder.layer.13.attention.output.Line...          -          -  
153_bert.encoder.layer.13.attention.output.Drop...          -          -  
154_bert.encoder.layer.13.attention.output.Fuse...          -          -  
155_bert.encoder.layer.13.intermediate.Linear_d...          -          -  
156_bert.encoder.layer.13.output.Linear_dense               -          -  
157_bert.encoder.layer.13.output.Dropout_dropout            -          -  
158_bert.encoder.layer.13.output.FusedLayerNorm...          -          -  
159_bert.encoder.layer.14.attention.self.Linear...          -          -  
160_bert.encoder.layer.14.attention.self.Linear...          -          -  
161_bert.encoder.layer.14.attention.self.Linear...          -          -  
162_bert.encoder.layer.14.attention.self.Dropou...          -          -  
163_bert.encoder.layer.14.attention.output.Line...          -          -  
164_bert.encoder.layer.14.attention.output.Drop...          -          -  
165_bert.encoder.layer.14.attention.output.Fuse...          -          -  
166_bert.encoder.layer.14.intermediate.Linear_d...          -          -  
167_bert.encoder.layer.14.output.Linear_dense               -          -  
168_bert.encoder.layer.14.output.Dropout_dropout            -          -  
169_bert.encoder.layer.14.output.FusedLayerNorm...          -          -  
170_bert.encoder.layer.15.attention.self.Linear...          -          -  
171_bert.encoder.layer.15.attention.self.Linear...          -          -  
172_bert.encoder.layer.15.attention.self.Linear...          -          -  
173_bert.encoder.layer.15.attention.self.Dropou...          -          -  
174_bert.encoder.layer.15.attention.output.Line...          -          -  
175_bert.encoder.layer.15.attention.output.Drop...          -          -  
176_bert.encoder.layer.15.attention.output.Fuse...          -          -  
177_bert.encoder.layer.15.intermediate.Linear_d...          -          -  
178_bert.encoder.layer.15.output.Linear_dense               -          -  
179_bert.encoder.layer.15.output.Dropout_dropout            -          -  
180_bert.encoder.layer.15.output.FusedLayerNorm...          -          -  
181_bert.encoder.layer.16.attention.self.Linear...          -          -  
182_bert.encoder.layer.16.attention.self.Linear...          -          -  
183_bert.encoder.layer.16.attention.self.Linear...          -          -  
184_bert.encoder.layer.16.attention.self.Dropou...          -          -  
185_bert.encoder.layer.16.attention.output.Line...          -          -  
186_bert.encoder.layer.16.attention.output.Drop...          -          -  
187_bert.encoder.layer.16.attention.output.Fuse...          -          -  
188_bert.encoder.layer.16.intermediate.Linear_d...          -          -  
189_bert.encoder.layer.16.output.Linear_dense               -          -  
190_bert.encoder.layer.16.output.Dropout_dropout            -          -  
191_bert.encoder.layer.16.output.FusedLayerNorm...          -          -  
192_bert.encoder.layer.17.attention.self.Linear...          -          -  
193_bert.encoder.layer.17.attention.self.Linear...          -          -  
194_bert.encoder.layer.17.attention.self.Linear...          -          -  
195_bert.encoder.layer.17.attention.self.Dropou...          -          -  
196_bert.encoder.layer.17.attention.output.Line...          -          -  
197_bert.encoder.layer.17.attention.output.Drop...          -          -  
198_bert.encoder.layer.17.attention.output.Fuse...          -          -  
199_bert.encoder.layer.17.intermediate.Linear_d...          -          -  
200_bert.encoder.layer.17.output.Linear_dense               -          -  
201_bert.encoder.layer.17.output.Dropout_dropout            -          -  
202_bert.encoder.layer.17.output.FusedLayerNorm...          -          -  
203_bert.encoder.layer.18.attention.self.Linear...          -          -  
204_bert.encoder.layer.18.attention.self.Linear...          -          -  
205_bert.encoder.layer.18.attention.self.Linear...          -          -  
206_bert.encoder.layer.18.attention.self.Dropou...          -          -  
207_bert.encoder.layer.18.attention.output.Line...          -          -  
208_bert.encoder.layer.18.attention.output.Drop...          -          -  
209_bert.encoder.layer.18.attention.output.Fuse...          -          -  
210_bert.encoder.layer.18.intermediate.Linear_d...          -          -  
211_bert.encoder.layer.18.output.Linear_dense               -          -  
212_bert.encoder.layer.18.output.Dropout_dropout            -          -  
213_bert.encoder.layer.18.output.FusedLayerNorm...          -          -  
214_bert.encoder.layer.19.attention.self.Linear...          -          -  
215_bert.encoder.layer.19.attention.self.Linear...          -          -  
216_bert.encoder.layer.19.attention.self.Linear...          -          -  
217_bert.encoder.layer.19.attention.self.Dropou...          -          -  
218_bert.encoder.layer.19.attention.output.Line...          -          -  
219_bert.encoder.layer.19.attention.output.Drop...          -          -  
220_bert.encoder.layer.19.attention.output.Fuse...          -          -  
221_bert.encoder.layer.19.intermediate.Linear_d...          -          -  
222_bert.encoder.layer.19.output.Linear_dense               -          -  
223_bert.encoder.layer.19.output.Dropout_dropout            -          -  
224_bert.encoder.layer.19.output.FusedLayerNorm...          -          -  
225_bert.encoder.layer.20.attention.self.Linear...    1.0496M  1.048576M  
226_bert.encoder.layer.20.attention.self.Linear...    1.0496M  1.048576M  
227_bert.encoder.layer.20.attention.self.Linear...    1.0496M  1.048576M  
228_bert.encoder.layer.20.attention.self.Dropou...          -          -  
229_bert.encoder.layer.20.attention.output.Line...    1.0496M  1.048576M  
230_bert.encoder.layer.20.attention.output.Drop...          -          -  
231_bert.encoder.layer.20.attention.output.Fuse...     2.048k     1.024k  
232_bert.encoder.layer.20.intermediate.Linear_d...    4.1984M  4.194304M  
233_bert.encoder.layer.20.output.Linear_dense       4.195328M  4.194304M  
234_bert.encoder.layer.20.output.Dropout_dropout            -          -  
235_bert.encoder.layer.20.output.FusedLayerNorm...     2.048k     1.024k  
236_bert.encoder.layer.21.attention.self.Linear...    1.0496M  1.048576M  
237_bert.encoder.layer.21.attention.self.Linear...    1.0496M  1.048576M  
238_bert.encoder.layer.21.attention.self.Linear...    1.0496M  1.048576M  
239_bert.encoder.layer.21.attention.self.Dropou...          -          -  
240_bert.encoder.layer.21.attention.output.Line...    1.0496M  1.048576M  
241_bert.encoder.layer.21.attention.output.Drop...          -          -  
242_bert.encoder.layer.21.attention.output.Fuse...     2.048k     1.024k  
243_bert.encoder.layer.21.intermediate.Linear_d...    4.1984M  4.194304M  
244_bert.encoder.layer.21.output.Linear_dense       4.195328M  4.194304M  
245_bert.encoder.layer.21.output.Dropout_dropout            -          -  
246_bert.encoder.layer.21.output.FusedLayerNorm...     2.048k     1.024k  
247_bert.encoder.layer.22.attention.self.Linear...    1.0496M  1.048576M  
248_bert.encoder.layer.22.attention.self.Linear...    1.0496M  1.048576M  
249_bert.encoder.layer.22.attention.self.Linear...    1.0496M  1.048576M  
250_bert.encoder.layer.22.attention.self.Dropou...          -          -  
251_bert.encoder.layer.22.attention.output.Line...    1.0496M  1.048576M  
252_bert.encoder.layer.22.attention.output.Drop...          -          -  
253_bert.encoder.layer.22.attention.output.Fuse...     2.048k     1.024k  
254_bert.encoder.layer.22.intermediate.Linear_d...    4.1984M  4.194304M  
255_bert.encoder.layer.22.output.Linear_dense       4.195328M  4.194304M  
256_bert.encoder.layer.22.output.Dropout_dropout            -          -  
257_bert.encoder.layer.22.output.FusedLayerNorm...     2.048k     1.024k  
258_bert.encoder.layer.23.attention.self.Linear...    1.0496M  1.048576M  
259_bert.encoder.layer.23.attention.self.Linear...    1.0496M  1.048576M  
260_bert.encoder.layer.23.attention.self.Linear...    1.0496M  1.048576M  
261_bert.encoder.layer.23.attention.self.Dropou...          -          -  
262_bert.encoder.layer.23.attention.output.Line...    1.0496M  1.048576M  
263_bert.encoder.layer.23.attention.output.Drop...          -          -  
264_bert.encoder.layer.23.attention.output.Fuse...     2.048k     1.024k  
265_bert.encoder.layer.23.intermediate.Linear_d...    4.1984M  4.194304M  
266_bert.encoder.layer.23.output.Linear_dense       4.195328M  4.194304M  
267_bert.encoder.layer.23.output.Dropout_dropout            -          -  
268_bert.encoder.layer.23.output.FusedLayerNorm...     2.048k     1.024k  
269_bert.pooler.Linear_dense                          1.0496M  1.048576M  
270_bert.pooler.Tanh_activation                             -          -  
271_dropout                                                 -          -  
272_classifier                                          2.05k     2.048k  
-----------------------------------------------------------------------------------------------------------
                           Totals
Total params          333.581314M
Trainable params       51.436546M
Non-trainable params  282.144768M
Mult-Adds              51.390464M
===========================================================================================================
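As a sanity check on the totals, the per-layer rows above can be added up by hand. The sketch below assumes the conventions the table itself shows: a Linear row's params are weight plus bias, and its Mult-Adds are the weight element count only (not scaled by batch or sequence length). It also assumes that the only rows with counts (encoder layers 20-23, the pooler and the classifier) are the unfrozen ones. The helper names are hypothetical. Under those assumptions the sums reproduce the reported "Trainable params" and "Mult-Adds" exactly. This also shows that the Mult-Adds total leaves out the frozen layers, even though they still run in the forward pass.

```python
# Per-layer counts as read off the table above (hidden size 1024, FFN size 4096).
HIDDEN, FFN, NUM_LABELS = 1024, 4096, 2

def linear(n_in, n_out):
    """(params, mult-adds) for a Linear row as the table reports it:
    params = weight + bias, mult-adds = weight elements only."""
    return n_in * n_out + n_out, n_in * n_out

def layer_norm(n):
    # weight + bias params; mult-adds counted from the weight only
    return 2 * n, n

def encoder_layer():
    parts = [linear(HIDDEN, HIDDEN)] * 4                 # query, key, value, attention.output dense
    parts += [layer_norm(HIDDEN)] * 2                    # attention.output and output LayerNorms
    parts += [linear(HIDDEN, FFN), linear(FFN, HIDDEN)]  # intermediate and output dense
    return sum(p for p, _ in parts), sum(m for _, m in parts)

layer_params, layer_macs = encoder_layer()
pool_p, pool_m = linear(HIDDEN, HIDDEN)
cls_p, cls_m = linear(HIDDEN, NUM_LABELS)

# Assumed trainable: encoder layers 20-23, the pooler and the classifier.
trainable_params = 4 * layer_params + pool_p + cls_p
listed_macs = 4 * layer_macs + pool_m + cls_m
print(trainable_params, listed_macs)   # 51436546 51390464

# Mult-adds that frozen encoder layers 0-19 would add under the same per-layer convention.
frozen_encoder_macs = 20 * layer_macs
print(frozen_encoder_macs)             # 251699200
```

The trainable count plus the reported non-trainable count (282,144,768) gives the reported total of 333,581,314.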
nmhkahn commented 5 years ago

Yes, I was planning to modify it to show both the total and the non-trainable params.
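Here is a minimal sketch of reporting total, trainable and non-trainable counts side by side, written in plain PyTorch rather than taken from torchsummaryX's own code. It simply filters `model.parameters()` on `requires_grad`. The toy model and the `count_params` helper are hypothetical, and freezing the first layer mimics the frozen lower BERT layers above.

```python
import torch.nn as nn

def count_params(model: nn.Module):
    """Return (total, trainable, non_trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable, total - trainable

# Hypothetical toy model: a frozen Linear followed by trainable layers.
model = nn.Sequential(nn.Linear(1024, 1024), nn.LayerNorm(1024), nn.Linear(1024, 2))
for p in model[0].parameters():
    p.requires_grad = False  # frozen: excluded from the trainable count

total, trainable, frozen = count_params(model)
print(total, trainable, frozen)  # 1053698 4098 1049600
```

Counting every parameter for the total, and only filtering for the trainable figure, keeps frozen layers visible in the summary instead of dropping them.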