Hi, first of all thank you so much, that's a really helpful repo!
I'm now working with some pretrained BERT models, so I'm wondering if this flop_counter works for all types of BERT models? I noticed here's an example using BertForSequenceClassification, with input_res=(2, 128). But when I try some other BERT models, the input_res could be the big problem. If there any baseline for BERT model like you mentioned (similar to image networks) in README? Or is there any lookup table? Thank you very much!
What do you mean by saying input_res could be the big problem?
If the input shape varies then amount of flops also varies and only average value across a dataset will make sense.
Hi, first of all thank you so much, that's a really helpful repo!
I'm now working with some pretrained BERT models, so I'm wondering if this flop_counter works for all types of BERT models? I noticed here's an example using BertForSequenceClassification, with input_res=(2, 128). But when I try some other BERT models, the input_res could be the big problem. If there any baseline for BERT model like you mentioned (similar to image networks) in README? Or is there any lookup table? Thank you very much!