moonsh closed this issue 1 year ago
In FastMETRO, they use a CNN backbone to extract features and then apply their transformer architecture.
In POTTER, we use POTTER_cls as the backbone (12M params), then add the HR stream and the HMR head. The total parameter count reported in Table 2 is 16M (12M from the backbone and 4M from the rest).
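For anyone who wants to check a breakdown like this, here is a minimal sketch of summing parameter counts per top-level module. It is not taken from the POTTER code: the function name `count_params_by_module`, the module names, and the shapes are hypothetical toy values chosen only for illustration.

```python
from collections import defaultdict
from math import prod


def count_params_by_module(param_shapes):
    """Sum parameter counts per top-level module name.

    param_shapes maps a dotted parameter name (in the style of PyTorch's
    named_parameters()) to the shape of that tensor.
    """
    totals = defaultdict(int)
    for name, shape in param_shapes.items():
        top_level = name.split(".", 1)[0]  # e.g. "backbone" from "backbone.conv.weight"
        totals[top_level] += prod(shape)   # number of elements in the tensor
    return dict(totals)


# Toy shapes only; these are NOT POTTER's real layers or sizes.
toy_shapes = {
    "backbone.conv.weight": (64, 3, 3, 3),
    "backbone.conv.bias": (64,),
    "hr_stream.proj.weight": (32, 64),
    "hmr_head.fc.weight": (10, 32),
    "hmr_head.fc.bias": (10,),
}

breakdown = count_params_by_module(toy_shapes)
total = sum(breakdown.values())
print(breakdown, total)
```

With a real PyTorch model, the shape mapping could be built as `{n: tuple(p.shape) for n, p in model.named_parameters()}`. A separate detector such as Faster RCNN would only show up in the total if its parameters are included in that mapping.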
I see. What about the detector model? Faster RCNN?
Yes, right now we use the Faster RCNN in our inference demo code.
I got it. Thank you!
Just curious. I noticed that the parameter count for FastMETRO in the table in your paper includes the CNN backbone parameters, whereas the count for your network excludes the detector parameters.