Hey,
Thank you for the lovely work and for prompting more research on anchor-free detectors. Having read the paper and gone through the code, I have a specific question about the normalization of the regression outputs, in light of the recent improvements made to FCOS.
From your code, it's clear that:
During training, the regression targets are normalized by dividing them by the network strides.
During testing, the regression predictions are rescaled back by multiplying them by the network strides.
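To make sure I am reading the code correctly, here is a minimal sketch of what I understand it to do. The function names are my own and are not taken from the repository, and the stride values are the usual FPN strides for P3 to P7, which I am assuming apply here:

```python
# Hypothetical helper names; strides assumed to be the usual FPN values for P3..P7.
STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64, "P7": 128}


def normalize_targets(ltrb, stride):
    """Training: divide the (l, t, r, b) distance targets by the level's stride."""
    return tuple(d / stride for d in ltrb)


def denormalize_predictions(ltrb, stride):
    """Testing: multiply the predicted distances by the stride to return to pixels."""
    return tuple(d * stride for d in ltrb)


# Example distances in pixels for a location assigned to P3.
targets_px = (24.0, 40.0, 16.0, 8.0)
normalized = normalize_targets(targets_px, STRIDES["P3"])
recovered = denormalize_predictions(normalized, STRIDES["P3"])
```

In this reading the stride is a fixed constant per level, so nothing in these two steps is learned.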
However, in the paper/thesis, you mention that during training the predictions are scaled by a scalar 's' through the expression exp(s·x). As I understand it, this equals (e^s)^x, so 's' adjusts the base of the exponential while x is the exponent. It is also mentioned that this scalar should be learnable, so that the regression can adapt to the different target ranges at the different pyramid levels (P3 to P7).
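For comparison, this is how I picture the formulation from the paper, assuming the expression is read as exp(s·x) with one scalar per pyramid level. The name `paper_mapping` and the example values of s are purely illustrative; in the real model s would be a trainable parameter:

```python
import math


def paper_mapping(x, s):
    """Map a raw network output x to a positive distance via exp(s * x).

    s stands in for the per-level scalar; here it is a plain float,
    whereas in training it would be a learnable parameter.
    """
    return math.exp(s * x)


x = 0.5
for s in (1.0, 2.0):
    # exp(s * x) equals (e**s) ** x, so changing s changes the effective base.
    assert math.isclose(paper_mapping(x, s), (math.e ** s) ** x)
```

Here the per-level factor sits inside the exponential and can be updated by gradient descent, which is what differs from the fixed stride in the code.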
My question is: is dividing the regression targets by the strides during training, and then multiplying the regression predictions by the strides during testing, equivalent to the training methodology described in the paper/thesis, where 's' is trainable?