NBN denotes no batch normalization. In Table 1 of the original paper, the authors use conv2d 1x1, NBN for the 1x1x960 to 1x1x1280 layer, which means they don't use batch norm in the fc layers, but your code uses bn. This does not match the original version of the paper, although this small change may not influence the final results.
out = self.hs3(self.bn3(self.linear3(out)))
out = self.linear4(out)
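A minimal sketch of what the head could look like with bn3 removed, matching Table 1 (conv2d 1x1, NBN). Layer names (linear3, hs3, linear4) follow the snippet above; the exact channel sizes and the use of nn.Hardswish are assumptions based on the paper, not this repo's code:

```python
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # 1x1x960 -> 1x1x1280 projection, no batch norm (NBN in Table 1)
        self.linear3 = nn.Linear(960, 1280)
        self.hs3 = nn.Hardswish()  # h-swish activation, as in the paper
        self.linear4 = nn.Linear(1280, num_classes)

    def forward(self, out):
        # No bn3 here, unlike the current code
        out = self.hs3(self.linear3(out))
        out = self.linear4(out)
        return out
```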