wudashuo / yolov5

YOLOv5, Chinese-localized version, kept in sync with the official updates
GNU General Public License v3.0

val.py and train.py give different P, R, mAP, etc. results #9

Closed: G-Apple1 closed this issue 3 years ago

G-Apple1 commented 3 years ago

❔Question

I modified the network. The P, R, and mAP values reported by train.py look normal, but when I run val.py with the trained weights, P, R, and mAP are all 0. What is going on?

Additional context

wudashuo commented 3 years ago

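First, make sure your command is correct. During training, train.py calls the same method that val.py uses, so the results should be roughly the same. Also, train.py, detect.py, and val.py each use somewhat different IoU and confidence thresholds, so their final results will differ slightly, but that would not produce what you are seeing.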
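A toy illustration of the point about thresholds: the plain-Python sketch below uses hypothetical boxes and threshold pairs with a generic confidence filter plus greedy NMS (not YOLOv5's own `non_max_suppression`). It shows that changing `conf_thres` and `iou_thres` changes how many detections survive, which shifts P, R, and mAP but would not normally turn reasonable results into all zeros.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def filter_and_nms(dets, conf_thres, iou_thres):
    """Drop detections below conf_thres, then run greedy NMS.

    dets: list of (box, score). Returns kept detections, highest score first.
    """
    candidates = sorted((d for d in dets if d[1] >= conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # keep a box only if it does not overlap an already-kept box too much
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: two heavily overlapping boxes and one low-confidence box
dets = [((0, 0, 10, 10), 0.9), ((1, 0, 11, 10), 0.6), ((20, 20, 30, 30), 0.1)]

print(len(filter_and_nms(dets, conf_thres=0.001, iou_thres=0.9)))  # 3
print(len(filter_and_nms(dets, conf_thres=0.25, iou_thres=0.45)))  # 1
```

The same predictions yield three or one final boxes depending only on the thresholds, which is why the three scripts report slightly different numbers.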

G-Apple1 commented 3 years ago

❔Question

I made some changes to your network model (adding an SE module to Conv), and the trained model achieved a small improvement on my test set (I saw the final validation results at the end of training). But when I run val.py with --weights run/train/exp/weights/last.pt, P, R, mAP@0.5, and the other metrics are all 0.

Training results:

Epoch gpu_mem box obj cls labels img_size
199/199 3.47G 0.01118 0.03534 0.001137 23 960: 100%|██████████| 492/492 [05:17<00:00, 1.55it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 115/115 [00:24<00:00, 4.73it/s]
all 460 5189 0.827 0.827 0.875 0.691
0 460 1872 0.792 0.759 0.825 0.641
1 460 1611 0.878 0.757 0.851 0.626
2 460 1706 0.81 0.966 0.948 0.806

200 epochs completed in 19.047 hours.
Optimizer stripped from runs/train/exp6/weights/last.pt, 93.8MB
Optimizer stripped from runs/train/exp6/weights/best.pt, 93.8MB
Results saved to runs/train/exp6

Validation results:

(yolov5) scau2@scau2-desktop:/media/scau2/1T2/xcg/YOLOV5/yolov5$ python val.py --data p_datasets.yaml --img 960 --weights runs/train/exp6/weights/best.pt --device 0 --task val --half
val: data=./data/p_datasets.yaml, weights=['runs/train/yolov5s-se300/weights/best.pt'], batch_size=8, imgsz=960, conf_thres=0.001, iou_thres=0.6, task=val, device=0, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=--exist-ok, half=True
YOLOv5 🚀 v5.0-339-g53bfcbe torch 1.7.0 CUDA:0 (GeForce GTX 980 Ti, 6080.8125MB)

Fusing layers...
Model Summary: 637 layers, 7364072 parameters, 0 gradients, 16.3 GFLOPs
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|█| 58/58
all 460 0 0 0 0 0: 97%|▉| 56/58
Speed: 0.4ms pre-process, 7.3ms inference, 0.2ms NMS per image at shape (8, 3, 960, 960)
val: Scanning '../p_datasets/labels/test.cache' images and labels... 460 found, 0 missing, 0 empty, 0
Results saved to runs/val/exp

Additional context

When I run detect.py with --weights run/train/exp/weights/last.pt, the predictions are poor.

I added the SE module to Conv as follows:

import torch.nn as nn

# autopad() is defined elsewhere in YOLOv5's models/common.py


class SELayer(nn.Module):
    # Squeeze-and-Excitation: rescales each channel by a learned gate in (0, 1)
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)  # squeeze: (b, c, h, w) -> (b, c)
        y = self.fc(y).view(b, c, 1, 1)  # excitation: per-channel gates
        return x * y.expand_as(x)


class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.se_conv = SE_Conv(c2)  # SE_Conv is not shown in the post; presumably the SELayer above

    def forward(self, x):  # unfused path (used during training): se_conv is not applied
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):  # fused path (after layer fusing): se_conv is applied, bn is not
        return self.se_conv(self.act(self.conv(x)))
G-Apple1 commented 3 years ago


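I added an SE module to the Conv module in common.py, which increased the parameter count. I'm not sure whether I added it correctly, but training ran without errors and the evaluation metrics were normal. However, when I run val.py with the saved runs/train/exp/weights/best.pt, all the metrics are 0.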

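I found that if I simply change def forward_fuse(self, x): return self.se_conv(self.act(self.conv(x))) to return self.act(self.conv(x)), it works (running val.py reproduces the metrics from training). I suspect either that the SE module I added is not being trained, or that train.py and val.py load different models.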
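The suspicion that the SE module is never trained can be checked directly. In the posted Conv, `se_conv` is called only inside `forward_fuse`, while an ordinary call `m(x)` goes through `forward`. The sketch below uses a hypothetical, stripped-down `ConvSEOnlyInFuse` (a fixed 3x3 conv instead of `autopad`, SiLU only) to show that after a backward pass the SE parameters have no gradients, so gradient-based updates never touch them. At validation time those untrained gates would then rescale every fused Conv output.

```python
import torch
import torch.nn as nn


class SELayer(nn.Module):
    # Squeeze-and-Excitation block, same structure as in the post
    def __init__(self, channel, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class ConvSEOnlyInFuse(nn.Module):
    # Hypothetical stand-in for the modified Conv: SE appears only in forward_fuse
    def __init__(self, c1, c2):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, 3, 1, 1, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()
        self.se_conv = SELayer(c2)

    def forward(self, x):  # training path: no SE
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):  # fused/inference path: SE applied
        return self.se_conv(self.act(self.conv(x)))


torch.manual_seed(0)
m = ConvSEOnlyInFuse(3, 32)
loss = m(torch.randn(2, 3, 8, 8)).sum()  # m(x) calls forward(), as in training
loss.backward()

print(m.conv.weight.grad is not None)  # True: the conv is trained
print(all(p.grad is None for p in m.se_conv.parameters()))  # True: SE gets no gradient
```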

wudashuo commented 3 years ago

train.py and val.py do load the model differently, for example in structures such as Identity. Take a look at common.py and yolo.py and it will become clear.
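One concrete difference is layer fusing: the val.py log above prints `Fusing layers...`. The sketch below assumes fusing works the way YOLOv5's `Model.fuse()` in models/yolo.py does (for each Conv, fold `bn` into `conv`, delete `bn`, and rebind `forward` to `forward_fuse`). The `Conv`, `fuse_conv_and_bn`, and `fuse` here are minimal reimplementations for illustration, not the library's code, and assume the conv has no bias, as in YOLOv5's Conv.

```python
import torch
import torch.nn as nn


class Conv(nn.Module):
    # Minimal Conv with the same forward/forward_fuse split as YOLOv5's Conv
    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, 1, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):  # unfused path (train.py)
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):  # fused path: bn has been folded into conv
        return self.act(self.conv(x))


def fuse_conv_and_bn(conv, bn):
    # Fold eval-mode BatchNorm into a bias-free conv:
    # W' = W * gamma / sqrt(var + eps),  b' = beta - mean * gamma / sqrt(var + eps)
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    with torch.no_grad():
        fused.weight.copy_(conv.weight * scale.view(-1, 1, 1, 1))
        fused.bias.copy_(bn.bias - bn.running_mean * scale)
    return fused


def fuse(model):
    # Sketch of the fusing step: swap in the fused conv, drop bn, rebind forward
    for m in model.modules():
        if isinstance(m, Conv) and hasattr(m, "bn"):
            m.conv = fuse_conv_and_bn(m.conv, m.bn)
            delattr(m, "bn")
            m.forward = m.forward_fuse
    return model


torch.manual_seed(0)
model = nn.Sequential(Conv(3, 8), Conv(8, 8)).eval()
# give BN non-trivial statistics so that folding actually matters
for mod in model.modules():
    if isinstance(mod, nn.BatchNorm2d):
        mod.running_mean.uniform_(-1, 1)
        mod.running_var.uniform_(0.5, 2)
        mod.weight.data.uniform_(0.5, 1.5)
        mod.bias.data.uniform_(-0.5, 0.5)

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    y_ref = model(x)           # forward(): conv -> bn -> act
    y_fused = fuse(model)(x)   # forward_fuse(): fused conv -> act
print(torch.allclose(y_ref, y_fused, atol=1e-5))  # True
```

Because only the fused model calls `forward_fuse`, anything that exists only in that method, such as the `se_conv` call in the posted Conv, is invisible to training but active in val.py.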

G-Apple1 commented 3 years ago


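Thanks for the pointer, I've solved the problem. The cause was: in the model built by train.py, the Conv module calls the forward function, while in the model built by val.py, the Conv module calls the fuseforward (forward_fuse) function, and the difference between the two functions is nn.BatchNorm.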
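In the posted Conv, the two methods differ in more than `nn.BatchNorm2d`: `se_conv` is applied only in `forward_fuse`. Removing it from `forward_fuse`, as described earlier in the thread, makes the two paths match but leaves SE unused. A minimal sketch of the alternative, under the assumption that SE is meant to be active: apply it in both paths, so it is trained and still used after fusing. Here `autopad` is a simplified stand-in for YOLOv5's helper (integer kernels only), and `SELayer` repeats the class from the post.

```python
import torch
import torch.nn as nn


def autopad(k, p=None):
    # Simplified stand-in for YOLOv5's autopad: "same" padding for odd int kernels
    return k // 2 if p is None else p


class SELayer(nn.Module):
    # Squeeze-and-Excitation block from the post
    def __init__(self, channel, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class Conv(nn.Module):
    # Conv with SE applied in BOTH the training path and the fused path
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.se_conv = SELayer(c2)

    def forward(self, x):  # train.py path: conv -> bn -> act -> SE
        return self.se_conv(self.act(self.bn(self.conv(x))))

    def forward_fuse(self, x):  # fused path: bn already folded into conv, SE still applied
        return self.se_conv(self.act(self.conv(x)))


torch.manual_seed(0)
m = Conv(3, 32, k=3)
out = m(torch.randn(2, 3, 8, 8))  # goes through forward(), as during training
out.sum().backward()
print(tuple(out.shape))  # (2, 32, 8, 8)
print(all(p.grad is not None for p in m.se_conv.parameters()))  # True: SE is now trained
```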