yatengLG / Changeable

A PyTorch-based data augmentation toolkit for object detection.

Author, could you tell me about the dataset? #1

Closed hhamm closed 2 years ago

yatengLG commented 3 years ago

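The data augmentation part places no requirements on the dataset format, as long as each sample is returned as image, boxes, labels, image_name. You can write your own dataset following this example: https://github.com/yatengLG/Changeable/blob/a2db3be3df144dd950278d3d00622cf8ac45b407/changeable/dataset.py#L44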
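
As a rough illustration of that contract, here is a minimal sketch of a dataset whose `__getitem__` returns `image, boxes, labels, image_name`. The class name `ToyDetectionDataset`, the in-memory sample list, and the chosen dtypes and box layout are assumptions made for illustration, not code from Changeable. A real dataset would read images from disk and would usually subclass `torch.utils.data.Dataset`.

```python
import numpy as np


class ToyDetectionDataset:
    """Hypothetical dataset following the (image, boxes, labels, image_name) contract."""

    def __init__(self, samples):
        # samples: list of (image_name, image array, list of boxes, list of class ids)
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image_name, image, boxes, labels = self.samples[index]
        image = np.asarray(image, dtype=np.uint8)                   # H x W x 3 (assumed layout)
        boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)  # xmin, ymin, xmax, ymax (assumed)
        labels = np.asarray(labels, dtype=np.int64)                 # one class id per box
        return image, boxes, labels, image_name


dataset = ToyDetectionDataset([
    ("img_0001", np.zeros((4, 6, 3), dtype=np.uint8), [[1, 1, 3, 2]], [1]),
])
image, boxes, labels, image_name = dataset[0]
```

Returning the same types and dtypes for every index keeps later batching predictable.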

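If you annotated your data yourself with labelimg, you can use this function to generate a VOC-format dataset (a function for converting YOLO-format datasets to VOC format is also provided): https://github.com/yatengLG/Changeable/blob/a2db3be3df144dd950278d3d00622cf8ac45b407/changeable/utils/dataset.py#L85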

You can also download the VOC dataset directly: http://host.robots.ox.ac.uk/pascal/VOC/
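
For readers unfamiliar with the VOC layout, the sketch below parses one VOC-style XML annotation into boxes and labels using only the standard library. The annotation text is a small hand-written sample, the function name `parse_voc_annotation` is hypothetical, and starting labels at 1 (leaving 0 for background) is an assumption; Changeable's own loader may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the VOC annotation layout (not taken from a real dataset file).
VOC_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text, classes_name):
    """Return (boxes, labels) for one VOC XML string; labels start at 1 (assumption)."""
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; go through float to tolerate values like "48.0".
        box = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
        labels.append(classes_name.index(name) + 1)
    return boxes, labels


boxes, labels = parse_voc_annotation(VOC_XML, classes_name=("dog", "person"))
```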

liu-ai-z commented 2 years ago

Hi, while using the toolkit I found that the following error is raised when num_workers = 4:

File "E:/yolox-trans/train.py", line 234, in <module>
fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch,
File "E:\yolox-trans\utils\utils_fit.py", line 14, in fit_one_epoch
for iteration, batch in enumerate(gen):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
data = self._next_data()
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
data.reraise()
File "D:\miniconda\envs\py38\lib\site-packages\torch\_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
return self.collate_fn(data)
File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in __call__
images = default_collate(images)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\collate.py", line 63, in default_collate
return default_collate([torch.as_tensor(b) for b in batch])
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\_utils\collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: result type Float can't be cast to the desired output type Byte

yatengLG commented 2 years ago

@liu-ai-z

I looked at your error. Most likely the image returned by your dataset's __getitem__ is not a numpy.ndarray.

This is what the dataset's __getitem__ returns: https://github.com/yatengLG/Changeable/blob/957c17a4aa0a02668ceb8c5046adb3626f322337/changeable/dataset.py#L44

where image is: https://github.com/yatengLG/Changeable/blob/957c17a4aa0a02668ceb8c5046adb3626f322337/changeable/dataset.py#L76

yatengLG commented 2 years ago

@liu-ai-z You can instantiate your dataset and call dataset.__getitem__(0) to pull out one sample and see exactly what it returns.
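
A minimal sketch of that inspection step follows. The helper name `describe_sample` is hypothetical, and the tuple below is only a stand-in for what `dataset.__getitem__(0)` might return; with a real dataset you would pass the actual sample instead.

```python
import numpy as np


def describe_sample(sample):
    """Summarize each element of an (image, boxes, labels, image_name) sample."""
    report = []
    for field, value in zip(("image", "boxes", "labels", "image_name"), sample):
        if isinstance(value, np.ndarray):
            report.append(f"{field}: ndarray dtype={value.dtype} shape={value.shape}")
        else:
            report.append(f"{field}: {type(value).__name__}")
    return report


# Stand-in for `dataset.__getitem__(0)`; replace with a real dataset sample.
sample = (np.zeros((300, 300, 3), dtype=np.uint8),
          np.array([[10, 20, 50, 80]], dtype=np.float32),
          np.array([1]),
          "img_0001")
for line in describe_sample(sample):
    print(line)
```

If the image line reports anything other than an ndarray, or the dtype changes from one index to the next, that would be worth investigating first.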

hhamm commented 2 years ago

I got it solved later! Thank you.

liu-ai-z commented 2 years ago

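@yatengLG Hello, my image output is in ndarray format. When I run the small demo you provided, the same problem also appears after a few images have been processed, even with num_workers=1. (screenshot attached)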

liu-ai-z commented 2 years ago

My dataset is my own rather than the VOC dataset, but it is annotated in the same way.

yatengLG commented 2 years ago

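@liu-ai-z Did the failure happen partway through a run?

If so, the data itself has a problem. Set num_workers in the dataloader to 1, run again, and see which sample is at fault.

From the output you posted at the beginning: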

Hi, while using the toolkit I found that the following error is raised when num_workers = 4:
File "E:/yolox-trans/train.py", line 234, in <module>
fit_one_epoch(model_train, model, yolo_loss, loss_history, optimizer, epoch,
File "E:\yolox-trans\utils\utils_fit.py", line 14, in fit_one_epoch
for iteration, batch in enumerate(gen):
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
data = self._next_data()
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "D:\miniconda\envs\py38\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
data.reraise()
File "D:\miniconda\envs\py38\lib\site-packages\torch\_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.

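The message says the problem occurred in worker process 3, so I infer that the run failed partway through because one of the samples is bad.

Your output also contains this line:

File "D:\miniconda\envs\py38\lib\site-packages\changeable\dataloader.py", line 41, in __call__
images = default_collate(images)

So one image most likely has a problem, and that is what triggers the error.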

liu-ai-z commented 2 years ago

from changeable.utils.display import draw_boxes, plot_image
from changeable.dataset import VOCDataset
from changeable.dataloader import dataloader
from changeable.transforms import *
from changeable.anchor import AnchorsAssignerWH, AnchorsGenerator

if __name__ == '__main__':

    with open('pcb_classes.txt', 'r') as f:      # class-name file, one class name per line
        lines = f.readlines()
        classes_name = tuple([line.rstrip('\n') for line in lines])

    dataset = VOCDataset(root='PCB',        # root directory of the VOC dataset
                         classes_name=classes_name,
                         is_train=True,
                         transforms=Compose([       # every augmentation is added here, for demonstration only
                             Resize((300, 300)),
                             AdaptiveResize((300, 300)),
                             Scaled(1.1),
                             CropIou(0.5),
                             CropSize((300, 300)),
                             DivideStds((1,1,1)),
                             SubtractMeans((0,0,0)),
                             GaussNoise(),
                             SalePepperNoise(),
                             GaussBlur(),
                             MotionBlue(),
                             Cutout(),
                             RandomFlipLR(),
                             RandomFlipUD(),
                             ShuffleChannels(),
                             ChangeContrast(),
                             ChangeHue(),
                             ChangeBrightness(),
                             ChangeSaturation(),
                             ConvertBoxesToPercentage(),
                             ConvertBoxesToValue(),
                             ConvertBoxesForm('xyxy', 'cxcywh'),
                             ConvertBoxesForm('cxcywh', 'xyxy'),
                         ])
                         )

    anchors = AnchorsGenerator(image_size=(600, 600),
                               feature_maps_size=((76, 76), (38, 38), (19, 19)),
                               anchors_size=(((10, 13), (16, 30), (33, 23)),
                                             ((30, 61), (62, 45), (59, 119)),
                                             ((116, 90), (156, 198), (373, 326))),
                               form='xyxy',
                               clip=True
                               )

    anchors_assigner = AnchorsAssignerWH(anchors, 3)

    loader = dataloader(dataset, batch_size=4, resize=(600, 600),  use_mosaic=True, anchors_assigner=anchors_assigner, shuffle=True, num_workers=4)

    for i, (img, box, lab, ids) in enumerate(loader):
        print(i)
        print(img.size())
        print(box.size())
        print(lab.size())
        img, box, lab, id = img[0], box[0], lab[0], ids[0]
        img = img.permute((1, 2, 0)).numpy()
        box, lab = box.numpy(), lab.numpy()
        box, lab = box[lab>0], lab[lab>0]
        img = draw_boxes(img, box, lab, label_name=classes_name)
        plot_image(img)

liu-ai-z commented 2 years ago

The problem always occurs at this line.

yatengLG commented 2 years ago

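To quickly find which image is at fault, you can write:

## Set shuffle=False so the data order is not shuffled, and num_workers=1 so a single worker process handles the data.
loader = dataloader(dataset, batch_size=4, resize=(600, 600),  use_mosaic=True, anchors_assigner=anchors_assigner, shuffle=False, num_workers=1)

for i, (img, box, lab, ids) in enumerate(loader):
    print(ids)

Samples without problems will have their image names printed, until the error is raised. Then look in your dataset's training .txt file for the entry right after the last printed one; that image is most likely the culprit.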
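
A complementary check, sketched below, walks the dataset index by index without the dataloader and flags any sample that raises or whose image dtype differs from the first one. This rests on an assumption: the "Float can't be cast to ... Byte" message suggests that images of different dtypes met in one batch, but the thread does not confirm that. The helper name `find_bad_samples` and the toy data are hypothetical, and because this bypasses the dataloader it does not exercise mosaic or collation.

```python
import numpy as np


def find_bad_samples(dataset):
    """Return (index, reason) pairs for samples that raise or break dtype consistency."""
    problems = []
    expected_dtype = None
    for index in range(len(dataset)):
        try:
            image = dataset[index][0]
        except Exception as exc:  # a corrupt file or annotation would surface here
            problems.append((index, f"raised {type(exc).__name__}: {exc}"))
            continue
        if not isinstance(image, np.ndarray):
            problems.append((index, f"image is {type(image).__name__}, not ndarray"))
            continue
        if expected_dtype is None:
            expected_dtype = image.dtype  # the first valid sample sets the reference dtype
        elif image.dtype != expected_dtype:
            problems.append((index, f"dtype {image.dtype} != {expected_dtype}"))
    return problems


# Toy stand-in for a real dataset: the third sample has a float image.
toy_dataset = [
    (np.zeros((2, 2, 3), dtype=np.uint8), None, None, "a"),
    (np.zeros((2, 2, 3), dtype=np.uint8), None, None, "b"),
    (np.zeros((2, 2, 3), dtype=np.float32), None, None, "c"),
]
print(find_bad_samples(toy_dataset))
```

With random augmentations in the pipeline, results can vary between runs, so it may help to repeat the scan a few times.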

liu-ai-z commented 2 years ago

OK, I'll give it a try.

liu-ai-z commented 2 years ago

I tried it and found that the problem occurs when use_mosaic is enabled; without use_mosaic there are no problems at all.
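
One hypothesis consistent with this observation, though not confirmed anywhere in this thread, is that the mosaic path produces images of a different dtype than the regular path, so a batch mixes uint8 and float images before stacking. The sketch below shows a defensive workaround under that assumption: cast every image to one dtype before the batch is stacked. The helper `unify_image_dtype` is hypothetical and is not part of Changeable.

```python
import numpy as np


def unify_image_dtype(images, dtype=np.float32):
    """Cast every image in a batch to one dtype so a later stack sees a single type."""
    return [np.asarray(image).astype(dtype, copy=False) for image in images]


# Toy batch mixing a uint8 image with a float image (hypothetical mosaic output).
batch = [np.full((2, 2, 3), 255, dtype=np.uint8),
         np.full((2, 2, 3), 0.5, dtype=np.float64)]
unified = unify_image_dtype(batch)
stacked = np.stack(unified)
```

In practice such a cast would sit just before the images are collated, for example inside a custom collate function.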

yatengLG commented 2 years ago

@liu-ai-z You can send your contact details to yatenglg. If you need it, I can take a look remotely online.

liu-ai-z commented 2 years ago

I have sent my contact details to your email.