qfgaohao / pytorch-ssd

MobileNetV1-, MobileNetV2- and VGG-based SSD/SSD-Lite implementation in PyTorch 1.0 / PyTorch 0.4. Out-of-the-box support for retraining on the Open Images dataset. ONNX and Caffe2 support. Experimental ideas such as CoordConv.
https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad
MIT License

input train image background #166

Closed. eltonfernando closed this issue 2 weeks ago.

eltonfernando commented 2 years ago

When training YOLO models, I can include images without objects by adding an empty annotation file. Is there any way to do this with SSD or MobileNetV1?

eltonfernando commented 2 weeks ago

In vision/datasets/voc_dataset.py, create a new method:

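def _gen_random_background(self):
    # Despite the name, nothing is random: one fixed dummy box with label 0 (BACKGROUND).
    # The box coordinates are arbitrary; the difficulty flag is 0.
    return (np.array([[10, 30, 40, 50]], dtype=np.float32), np.array([0], dtype=np.int64), np.array([0], dtype=np.uint8))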
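Why a dummy box should be harmless: assuming, as in the repo's VOC loader, that class index 0 is BACKGROUND, whichever priors the dummy box matches are labelled 0, so it never introduces a foreground target. The snippet below is a simplified, standalone NumPy sketch of SSD-style IoU prior matching (not the repo's own code, whose details may differ); the names iou and assign_priors_sketch and the three toy priors are hypothetical.

```python
import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise IoU of corner-form boxes (x1, y1, x2, y2); result shape (len(a), len(b))."""
    a = boxes_a[:, None, :]
    b = boxes_b[None, :, :]
    top_left = np.maximum(a[..., :2], b[..., :2])
    bottom_right = np.minimum(a[..., 2:], b[..., 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero width/height when boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def assign_priors_sketch(gt_boxes, gt_labels, priors, iou_threshold=0.5):
    """Label each prior with its best-matching ground-truth label,
    or 0 (background) if that IoU is below the threshold."""
    ious = iou(priors, gt_boxes)           # (num_priors, num_gt)
    best_gt = ious.argmax(axis=1)          # best gt box for each prior
    best_iou = ious.max(axis=1)
    # Every gt box keeps its single best prior, whatever the IoU.
    best_prior = ious.argmax(axis=0)
    best_gt[best_prior] = np.arange(len(gt_boxes))
    best_iou[best_prior] = 2.0
    labels = gt_labels[best_gt]
    labels[best_iou < iou_threshold] = 0
    return labels

priors = np.array([[0, 0, 20, 20], [10, 30, 40, 50], [100, 100, 150, 150]], dtype=np.float32)
dummy_box = np.array([[10, 30, 40, 50]], dtype=np.float32)
print(assign_priors_sketch(dummy_box, np.array([0]), priors))  # [0 0 0]
print(assign_priors_sketch(dummy_box, np.array([3]), priors))  # [0 3 0]
```

The second call uses a real class index only for contrast: the prior that overlaps the box picks up label 3, whereas with the dummy label 0 every prior stays background.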

In the _get_annotation(self, image_id) method, after the annotation has been parsed and just before the final return, add:

if len(boxes) == 0:
    return self._gen_random_background()

The result should look like this:

    def _get_annotation(self, image_id):
        annotation_file = self.root / f"Annotations/{image_id}.xml"
        objects = ET.parse(annotation_file).findall("object")
        if len(objects) == 0:
            # No <object> elements: treat the image as background-only
            return self._gen_random_background()
        boxes = []
        labels = []
        is_difficult = []
        for object in objects:
            class_name = object.find("name").text.lower().strip()
            # we're only concerned with classes in our list
            if class_name in self.class_dict:
                bbox = object.find("bndbox")

                # VOC coordinates follow Matlab, where indexes start from 1, so subtract 1 to make them 0-based
                x1 = float(bbox.find("xmin").text) - 1
                y1 = float(bbox.find("ymin").text) - 1
                x2 = float(bbox.find("xmax").text) - 1
                y2 = float(bbox.find("ymax").text) - 1
                boxes.append([x1, y1, x2, y2])

                labels.append(self.class_dict[class_name])
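                # <difficult> is optional in some annotation tools; default to 0 when it is missing or empty
                difficult = object.find("difficult")
                is_difficult.append(int(difficult.text) if difficult is not None and difficult.text else 0)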

        if len(boxes) == 0:
            return self._gen_random_background()
        return (np.array(boxes, dtype=np.float32), np.array(labels, dtype=np.int64), np.array(is_difficult, dtype=np.uint8))

    def _gen_random_background(self):
        # One fixed dummy box with label 0 (BACKGROUND); the coordinates are arbitrary.
        return (np.array([[10, 30, 40, 50]], dtype=np.float32), np.array([0], dtype=np.int64), np.array([0], dtype=np.uint8))
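To check the fallback outside the dataset class, here is a self-contained sketch of the same parsing logic as a plain function that reads XML from a string instead of Annotations/<image_id>.xml. The names parse_voc_annotation, gen_background and CLASS_DICT are hypothetical, and the two-entry class list is a placeholder for your own labels.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical class list; index 0 is reserved for BACKGROUND.
CLASS_DICT = {"BACKGROUND": 0, "person": 1}

def gen_background():
    """Fixed dummy target: one arbitrary box labelled 0 (background), not difficult."""
    return (np.array([[10, 30, 40, 50]], dtype=np.float32),
            np.array([0], dtype=np.int64),
            np.array([0], dtype=np.uint8))

def parse_voc_annotation(xml_text, class_dict=CLASS_DICT):
    """Parse a VOC annotation string; fall back to the dummy background target
    when no object belongs to a known class (including files with no <object> at all)."""
    boxes, labels, is_difficult = [], [], []
    for obj in ET.fromstring(xml_text).findall("object"):
        name = obj.find("name").text.lower().strip()
        if name not in class_dict:
            continue
        bbox = obj.find("bndbox")
        # VOC coordinates are 1-based; subtract 1 to make them 0-based.
        boxes.append([float(bbox.find(tag).text) - 1 for tag in ("xmin", "ymin", "xmax", "ymax")])
        labels.append(class_dict[name])
        difficult = obj.find("difficult")
        is_difficult.append(int(difficult.text) if difficult is not None and difficult.text else 0)
    if not boxes:
        return gen_background()
    return (np.array(boxes, dtype=np.float32),
            np.array(labels, dtype=np.int64),
            np.array(is_difficult, dtype=np.uint8))

empty_xml = "<annotation><filename>bg_0001.jpg</filename></annotation>"
one_person_xml = (
    "<annotation><object><name>Person</name>"
    "<bndbox><xmin>11</xmin><ymin>21</ymin><xmax>51</xmax><ymax>81</ymax></bndbox>"
    "</object></annotation>"
)
print(parse_voc_annotation(empty_xml))       # dummy background target
print(parse_voc_annotation(one_person_xml))  # one real "person" box
```

Because the guard checks boxes rather than the raw object count, a file whose objects all belong to classes outside your list is also treated as background.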

These changes make it possible to train with background-only images. Let me know if your model improves with this.
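With this change, a background image still needs an annotation file; it simply contains no <object> elements, which is the VOC counterpart of YOLO's empty .txt file. The helper below is a hypothetical sketch that writes such a file and appends the image id to an image-set list. It assumes the standard VOC layout (Annotations/, JPEGImages/, ImageSets/Main/trainval.txt) and that the image is already saved as JPEGImages/<image_id>.jpg; adjust the paths if your dataset differs.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET

def add_background_image(voc_root, image_id, width, height, image_set="trainval"):
    """Hypothetical helper: write an object-free VOC annotation for a background-only
    image and append its id to ImageSets/Main/<image_set>.txt."""
    root = pathlib.Path(voc_root)
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = f"{image_id}.jpg"
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    # Intentionally no <object> elements: the loader falls back to the dummy background target.
    ann_path = root / "Annotations" / f"{image_id}.xml"
    ann_path.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(ann).write(str(ann_path))
    list_path = root / "ImageSets" / "Main" / f"{image_set}.txt"
    list_path.parent.mkdir(parents=True, exist_ok=True)
    with open(list_path, "a") as f:
        f.write(f"{image_id}\n")
    return ann_path

# Usage in a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    ann_path = add_background_image(tmp, "bg_0001", 640, 480)
    parsed = ET.parse(ann_path).getroot()
    image_ids = (pathlib.Path(tmp) / "ImageSets" / "Main" / "trainval.txt").read_text().split()
    print(ET.tostring(parsed, encoding="unicode"))
```

The file is opened in append mode so that repeated calls add ids instead of overwriting an existing list.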