Implementation of Object Detection using dataset in XML format.

kaustubhharapanahalli commented 4 years ago

Hey @waleedka,

I am trying to use this code to build an object detection model. My annotations are in XML, and I am parsing them to get the required details. I am having trouble understanding how to load the bounding box parameters for object detection. Since each box is a rectangle, I have four values: x_min, y_min, x_max and y_max.

My XML file is in this format:

<annotation>
    <folder>Downloads</folder>
    <filename>00000(3)_x264_10000.jpg</filename>
    <path>/home/deep/Downloads/00000(3)_x264_10000.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>139</xmin>
            <ymin>662</ymin>
            <xmax>455</xmax>
            <ymax>832</ymax>
        </bndbox>
    </object> ...
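For reference, here is a minimal sketch of reading this layout with Python's standard `xml.etree.ElementTree`. The `SAMPLE_XML` string is a shortened, hypothetical annotation in the same layout (not the real file), and `parse_annotation` is an illustrative helper name, not part of Mask_RCNN.

```python
import xml.etree.ElementTree as ET

# Hypothetical, complete sample in the same layout as the annotation above.
SAMPLE_XML = """<annotation>
    <filename>example.jpg</filename>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <object>
        <name>car</name>
        <bndbox>
            <xmin>139</xmin>
            <ymin>662</ymin>
            <xmax>455</xmax>
            <ymax>832</ymax>
        </bndbox>
    </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return (width, height, objects).

    objects is a list of (name, xmin, ymin, xmax, ymax) tuples,
    one per <object> element.
    """
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"),) + coords)
    return width, height, objects


print(parse_annotation(SAMPLE_XML))
```

For files on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.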

My code for loading the dataset is this:

def load_dataset(self, dataset_dir, subset):
        """
        Load the dataset from the XML annotation files.
        """
        for index in range(len(CLASSES)):
            self.add_class("dataset", index+1, CLASSES[index])

        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        annotations_dir = os.path.join(dataset_dir, "annotations")
        image_dir = os.path.join(dataset_dir, "images")

        annotation_files = os.listdir(annotations_dir)

        for xml_file in annotation_files:
            parser = etree.XMLParser(encoding="utf-8")
            tree = ElementTree.parse(annotations_dir + "/" + xml_file, parser=parser).getroot()
            filename = tree.find("filename").text
            path = tree.find("path").text
            width = int(tree.find("size").find("width").text)
            height = int(tree.find("size").find("height").text)
            image_path = os.path.join(image_dir, xml_file[:-4] + ".png")

            polygons = []
            class_ids = []

            for object_iter in tree.findall("object"):

                x_min, y_min, x_max, y_max = get_bbox(object_iter)
                polygons.append("all_points_x":[x_min, ])
                class_ids.append(CLASSES.index(object_iter.find("name").text)+1)

            image_name = tree.find("filename").text 
            print(image_name)
            self.add_image(
                "dataset",
                image_id=image_name,
                path=image_path,
                width=width, 
                height=height,
                polygons=polygons,
                class_ids=class_ids
                )

Can you please help me fix this? Thanks in advance.
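One possible direction, offered as a sketch and not a confirmed fix: in the matterport code, `add_image` accepts arbitrary keyword arguments, and `load_mask` is expected to return a boolean array of shape [height, width, instance_count] together with a 1-D int32 array of class IDs. So instead of building polygons, the loader could store each box as a plain tuple and turn it into a filled rectangle when the mask is requested. The helper name `boxes_to_masks` is hypothetical, and treating the whole box as the object's mask is an assumption that only suits box-level detection. The half-open pixel range [min, max) is also an assumption about how the annotation tool counts pixels.

```python
import numpy as np


def boxes_to_masks(height, width, boxes):
    """Build one filled-rectangle mask per bounding box.

    boxes: list of (x_min, y_min, x_max, y_max) in pixels.
    Returns a bool array of shape [height, width, len(boxes)].
    """
    masks = np.zeros((height, width, len(boxes)), dtype=bool)
    for i, (x_min, y_min, x_max, y_max) in enumerate(boxes):
        # Clip to the image bounds; rows index y, columns index x.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        masks[y0:y1, x0:x1, i] = True
    return masks


# The single "car" box from the annotation above, on a 1920x1080 image.
masks = boxes_to_masks(1080, 1920, [(139, 662, 455, 832)])
print(masks.shape, int(masks.sum()))
```

With this approach, the loop in `load_dataset` would append `(x_min, y_min, x_max, y_max)` to a `boxes` list and pass `boxes=boxes` to `add_image`. A `load_mask` override could then read `self.image_info[image_id]` and return `boxes_to_masks(info["height"], info["width"], info["boxes"])` along with `np.array(info["class_ids"], dtype=np.int32)`. Note also that the loader builds the image path with a `.png` extension while the annotation names a `.jpg` file, which may be worth checking.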

adions025 commented 4 years ago

Maybe you need a converter: https://github.com/adions025/XMLtoJson_Mask_RCNN

changhe14 commented 1 year ago

Have you solved this? I really need your help. Thank you.

matterport / Mask_RCNN

Implementation of Object Detection using dataset in XML format. #1872