# Generator evaluator selector modular net for panoptic image segmentation

This is the code for the paper Generator evaluator-selector net: a modular approach for panoptic segmentation. To download the same code with trained model weights (ready to run), see these links 1 2.
The system is composed of a generator (Pointer net) that generates segments, an evaluator (Evaluation net) that ranks and selects segments to create a category-independent segmentation map (Figure 1), and a segment classification net that classifies the selected segments. The nets and the weights were trained and tested on the COCO panoptic data set.
See the tutorial section for running/training instructions and the description section for more details on the system.
The system was run and trained using Anaconda Python 3.7 with pytorch 1.01 and opencv on a single Titan XP GPU.
This will generate full annotations for all the images in the input folder. Two subfolders will be generated in the output folder. The ‘FinalPredictionsVizual’ folder will contain the predicted annotations for visualization. The ‘COCO2Channels’ folder will contain the predicted annotations in a format that can be converted to the COCO panoptic standard formats (see COCOConvertor RunConvertEval.py).
Each of the subfolders PointerSegmentation/Evaluation/Classification/Refinement contains a “TRAIN.py” script.
Running this “TRAIN.py” script should train the corresponding net.
All training scripts can be run without change using the sample data included in the ‘SampleData’ folder.
To train a net with real data, you first need to generate training data; see the Generating data for training section.
Each of the subfolders PointerSegmentation/Evaluation/Classification/Refinement contains an “Evaluate.py” script.
Running this “Evaluate.py” script should generate evaluation statistics.
All evaluation scripts can be run without change using the included sample data.
To evaluate a net with real data, you first need to generate data.
See the Generating Data section for more instructions.
Three subfolders will be generated in the output dir (OutDir)
These scripts take all the trained pointer net models and run them on the pointer net training data to generate predicted segmentation masks. The output folder contains the subfolders “Pred” and “GT”, which contain the predicted segment masks and the matching ground truth (GT) segment masks. The segment file names contain the name of the image used to generate the masks, the mask category ID (COCO), and the IOU between the predicted and GT segments. These folders should be used as inputs for the Evaluation/Classification/Refinement nets (after the data passes the cleaning process).
The pairs of predicted and GT segments generated earlier might not correspond to each other: a predicted segment might match a different ground truth segment than the one assigned to it. In this case, the cleaner will add ‘WRONG’ to the file name; otherwise it will add ‘V’. The readers of the evaluation/classification/refinement nets will not use files that have ‘WRONG’ in their names.
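The filtering rule described above can be illustrated with a minimal sketch. The function name `usable_segment_files` and the example file names are hypothetical and only for illustration; the actual readers in this repository may be implemented differently.

```python
import os

def usable_segment_files(file_names):
    """Keep only files that the cleaner did not mark with 'WRONG'."""
    return [name for name in file_names
            if "WRONG" not in os.path.basename(name)]

# Hypothetical file names; the real naming scheme may differ.
example_files = ["Pred/img1_cat3_V.png", "Pred/img2_cat7_WRONG.png"]
kept = usable_segment_files(example_files)
print(kept)
```

Only the base name is checked, so a parent folder that happens to contain ‘WRONG’ would not exclude a file.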
For a detailed description, see Generator evaluator-selector net: a modular approach for panoptic segmentation.
A schematic of the full modular system is shown in Figure 1. The method comprises four independent networks combined into one modular structure. The first step is generating several different segments using the pointer net. The segments generated by this net are restricted to a given region of interest (ROI), which covers the unsegmented image region. The generated segments are then ranked by the evaluator net. This net assigns each segment a score that estimates how well it corresponds to a real segment in the image. The segments that receive the highest scores and are consistent with each other are selected, while low-ranking segments are filtered out. The selected segments are then polished using the refinement net. Each of the selected segments is then classified using the classifier net. Finally, the selected segments are stitched into the segmentation map (Figure 1). The segmentation map is passed to the next cycle, which repeats the process in the remaining unsegmented image regions. The process is repeated until either the full image has been segmented or the quality assigned by the evaluator to all of the predicted segments drops below some threshold.
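The cycle described above can be sketched roughly as follows. This is not the repository's implementation: `generate`, `evaluate`, `refine`, and `classify` are hypothetical callables standing in for the four nets, the parameters `num_proposals`, `min_score`, and `max_cycles` are assumptions, and "consistent with each other" is approximated here as "non-overlapping".

```python
import numpy as np

def segment_image(image, generate, evaluate, refine, classify,
                  num_proposals=8, min_score=0.5, max_cycles=100):
    """Generate, rank, refine, classify, and stitch segments cycle by cycle."""
    height, width = image.shape[:2]
    roi = np.ones((height, width), dtype=bool)  # unsegmented region
    segments = []                               # (mask, category) pairs
    for _ in range(max_cycles):
        if not roi.any():
            break  # the full image has been segmented
        # Generate proposals restricted to the unsegmented region.
        proposals = [np.logical_and(generate(image, roi), roi)
                     for _ in range(num_proposals)]
        # Rank proposals by evaluator score, best first.
        ranked = sorted(((float(evaluate(image, p)), i)
                         for i, p in enumerate(proposals)), reverse=True)
        accepted = False
        for score, i in ranked:
            if score < min_score:
                break  # all remaining proposals score even lower
            # Refine, then drop pixels already taken in this cycle
            # (assumed consistency rule: no overlap).
            mask = np.logical_and(refine(image, proposals[i]), roi)
            if not mask.any():
                continue
            segments.append((mask, classify(image, mask)))
            roi = np.logical_and(roi, np.logical_not(mask))
            accepted = True
        if not accepted:
            break  # every proposal fell below the quality threshold
    return segments, roi

def first_free_row(image, roi):
    """Toy generator: propose the first image row that is still unsegmented."""
    mask = np.zeros_like(roi)
    rows = np.nonzero(roi.any(axis=1))[0]
    if rows.size:
        mask[rows[0]] = roi[rows[0]]
    return mask

demo_image = np.zeros((4, 4, 3))
demo_segments, demo_remaining = segment_image(
    demo_image, first_free_row,
    evaluate=lambda img, m: 1.0,   # toy evaluator: always confident
    refine=lambda img, m: m,       # toy refiner: identity
    classify=lambda img, m: 1)     # toy classifier: a single category
```

With the toy stand-ins, each cycle claims one image row, so the loop ends once the ROI is empty.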
The pointer net acts as the segment generator, which creates proposals for different segments in the image (Figure 2). The pointer net receives an image and a point within this image, and predicts the mask of the segment that contains the input point (Figure 2). In this work, the pointer point location is chosen randomly within the unsegmented region of the image. The net will predict different segments for different input points, even if the points are located within the same segment (Figure 2). While this feature was not planned, it allows the pointer net to act as a random segment generator that can produce a large variety of segments by selecting random input points. Another input of the pointer net is a region of interest (ROI) mask, which restricts the region of the predicted segments. The generated output segment will be confined to the ROI mask. This property prevents newly generated segments from overlapping previously generated segments. In this work, the ROI mask is simply the unsegmented region of the image.
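As a rough illustration of the two inputs described above, the sketch below picks a random pointer point inside the ROI and shows what confinement to the ROI means for a mask. The helper names are hypothetical. In the actual system the ROI is an input to the net, so the explicit masking here only illustrates the property and is not the net's mechanism.

```python
import numpy as np

def sample_pointer_point(roi_mask, rng):
    """Pick a random (row, col) among the nonzero pixels of the ROI."""
    rows, cols = np.nonzero(roi_mask)
    if rows.size == 0:
        return None  # nothing left to segment
    i = rng.integers(rows.size)
    return int(rows[i]), int(cols[i])

def confine_to_roi(pred_mask, roi_mask):
    """Remove any predicted pixels that fall outside the ROI."""
    return np.logical_and(pred_mask, roi_mask)

rng = np.random.default_rng(0)
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True                      # 2x2 unsegmented region
point = sample_pointer_point(roi, rng)    # always lands inside the ROI
confined = confine_to_roi(np.ones((4, 4), dtype=bool), roi)
```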
The evaluator net is used to check and rank the generated segments. The ranking is done according to how well the input segment fits the best-matching real segment in the image. The evaluator net is a simple convolutional net that receives two inputs: an image and a generated segment mask (Figure 2d). The evaluator net predicts the intersection over union (IOU) between the input segment and the closest real segment in the image.
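For reference, the IOU that the evaluator is trained to predict can be computed directly when a ground truth mask is available. The sketch below assumes binary masks of the same shape, and the function name `mask_iou` is hypothetical. It shows the target quantity, not the evaluator net itself.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # both masks empty; IOU is defined as 0 here
    return float(np.logical_and(pred, gt).sum() / union)

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
iou = mask_iou(pred, gt)  # intersection 1 pixel, union 2 pixels -> 0.5
```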
Refinement net is used to polish the boundaries of the generated segment. The net receives the image and an imperfect segment mask. The net output is a refined version of the input segment (Figure 2e). This approach has been examined in several previous works.
Determining the segment category is done using a region-specific classification net. The net receives the image and a segment mask. The net predicts the category of the input segment (Figure 2f). This approach has been explored in previous works.