Description
There is a potential bug in the regular expression the given example uses for file matching. It can pair an image with another image's annotation files, which may lead to incorrect annotations in the final COCO-format result.
Details
https://github.com/waspinator/pycococreator/blob/207b4fa8bbaae22ebcdeb3bbf00b724498e026a7/examples/shapes/shapes_to_coco.py#L63
The line above builds the regular expression used to match an image file with its annotation files. However, it can fail when one image's basename is a prefix of another's.
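The failure can be reproduced in a few lines. This is a minimal sketch that assumes the pattern is the image basename followed by `.*` and is applied with `re.match` to annotation basenames (extension stripped); the annotation names are hypothetical but follow the `basename_classname_instanceID` convention.

```python
import re

# Assumption: pattern = image basename + ".*", checked with re.match,
# which only anchors at the start of the string.
image_basename = "1"
pattern = image_basename + ".*"  # -> "1.*"

# Hypothetical annotation basenames for images "1" and "1000"
annotations = ["1_square_0", "1_circle_1", "1000_square_0", "1000_triangle_2"]

# Every name starting with "1" matches, including those of image "1000"
matched = [name for name in annotations if re.match(pattern, name)]
print(matched)
# ['1_square_0', '1_circle_1', '1000_square_0', '1000_triangle_2']
```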
For example, take two images, 1.jpg and 1000.jpg. When looking up the annotation files for 1.jpg, the generated regular expression is `1.*`. Unfortunately, this expression also matches the annotation files belonging to 1000.jpg, because their names also start with `1`.
Implemented fix
Changing the regular expression from `basename.*` to `basename_.*` solved the problem, as long as the user sticks to the naming convention `basename_classname_instanceID` for annotation files.
Note
This bug does not affect the original example, but users who adapt the example to a custom dataset may run into it. It took me considerable time to figure out, at least. :)
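For anyone adapting the example to a custom dataset, here is a minimal sketch of the matching step with the `_` separator applied. `filter_annotations` is a hypothetical standalone helper, not a function from pycococreator, and it assumes annotation files follow the `basename_classname_instanceID` convention. The `re.escape` call is an extra precaution beyond the fix described above.

```python
import os
import re


def filter_annotations(image_filename, annotation_filenames):
    """Return the annotation files that belong to image_filename.

    Hypothetical helper (not the pycococreator API). Assumes annotation
    files are named basename_classname_instanceID.<ext>.
    """
    basename = os.path.splitext(os.path.basename(image_filename))[0]
    # Require "_" right after the basename so "1" no longer matches "1000_...".
    # re.escape guards against regex metacharacters in the basename
    # (an added precaution, not part of the original fix).
    pattern = re.escape(basename) + "_.*"
    return [
        f for f in annotation_filenames
        if re.match(pattern, os.path.splitext(os.path.basename(f))[0])
    ]


annotations = ["1_square_0.png", "1_circle_1.png", "1000_square_0.png"]
print(filter_annotations("1.jpg", annotations))
# ['1_square_0.png', '1_circle_1.png']
print(filter_annotations("1000.jpg", annotations))
# ['1000_square_0.png']
```

Requiring the separator turns the prefix check into a check on the whole basename, which is why image 1 no longer picks up the annotations of image 1000.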