waspinator / pycococreator

Helper functions to create COCO datasets
Apache License 2.0

Fix bug in example (misleading under certain circumstances) #21

Open jianxiongcai opened 5 years ago

jianxiongcai commented 5 years ago

Description

There is a potential bug in the given example, in the regular expression used for file matching. It can pair an image with annotation files that belong to a different image, which may cause serious confusion in the final COCO-format result.

Details

https://github.com/waspinator/pycococreator/blob/207b4fa8bbaae22ebcdeb3bbf00b724498e026a7/examples/shapes/shapes_to_coco.py#L63 The line above builds the regular expression used to match an image file with its annotation files. However, it can match the wrong files under certain circumstances.

For example, suppose there are two images, 1.jpg and 1000.jpg. The regular expression produced when finding the annotation files for 1.jpg will be 1.*. Unfortunately, this expression also matches the annotation files that belong to 1000.jpg, because their names start with 1 as well.
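The over-matching can be reproduced in isolation. The snippet below is a minimal sketch, not the code from the example script: the annotation names and the helper `match_loose` are hypothetical, and it assumes the pattern is the image basename followed by `.*`, as described above.

```python
import re

# Hypothetical annotation file names with the extension already stripped,
# following the basename_classname_instanceID convention.
annotation_names = ["1_square_0", "1_circle_1", "1000_square_0"]


def match_loose(image_basename, names):
    """Select names with the pattern basename + '.*' (assumed from the issue text)."""
    pattern = image_basename + ".*"
    # re.match anchors only at the start, so any name beginning with the
    # basename is accepted, whatever follows it.
    return [n for n in names if re.match(pattern, n)]


loose_matches = match_loose("1", annotation_names)
print(loose_matches)
# → ['1_square_0', '1_circle_1', '1000_square_0']
```

The last entry belongs to image 1000, yet it is returned when looking up the annotations for image 1.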

Implemented fix

Changing the regular expression from basename.* to basename_.* solved the problem, as long as the user sticks to the naming convention basename_classname_instanceID for annotation files.
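A minimal sketch of the stricter pattern, under the same naming convention. The helper `match_strict` and the file names are hypothetical. The call to `re.escape` is an extra precaution that is not part of the fix described here; it only matters when a basename contains regex metacharacters.

```python
import re

# Hypothetical annotation file names with the extension already stripped.
annotation_names = ["1_square_0", "1_circle_1", "1000_square_0"]


def match_strict(image_basename, names):
    """Select names with the pattern basename + '_.*'."""
    # Requiring the underscore right after the basename stops "1" from
    # matching names that start with "1000".
    # re.escape is an added assumption, not part of the described fix.
    pattern = re.escape(image_basename) + "_.*"
    return [n for n in names if re.match(pattern, n)]


strict_for_1 = match_strict("1", annotation_names)
strict_for_1000 = match_strict("1000", annotation_names)
print(strict_for_1)     # → ['1_square_0', '1_circle_1']
print(strict_for_1000)  # → ['1000_square_0']
```

Each image now picks up only its own annotation files, provided every annotation name puts an underscore directly after the image basename.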

Note

This bug does not affect the original example, but users who adapt the example to a custom dataset may trigger it. At least, it took me considerable time to figure that out. :)