jaspersiebring / GeoCOCO

Tool for converting GIS annotations to Microsoft's Common Objects In Context (COCO) datasets
https://geococo.readthedocs.io/
GNU General Public License v3.0

Added (super)category mapping and input data validation #6

Closed jaspersiebring closed 1 year ago

jaspersiebring commented 1 year ago

Users can now provide keys for columns that contain (super)category names or ids. If only names are given, ids are autogenerated as an incremental sequence starting from the last known id. The values associated with these keys and the geometries themselves are validated through Pandera.
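As a rough illustration of the id autogeneration described above (a stdlib-only sketch, not geococo's actual code): the function name `assign_category_ids` is hypothetical, and starting at 1 when no categories exist yet is an assumption.

```python
from typing import Dict, Iterable


def assign_category_ids(names: Iterable[str], known: Dict[str, int]) -> Dict[str, int]:
    """Return a name -> id mapping that extends `known` with ids for new names.

    Hypothetical helper: new ids continue from the highest known id.
    Assumption: an empty dataset starts counting at 1.
    """
    mapping = dict(known)  # copy so the caller's mapping is left untouched
    next_id = max(mapping.values(), default=0) + 1
    for name in names:
        if name not in mapping:  # names that are already known keep their id
            mapping[name] = next_id
            next_id += 1
    return mapping


existing = {"building": 1, "road": 2}
updated = assign_category_ids(["road", "tree", "water", "tree"], existing)
print(updated)  # {'building': 1, 'road': 2, 'tree': 3, 'water': 4}
```

Known names keep their ids and duplicates in the input are only assigned once, so the sequence stays gapless.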

This did require a downgrade to Pydantic 1.10.12 and some refactoring of Pydantic V2-specific code (e.g. manual forwarding of model names, dropping of InstanceOf), as Pandera does not support V2 (yet). Still worth it though, considering how much it simplifies data validation (and we'll upgrade as soon as they support V2).

If the input data passes validation, the values will be used to add new Category instances (if any) to the dataset. These are then used to map any existing category information to the new annotations (if matched). This gives users the option to start a COCO dataset from pretty much any shapefile with annotations, as long as it includes some reasonable identifier (i.e. not necessarily COCO-specific keys). New annotations can be added at any time and geococo will update the metadata accordingly.
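To make the mapping step a bit more concrete, here is a minimal sketch under stated assumptions. The `Category` and `Dataset` classes, the `add_rows` method and the `species`/`group` column keys are all hypothetical stand-ins, not geococo's real models or API, and validation is assumed to have already passed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Category:
    id: int
    name: str
    supercategory: str


@dataclass
class Dataset:
    categories: List[Category] = field(default_factory=list)
    annotations: List[Dict[str, int]] = field(default_factory=list)

    def add_rows(self, rows: List[Dict[str, str]], name_key: str, super_key: str) -> None:
        """Add unseen categories, then map every row to a category id."""
        by_name = {c.name: c for c in self.categories}
        next_id = max((c.id for c in self.categories), default=0) + 1
        for row in rows:
            name = row[name_key]
            if name not in by_name:  # new Category instance, only if needed
                by_name[name] = Category(next_id, name, row[super_key])
                self.categories.append(by_name[name])
                next_id += 1
            self.annotations.append(
                {"id": len(self.annotations) + 1, "category_id": by_name[name].id}
            )


# The column keys are whatever the shapefile happens to use, not COCO-specific.
dataset = Dataset(categories=[Category(1, "oak", "tree")])
rows = [{"species": "oak", "group": "tree"}, {"species": "pine", "group": "tree"}]
dataset.add_rows(rows, name_key="species", super_key="group")

# A later batch reuses the categories that are already in the dataset.
dataset.add_rows([{"species": "pine", "group": "tree"}], "species", "group")
```

After both calls the dataset holds two categories (`oak` with id 1, `pine` with id 2) and three annotations, the last of which points at the existing `pine` category.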

Users can also manually update the category names in their dataset to something a bit more expressive (particularly useful if the dataset was started from a shapefile that only contained category ids). Any subsequent annotations with the same id will then use this new name.
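The renaming works because annotations refer to categories by id, so only the category entry itself has to change. A small sketch of that idea, assuming a plain id-to-name dictionary (the `rename_category` helper is hypothetical, and using the id as a placeholder name is an assumption about id-only shapefiles):

```python
from typing import Dict


def rename_category(categories: Dict[int, str], category_id: int, new_name: str) -> None:
    """Give an existing category a more expressive name (hypothetical helper)."""
    if category_id not in categories:
        raise KeyError(f"unknown category id: {category_id}")
    categories[category_id] = new_name


# Dataset started from a shapefile with ids only; placeholder names are assumed.
categories = {1: "1", 2: "2"}
rename_category(categories, 1, "deciduous tree")

# Annotations added afterwards still carry only the id and pick up the new name.
new_annotation_category_ids = [1, 2, 1]
names = [categories[i] for i in new_annotation_category_ids]
print(names)  # ['deciduous tree', '2', 'deciduous tree']
```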