Two-shot pipeline

Description

Implement a two-shot pipeline in which the first shot detects bounding boxes and the second shot performs all classifications.
The first model will be YOLOv8n with a low confidence threshold.
The second model will be a custom CNN, possibly with a U-Net-like architecture.
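
A minimal sketch of the control flow, to make the proposal concrete. Everything here is a stand-in: `Detection`, `crop`, `two_shot`, `fake_detect`, `fake_classify`, and the default threshold of `0.1` are hypothetical names and values chosen for illustration, and the image is a toy nested list rather than a real tensor. In the actual pipeline the `detect` callable would wrap YOLOv8n and `classify` would wrap the custom CNN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


@dataclass
class Detection:
    box: Box
    confidence: float


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def two_shot(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    classify: Callable[[Image], Dict[str, str]],
    det_threshold: float = 0.1,  # assumed value; deliberately low to favor recall
) -> List[Tuple[Box, Dict[str, str]]]:
    """First shot proposes boxes, second shot labels each cropped box."""
    results = []
    for det in detect(image):
        if det.confidence < det_threshold:
            continue  # drop only the weakest proposals
        labels = classify(crop(image, det.box))
        results.append((det.box, labels))
    return results


# Hypothetical stand-ins for the two models.
def fake_detect(image: Image) -> List[Detection]:
    return [Detection((0, 0, 2, 2), 0.15), Detection((1, 1, 3, 3), 0.05)]


def fake_classify(patch: Image) -> Dict[str, str]:
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return {"shape": "square", "color": "light" if mean > 127 else "dark"}


image = [[200, 200, 0], [200, 200, 0], [0, 0, 0]]
results = two_shot(image, fake_detect, fake_classify)
```

Passing the two models in as callables keeps the stages independent, so either one could be retrained or swapped without touching the other.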

Currently, shape classification performance is high, but detection performance is low. Reducing the number of classes in the first model should improve detection. We can also lower the first model's confidence threshold to increase recall.
Because the classification tasks overlap (the color classifier, for instance, has to segment the image to some degree), we can combine all the classifiers into a single model, which should improve efficiency and leave room for a bigger model.
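
To illustrate the recall argument, here is a small sketch of how recall and precision move as the first-stage confidence threshold drops. The scores in `toy` are made-up numbers, not measurements from this project, and `recall_precision` is a hypothetical helper.

```python
from typing import List, Tuple


def recall_precision(
    scored: List[Tuple[float, bool]],  # (confidence, is_true_positive) per proposal
    num_objects: int,                  # ground-truth objects in the data
    threshold: float,
) -> Tuple[float, float]:
    """Recall and precision of the proposals kept at a given threshold."""
    kept = [is_true for conf, is_true in scored if conf >= threshold]
    true_positives = sum(kept)
    recall = true_positives / num_objects if num_objects else 0.0
    precision = true_positives / len(kept) if kept else 1.0
    return recall, precision


# Toy proposals (invented for illustration): 4 correct, 2 spurious,
# and 1 of the 5 real objects is never proposed at all.
toy = [(0.9, True), (0.6, True), (0.4, False),
       (0.3, True), (0.15, True), (0.12, False)]

high = recall_precision(toy, num_objects=5, threshold=0.5)
low = recall_precision(toy, num_objects=5, threshold=0.1)
```

On this toy data, dropping the threshold from 0.5 to 0.1 raises recall from 0.4 to 0.8 while precision falls from 1.0 to about 0.67. The extra false positives would be passed on to the second shot; whether that model can reject them is a design question this issue leaves open.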