About features to add [Discussion]

Hello all,
I am actively using this repo in my project, and instead of creating something new I would like to add more features to this awesome repo. This issue is meant as a discussion of those features. I would love to hear your opinions, since you have a lot more experience with the codebase.

Feature 1: Allow users to select a model trained on either 256x256 or 512x512 images.

Rationale: The current version of this repository resizes images to 512x512 for segmentation. Since the models are trained at this resolution, users are bound to this specific size. However, inference time differs noticeably between 512x512 and 256x256 inputs. My preliminary observations indicate:

Models trained on a 256x256 resolution run in 0.04 seconds on a GPU and 0.3 seconds on a CPU.
Models trained on a 512x512 resolution take 0.15 seconds on a GPU and 3 seconds on a CPU.
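
For anyone who wants to reproduce timings like these on their own hardware, here is a minimal sketch of a latency helper. It is not code from this repo: `measure_latency` and its parameters are names I made up for illustration, and the workload in the example is a stand-in for a real model call.

```python
import statistics
import time
from typing import Callable, Optional


def measure_latency(
    run_once: Callable[[], object],
    warmup: int = 3,
    repeats: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time `run_once` and return mean/median latency in seconds.

    `sync` is an optional hook called before the clock is read. GPU kernels
    run asynchronously, so on a GPU something like `torch.cuda.synchronize`
    should be passed here; on a CPU it can stay None.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        run_once()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for pending work before stopping the clock
        samples.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "runs": repeats,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call such as `model(batch)`.
    stats = measure_latency(lambda: sum(i * i for i in range(10_000)))
    print(f"mean {stats['mean_s']:.6f}s, median {stats['median_s']:.6f}s")
```

Reporting the median alongside the mean helps when a few runs are slowed down by unrelated load on the machine.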

Moreover, many head segmentation tasks primarily work with face-detected cropped regions. This implies that even if the original image has a larger resolution, the region of interest (the head) is much smaller. Supporting this point, the dataset used here, CelebA-HQ, exclusively features head regions. If a user were to feed a full-body image, the model might struggle to segment the head correctly. I anticipate that in many scenarios users will pre-crop the head region, and a 256x256 resolution should suffice. Nonetheless, rather than restricting users to a model trained on this resolution, we could provide an option. This way, they can pick the model that best fits their trade-off between speed and accuracy.
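
To make the proposal concrete, this is roughly how I imagine the option could look from the user's side. It is only a sketch: `ModelVariant`, `VARIANTS`, `select_variant`, `suggest_resolution`, and the variant names are all hypothetical and do not exist in the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str        # hypothetical checkpoint identifier
    input_size: int  # side length (pixels) the model was trained on


# Hypothetical registry; the names are placeholders, not real files in the repo.
VARIANTS = {
    256: ModelVariant(name="head-seg-256", input_size=256),
    512: ModelVariant(name="head-seg-512", input_size=512),
}


def select_variant(resolution: int = 512) -> ModelVariant:
    """Return the variant trained at `resolution`, defaulting to 512."""
    try:
        return VARIANTS[resolution]
    except KeyError:
        supported = ", ".join(str(size) for size in sorted(VARIANTS))
        raise ValueError(
            f"Unsupported resolution {resolution}; choose one of: {supported}"
        ) from None


def suggest_resolution(crop_width: int, crop_height: int) -> int:
    """Suggest the smallest supported size that covers the longer crop side."""
    longest = max(crop_width, crop_height)
    for size in sorted(VARIANTS):
        if longest <= size:
            return size
    return max(VARIANTS)  # larger crops would be downscaled to the biggest model
```

Keeping 512 as the default would preserve the current behaviour, so existing users would not notice any change unless they opt in to the smaller model.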

Feature 2: Introduction of Broad Categories like Hair, Face, and Neck

Rationale: In my project, distinguishing between the face, hair, and neck regions is a requirement, and I believe this differentiation would be beneficial for various applications. While numerous face parsers exist, many are crafted for intricate tasks, such as segmenting specific facial features like the left eye, right eyebrow, or lower lip. These highly detailed specifications increase the model's complexity, rendering them more challenging to employ and often slower in performance. For instance, in some scenarios, I need to isolate the neck region to seamlessly integrate heads into a background. In others, I need the hair portion to extract it from background imagery for head swaps. I've yet to come across a repository that's both generic and user-friendly in this manner.
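
One way to get broad categories from existing fine-grained annotations is to merge labels before training. Below is a minimal sketch of that remapping on a plain 2D grid. The fine-grained label names are assumptions for illustration (a real dataset may use different names or integer ids), and `to_broad_mask` / `binary_mask` are hypothetical helpers.

```python
BACKGROUND, FACE, HAIR, NECK = 0, 1, 2, 3

# Assumed fine-grained label names; a real parser may use different
# names or integer ids, so treat this table as a placeholder.
FINE_TO_BROAD = {
    "background": BACKGROUND,
    "skin": FACE,
    "nose": FACE,
    "l_eye": FACE,
    "r_eye": FACE,
    "l_brow": FACE,
    "r_brow": FACE,
    "u_lip": FACE,
    "l_lip": FACE,
    "hair": HAIR,
    "neck": NECK,
}


def to_broad_mask(fine_mask, default=BACKGROUND):
    """Map a 2D grid of fine-grained label names to broad category ids.

    Labels missing from FINE_TO_BROAD (e.g. clothing) fall back to `default`.
    """
    return [[FINE_TO_BROAD.get(label, default) for label in row] for row in fine_mask]


def binary_mask(broad_mask, category):
    """Extract a 0/1 mask for a single broad category."""
    return [[1 if value == category else 0 for value in row] for row in broad_mask]


if __name__ == "__main__":
    fine = [
        ["background", "hair", "hair"],
        ["skin", "l_eye", "cloth"],
        ["neck", "neck", "background"],
    ]
    broad = to_broad_mask(fine)
    print(broad)
    print(binary_mask(broad, NECK))
```

With a mask like this, isolating the neck or the hair becomes a single lookup instead of a union over many small facial-feature classes.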

Feature 3: Enhanced Benchmarks and Optimization Efforts

Rationale: I'm keen on exploring avenues to further refine the segmentation performance, particularly concerning inference time. I have a VM equipped with a T4 GPU at my disposal, and I'm enthusiastic about training models with diverse encoders, such as EfficientNet, ResNet18, and Xception. I would also like to perform quantization and pruning on these models and include the results in the benchmarks.
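
For readers less familiar with the two techniques, here is a toy illustration of what they do to a weight vector, written on plain Python lists. It is a conceptual sketch only (unstructured magnitude pruning and symmetric per-tensor int8 quantization); real experiments would rely on the framework's own tooling, and the function names are mine.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the `fraction` of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices sorted by ascending magnitude; the first n_prune get zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floating-point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -0.01, 0.2, -1.27, 0.03]
    print(prune_by_magnitude(weights, 0.4))
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Both steps trade some accuracy for speed or size, which is why I think they belong in the benchmarks next to the unmodified models.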

Feature 4: Support for PyTorch Lightning v2.0

Rationale: Embracing contemporary advancements is pivotal. My aim is to ensure the repository remains compatible with PyTorch Lightning v2.0. The codebase here is remarkably streamlined. While I'll strive to maintain this clarity, I anticipate requiring feedback on this front.
