Zumbalamambo opened 11 months ago
Great point! We'd love to take a contribution for this. Any thoughts on what a good model to put in the examples is?
Maybe something like DETR?
I think ViTDet could be a good candidate, especially given its intuitive structure, which simplifies understanding and implementation. That said, most detection/segmentation models are implemented on top of frameworks like Detectron2, mmdetection, or detrex, and implementing the necessary helper functions for these detection models can be somewhat challenging.
@awni I would recommend YOLOv8, since it can cover:
1. Detection
2. Segmentation
3. Keypoint Detection
4. Classification
It sounds like you're proposing a command-line option, `-e` / `--eval-threads`, to set separate thread counts for different evaluation tasks, specifically single-token evaluation and prompt evaluation. That could be a handy addition: it would give users more control over how computational resources are allocated, and let them tune performance for workloads where the two kinds of evaluation benefit from different thread allocations.
What's the context or platform for this proposal? Knowing the specific application or system would help in judging its potential impact and usefulness.
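To make the proposal concrete, here is a minimal sketch of how such a flag could be parsed. Everything here is hypothetical: the flag name comes from the comment above, but the argument shape (two integers: prompt-eval threads, then single-token-eval threads), the defaults, and the helper function are my own assumptions, not an existing CLI.

```python
import argparse

# Hypothetical parser for the proposed "-e" / "--eval-threads" option.
# Takes two integers: threads for prompt evaluation, then threads for
# single-token evaluation. Defaults are illustrative, not from any tool.
parser = argparse.ArgumentParser(description="eval-threads option sketch")
parser.add_argument(
    "-e", "--eval-threads",
    nargs=2, type=int, metavar=("PROMPT", "TOKEN"),
    default=[4, 4],
    help="thread counts for prompt eval and single-token eval",
)

def parse_eval_threads(argv):
    """Return (prompt_threads, token_threads) parsed from argv."""
    args = parser.parse_args(argv)
    prompt_threads, token_threads = args.eval_threads
    return prompt_threads, token_threads
```

With this shape, `my-tool -e 8 2` would run prompt evaluation on 8 threads and single-token evaluation on 2, while omitting the flag falls back to the defaults.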
As part of my +1 for YOLOv8: a Create ML object detection training task on my M1 Mac took 18+ hours, while the same dataset completed with YOLOv8 on Google Colab in under 45 minutes. I don't expect my Mac to perform as well as cloud hardware with 4 GPUs attached, but I'd like to be able to do more local training, and if this project could shave down that training time, I'd be thrilled.
DINO_DETR_MLX: here is a port of the DINO DETR object detection model to MLX. It includes an API to load pre-trained PyTorch model weights, plus training/fine-tuning and evaluation using the COCO API. The implementation uses the data loader from torchvision.datasets and also provides a simple custom data loader. I also added a synthetic dataset so the profiler can be run for time/memory cost analysis without downloading the COCO dataset.
Please feel free to open an issue / pull request or start a discussion.
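The synthetic-dataset idea above can be sketched roughly as follows. This is an illustrative stand-in only, not the actual DINO_DETR_MLX loader: the class name, the COCO-style `bbox`/`category_id` record layout, and the use of nested lists instead of real image arrays are all my assumptions.

```python
import random

class SyntheticDetectionDataset:
    """Tiny COCO-style stand-in: random "images" and boxes so a training
    loop or profiler can run without downloading the COCO dataset.
    (Illustrative sketch; the real DINO_DETR_MLX loader may differ.)"""

    def __init__(self, num_samples=8, image_size=(64, 64), max_boxes=5, seed=0):
        self.num_samples = num_samples
        self.h, self.w = image_size
        self.max_boxes = max_boxes
        self.rng = random.Random(seed)

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # "Image" as nested lists of floats; swap in mx.array / a real
        # tensor type in an actual implementation.
        image = [[self.rng.random() for _ in range(self.w)]
                 for _ in range(self.h)]
        boxes = []
        for _ in range(self.rng.randint(1, self.max_boxes)):
            x0 = self.rng.uniform(0, self.w - 1)
            y0 = self.rng.uniform(0, self.h - 1)
            x1 = self.rng.uniform(x0, self.w)  # ensure x1 >= x0
            y1 = self.rng.uniform(y0, self.h)  # ensure y1 >= y0
            boxes.append({"bbox": [x0, y0, x1, y1],
                          "category_id": self.rng.randrange(80)})
        return image, boxes
```

Because every sample is generated on the fly from a seeded RNG, iterating the dataset is deterministic and costs no disk or network I/O, which is exactly what you want when profiling time and memory rather than accuracy.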
It would be very useful to have an object detection model, since most computer vision frameworks are targeted at NVIDIA hardware.