pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Coco_eval function not working on certain dataset versions #7891

Open Heni-Loukil opened 1 year ago

Heni-Loukil commented 1 year ago

🐛 Describe the bug

Okay, so I am working with Roboflow and I am using the COCO JSON version of the dataset. I am working on a DETR model: I trained it on thermal images (without colors) and then wanted to test it on some optical (RGB) images. When I show the image of detections from my trained model, the bounding boxes are there and the detections are good, even though the model was not trained on optical images. The problem is that when I call evaluator.update(predictions), I get this error:

Traceback (most recent call last):
  File "D:\Projects\test2.py", line 191, in <module>
    evaluator.update(predictions)
  File "D:\Projects\venv\lib\site-packages\coco_eval\coco_eval.py", line 50, in update
    img_ids, eval_imgs = evaluate(coco_eval)
  File "D:\Projects\venv\lib\site-packages\coco_eval\coco_eval.py", line 126, in evaluate
    self._prepare()
  File "D:\Projects\venv\lib\site-packages\pycocotools\cocoeval.py", line 97, in _prepare
    dts=self.cocoDt.loadAnns(self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
  File "D:\Projects\venv\lib\site-packages\pycocotools\coco.py", line 146, in getAnnIds
    anns = self.dataset['annotations']
KeyError: 'annotations'
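
For reference, the failing lookup is on self.cocoDt, i.e. the detections object rather than the ground truth, and a bare pycocotools COCO() reproduces the same error (a minimal sketch using pycocotools only, nothing from this setup):

from pycocotools.coco import COCO

# A COCO object constructed without an annotation file has an empty
# .dataset dict, so the first annotation lookup fails exactly as in
# the traceback above.
coco_dt = COCO()      # no annotation file -> dataset == {}
coco_dt.getAnnIds()   # KeyError: 'annotations'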

I assure you there is no problem with the JSON files. I tried working with preprocessed grayscale images and it worked once, but other versions, even grayscale ones, generate the same problem as with the RGB model, so I think there is a problem with the coco_eval function. This is the code:

import torch
import supervision as sv

# image, model, image_processor, device, CONFIDENCE_TRESHOLD, id2label and
# box_annotator are defined earlier in the notebook.
with torch.no_grad():

    # load image and predict
    inputs = image_processor(images=image, return_tensors='pt').to(device)
    model = model.to(inputs.pixel_values.device)
    outputs = model(**inputs)

    # post-process
    target_sizes = torch.tensor([image.shape[:2]]).to(device)
    results = image_processor.post_process_object_detection(
        outputs=outputs,
        threshold=CONFIDENCE_TRESHOLD,
        target_sizes=target_sizes
    )[0]

# annotate
detections = sv.Detections.from_transformers(transformers_results=results)
if len(detections) > 0:
    # There are detections. Apply NMS.
    detections = detections.with_nms(threshold=0.5)
    labels = [f"{id2label[class_id]} {confidence:.2f}" for _, confidence, class_id, _ in detections]
    frame = box_annotator.annotate(scene=image.copy(), detections=detections, labels=labels)
    print('detections')
    sv.show_frame_in_notebook(frame, (16, 16))
else:
    # There are no detections. Print "No Predictions".
    print("No Predictions")

from coco_eval import CocoEvaluator
from tqdm.notebook import tqdm

def convert_to_xywh(boxes):
    xmin, ymin, xmax, ymax = boxes.unbind(1)
    return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1)

def prepare_for_coco_detection(predictions):
    coco_results = []
    for original_id, prediction in predictions.items():
        if len(prediction) == 0:
            continue

        boxes = prediction["boxes"]
        boxes = convert_to_xywh(boxes).tolist()
        scores = prediction["scores"].tolist()
        labels = prediction["labels"].tolist()

        coco_results.extend(
            [
                {
                    "image_id": original_id,
                    "category_id": labels[k],
                    "bbox": box,
                    "score": scores[k],
                }
                for k, box in enumerate(boxes)
            ]
        )
    return coco_results
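
A quick check of the helper above, since the COCO results format expects [x, y, width, height] rather than corner coordinates (the values are made up):

import torch

# (xmin=10, ymin=20, xmax=40, ymax=60) -> (x=10, y=20, w=30, h=40)
print(convert_to_xywh(torch.tensor([[10.0, 20.0, 40.0, 60.0]])))
# tensor([[10., 20., 30., 40.]])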

from torch.utils.data import DataLoader

def collate_fn(batch):
    pixel_values = [item[0] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    labels = [item[1] for item in batch]
    return {
        'pixel_values': encoding['pixel_values'],
        'pixel_mask': encoding['pixel_mask'],
        'labels': labels
    }
TEST_DATALOADER = DataLoader(dataset=TEST_DATASET, collate_fn=collate_fn, batch_size=4)
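
An illustrative shape check of what the collate function produces, assuming TEST_DATASET yields (pixel_values, target) pairs as in the DETR fine-tuning tutorials:

# Padding aligns variable-sized images to the largest one in the batch and
# records the valid (non-padded) region per image in pixel_mask.
batch = next(iter(TEST_DATALOADER))
print(batch['pixel_values'].shape)  # e.g. torch.Size([4, 3, H_max, W_max])
print(batch['pixel_mask'].shape)    # e.g. torch.Size([4, H_max, W_max])
print(len(batch['labels']))         # 4, one target dict per image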

evaluator = CocoEvaluator(coco_gt=TEST_DATASET.coco, iou_types=["bbox"])

print("Running evaluation...")

for idx, batch in enumerate(tqdm(TEST_DATALOADER)):
    pixel_values = batch["pixel_values"].to(device)
    pixel_mask = batch["pixel_mask"].to(device)
    labels = [{k: v.to(device) for k, v in t.items()} for t in batch["labels"]]

    with torch.no_grad():
      outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)

    orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0)
    results = image_processor.post_process_object_detection(outputs, target_sizes=orig_target_sizes)

    predictions = {target['image_id'].item(): output for target, output in zip(labels, results)}
    predictions = prepare_for_coco_detection(predictions)
    evaluator.update(predictions)

evaluator.synchronize_between_processes()
evaluator.accumulate()
evaluator.summarize()
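
One plausible culprit, assuming the installed coco_eval mirrors torchvision's reference implementation (where update() falls back to a bare COCO() when the results list is empty): a batch in which the model finds no objects above the threshold, which is more likely on out-of-domain optical images. A hedged workaround is to replace the last two lines of the loop body with:

    predictions = prepare_for_coco_detection(predictions)
    # Skip batches with no detections: an empty results list can leave the
    # evaluator holding a bare COCO() whose dataset dict has no 'annotations'
    # key, which is exactly the KeyError in the traceback above.
    if predictions:
        evaluator.update(predictions)
    else:
        print(f"batch {idx}: no detections above threshold; skipping update")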

Versions

StatusCode        : 200
StatusDescription : OK
Content           :
                    # Unlike the rest of the PyTorch this file must be python2 compliant.
                    # This script outputs relevant system environment info
                    # Run it with `python collect_env.py`.
                    import datetime
                    import locale
                    impor...
RawContent        : HTTP/1.1 200 OK
                    Connection: keep-alive
                    Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
                    Strict-Transport-Security: max-age=31536000
                    X-Content-Type-Options: nosniff
                    ...
Forms             : {}
Headers           : {[Connection, keep-alive], [Content-Security-Policy, default-src 'none'; style-src 'unsafe-inline'; sandbox], [Strict-Transport-Security, max-age=31536000],    
                    [X-Content-Type-Options, nosniff]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : System.__ComObject
RawContentLength  : 21653
pmeier commented 1 year ago

@Heni-Loukil I reformatted your comment. For the future, please wrap code or tracebacks in triple backticks, e.g.

traceback or code here

Could you please also run the collection script you downloaded and post its output here, as prompted by the template?

wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py  # <-- you didn't do this

Finally, I'm seeing the following line in the traceback:

D:\Projects\venv\lib\site-packages\coco_eval\coco_eval.py

This looks like an installed package, but we don't provide a coco_eval module. Could you clarify what this is?
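
For reference, the module's origin can be checked directly from the environment (standard Python; the pip distribution name below is an assumption):

import coco_eval
print(coco_eval.__file__)  # file path of the imported module

# From the shell, `pip show coco-eval` prints the distribution's metadata
# (version, home page), assuming that is the installed package name.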