pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision

COCO evaluator is being passed partial data #7817

Open jaredgizersky opened 1 year ago

jaredgizersky commented 1 year ago

🐛 Describe the bug

I was roughly following this tutorial to implement Mask R-CNN on a custom dataset, but I ran into the following error when using the evaluate function from references/detection/engine.py. The error persisted even when I used the PennFudan dataset and ran the original tutorial code.

File c:\Users\jmax0\RemosLyme\extras\coco_eval.py:35, in CocoEvaluator.update(self, predictions)
     33 results = self.prepare(predictions, iou_type)
     34 with redirect_stdout(io.StringIO()):
---> 35     coco_dt = COCO.loadRes(self.coco_gt, results) if results else COCO()
     36 coco_eval = self.coco_eval[iou_type]
     38 coco_eval.cocoDt = coco_dt
AssertionError: Results do not correspond to current coco set
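For context, the assertion that fires lives in pycocotools' COCO.loadRes and checks that every image id appearing in the predictions is already present in the ground-truth set (quoted from pycocotools/coco.py, abridged):

```python
# pycocotools COCO.loadRes (abridged): anns are the predictions being loaded;
# self is the ground-truth COCO object the evaluator was created with.
annsImgIds = [ann['image_id'] for ann in anns]
assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
    'Results do not correspond to current coco set'
```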

After investigating a bit further, I found that even though I was passing a subset of the data, the get_coco_api_from_dataset function builds the CocoEvaluator's ground truth from the full dataset. The engine then updates the CocoEvaluator batch by batch from the dataloader, and the assertion above fires because the image_ids in a batch are not the full list of image_ids in the dataset. Is the evaluator meant to receive all objects at once, and if so, is there a way to create it based on a subset? One idea I considered is sketched below.
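One possible workaround (untested, my own guess): build the ground-truth COCO object from the Subset itself via convert_to_coco_api, instead of letting get_coco_api_from_dataset unwrap it to the underlying full dataset. The helper name evaluator_for_subset is mine; convert_to_coco_api and CocoEvaluator come from the reference scripts.

```python
# Untested sketch: construct the evaluator from the Subset directly, so the
# ground truth only contains the image ids the dataloader will actually yield.
from torch.utils.data import Subset
from coco_utils import convert_to_coco_api  # references/detection/coco_utils.py
from coco_eval import CocoEvaluator         # references/detection/coco_eval.py

def evaluator_for_subset(subset: Subset, iou_types=("bbox", "segm")):
    # convert_to_coco_api iterates whatever dataset it is handed, so passing
    # the Subset restricts the ground truth to the subset's images
    coco_gt = convert_to_coco_api(subset)
    return CocoEvaluator(coco_gt, list(iou_types))
```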

Versions

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.4.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.34.3)
CMake version: version 3.26.3
Libc version: N/A

Python version: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:41) [Clang 15.0.7 ] (64-bit runtime)
Python platform: macOS-13.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.0
[pip3] torch==2.0.1
[pip3] torchvision==0.15.2a0
[conda] numpy 1.25.0 py311he598dae_0 defaults
[conda] numpy-base 1.25.0 py311hfbfe69c_0 defaults
[conda] pytorch 2.0.1 cpu_py311h10ecaf1_0 defaults
[conda] torchvision 0.15.2 cpu_py311h88737c0_1 conda-forge

cc @pmeier

jaredgizersky commented 1 year ago

With respect to the CocoEvaluator containing the full dataset instead of a subset, I found this, which for a Subset calls .dataset even though the engine already calls .dataset as well. I'm not sure the Coco object is intended to contain the full dataset, especially given the FIXME comment and the odd for loop in that function.
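For reference, the function in question (get_coco_api_from_dataset in references/detection/coco_utils.py) looks roughly like this at the time of writing; the unwrapping loop always descends to the underlying full dataset:

```python
import torch
import torchvision

# Quoted roughly from references/detection/coco_utils.py; convert_to_coco_api
# is defined in the same file.
def get_coco_api_from_dataset(dataset):
    # FIXME: This is... awful?
    for _ in range(10):
        if isinstance(dataset, torchvision.datasets.CocoDetection):
            break
        if isinstance(dataset, torch.utils.data.Subset):
            dataset = dataset.dataset  # a Subset is unwrapped to the full dataset
    if isinstance(dataset, torchvision.datasets.CocoDetection):
        return dataset.coco
    return convert_to_coco_api(dataset)
```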

pmeier commented 1 year ago

Hey @jaredgizersky. I'm sorry, but I'm having trouble following what you actually did. My understanding is that you built a custom detection dataset and want to use our reference utilities with it? If so, what does your dataset look like? And which utilities are you using, and how are you calling them?

jaredgizersky commented 1 year ago

Hi @pmeier. Sorry for the confusion. My dataset follows the same structure as described by PyTorch. The utilities I was using were the train_one_epoch and evaluate functions, and as described earlier, the errors came from evaluate. I eventually figured out that CocoEvaluator.img_ids in coco_eval.py contained integer ids, while the evaluate function was trying to use the tensor image ids, causing a conflict. According to this, the image id is initially a tensor, and I resolved the issue by adding an .item() call to the image-id line of engine.py, shown in context below.
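The one-line change in evaluate, with the line as I originally had it kept as a comment for comparison:

```python
# Before: image ids stay as one-element tensors, which hash by object identity
# and never compare equal to the evaluator's integer ids
# res = {target["image_id"]: output for target, output in zip(targets, outputs)}

# After: .item() converts each id to a plain Python int
res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
```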

Additionally, as described earlier, when passing a Subset to the evaluate function, the COCO object returned by get_coco_api_from_dataset contains the full dataset rather than the Subset. I'm not sure whether this is intended behavior.

muhammadmhmd commented 11 months ago

Hey, I'm getting the same issue. I tried following your method by adding .item() and still got the error 'Results do not correspond to current coco set'. When I debugged set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())) in COCO.loadRes, the original result for set(annsImgIds) was {tensor([0]), tensor([1])}, and set(self.getImgIds()) was {tensor([455]), tensor([684]), ..., tensor([1105])}, matching the length of my test data. After adding .item(), set(annsImgIds) became {0, 1}, but set(self.getImgIds()) stayed the same. I think I still have a mismatch. Thank you.
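If I read this output correctly, my ground-truth COCO object was also built with tensor image ids, so converting only the prediction side can't make the sets match. A sketch of the normalization I'm considering (untested; the helper name to_int_id is mine):

```python
import torch

# Untested guess: normalize every image id to a plain int before it reaches
# the COCO ground truth or the evaluator. Tensors hash by object identity,
# so the set comparison in loadRes can never match a tensor id to an int.
def to_int_id(image_id):
    # accepts a 0-dim or one-element tensor, or a plain int
    return int(image_id.item()) if torch.is_tensor(image_id) else int(image_id)
```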