FCOS - possible issue in location computation - not taking stride into account

ksachdeva commented 1 year ago

🐛 Describe the bug

The location computation (or rather its usage) in the implementation of FCOS seems to have an issue (based on how I have understood the paper and after seeing few other open source implementations).

Link to the colab notebook to show the problem in detail https://colab.research.google.com/drive/140LWiY2h5ozfYTT7Uc-KfrNML1z_nsdj?usp=sharing

Here is the brief description:

https://github.com/pytorch/vision/blob/0387b8821d67ca62d57e3b228ade45371c0af79d/torchvision/models/detection/fcos.py#L453

The current implementation is computing so called anchor_centers at the above line and then using it to compute the offsets as well the for creating masks.

The above computation is not considering the stride component as specified in the paper (See section 2.1)

Instead if you use this, it would be the correct implementation (as per my understanding of the paper)

anchor_centers = anchors_per_image[:, 2:]

Versions

Collecting environment information... PyTorch version: 1.13.1+cu116 Is debug build: False CUDA used to build PyTorch: 11.6 ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 Clang version: 10.0.0-4ubuntu1 CMake version: version 3.25.2 Libc version: glibc-2.31

Python version: 3.9.16 (main, Dec 7 2022, 01:11:51) [GCC 9.4.0] (64-bit runtime) Python platform: Linux-5.10.147+-x86_64-with-glibc2.31 Is CUDA available: False CUDA runtime version: 11.8.89 CUDA_MODULE_LOADING set to: N/A GPU models and configuration: Could not collect Nvidia driver version: Could not collect cuDNN version: Probably one of the following: /usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0 /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True

CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 2 Core(s) per socket: 1 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 79 Model name: Intel(R) Xeon(R) CPU @ 2.20GHz Stepping: 0 CPU MHz: 2199.998 BogoMIPS: 4399.99 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32 KiB L1i cache: 32 KiB L2 cache: 256 KiB L3 cache: 55 MiB NUMA node0 CPU(s): 0,1 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Mitigation; PTE Inversion Vulnerability Mds: Vulnerable; SMT Host state unknown Vulnerability Meltdown: Vulnerable Vulnerability Mmio stale data: Vulnerable Vulnerability Retbleed: Vulnerable Vulnerability Spec store bypass: Vulnerable Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Vulnerable Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities

Versions of relevant libraries: [pip3] numpy==1.22.4 [pip3] torch==1.13.1+cu116 [pip3] torchaudio==0.13.1+cu116 [pip3] torchsummary==1.5.1 [pip3] torchtext==0.14.1 [pip3] torchvision==0.14.1+cu116 [conda] Could not collect

NicolasHug commented 1 year ago

Hi @ksachdeva , thanks for the detailed report.

I'm not sure I completely follow everything yet, but I think the line you're referring to is similar to these ones, from the original FCOS implementation (I think they use XXYY format rather than XYXY as in torchvision):

https://github.com/tianzhi0549/FCOS/blob/0eb8ee0b7114a3ca42ad96cd89e0ac63a205461e/fcos_core/modeling/rpn/fcos/loss.py#L64-L65

anchor_centers = anchors_per_image[:, 2:]

Since the format is XYXY, this would only give you the first XY coordinate, which doesn't correspond to the center of the anchor. The way it's done in torchvision looks more correct to me.

Looping in @xiaohu2015 as you're the original contributor for FCOS, in case you have any input on this :) !

xiaohu2015 commented 1 year ago

@ksachdeva @NicolasHug hi, here we chose the upper point as the center of the grid cell, not add the s/2, it does not have impact on the model performance, as Faster RCNN and RetinaNet all do that. In detectron2, you can set _C.MODEL.ANCHOR_GENERATOR.OFFSET=0.5, but for torchvision AnchorGenerator, it does not supppt the offset parameter, so we chose use the default settting of AnchorGenerator.

https://github.com/facebookresearch/detectron2/blob/main/detectron2/config/defaults.py#L193

Relative offset between the center of the first anchor and the top-left corner of the image Value has to be in [0, 1). Recommend to use 0.5, which means half stride. The value is not expected to affect model accuracy. _C.MODEL.ANCHOR_GENERATOR.OFFSET = 0.0

as for your code: https://colab.research.google.com/drive/140LWiY2h5ozfYTT7Uc-KfrNML1z_nsdj?usp=sharing anchor_centers = anchors_per_image[:, 2:]

it is not correct, here your anchor_size=32, it just a coincide. If you set anchor_size = 64, it will be inncorrect.

xiaohu2015 commented 1 year ago

@ksachdeva Maybe you can give a PR to make AnchorGenerator supppt the offset parameter?

ksachdeva commented 1 year ago

Thanks @xiaohu2015 for the clarification. I acknowledge that it should not have a big impact on the model performance.

Regarding the changes to AnchorGenerator - IMHO, the usage of AnchorGenerator in FCOS is not optimal (both from readability and maintainability aspects) and as you can already see from other issues (https://github.com/pytorch/vision/issues/6429) in this repository it has added to confusion for other users. It is not a good idea to repurpose classes/utilities just because they have portions of it that can be reused.

xiaohu2015 commented 1 year ago

@ksachdeva yeah, the AnchorGenerator in FCOS is confusing, but I found it is more simple to implement by using the AnchorGenerator

Isalia20 commented 2 months ago

@xiaohu2015 Can I submit a PR which removes AnchorGenerator from FCOS?

pytorch / vision

FCOS - possible issue in location computation - not taking stride into account #7473

🐛 Describe the bug

Versions