spiking_vgg.py [ line 132] Input Shape

SoikatHasanAhmed commented 1 year ago

I am confused about the input shape of the network (e.g., VGG for the object detection task).

According to the SpikingJelly documentation, the function

functional.seq_to_ann_forward(.)

takes inputs of shape

[T, B, C, H, W]

However, in the code, the input shape seems to be

[B, T, C, H, W]

Please let me know if it is correct or if I am missing anything. TIA

loiccordone commented 1 year ago

Hello, both of your remarks are correct: the order [T,B,C,H,W] is now the norm for SpikingJelly, while my code uses [B,T,C,H,W]. However, it doesn't make a difference, because SpikingJelly merges the first two dimensions and treats them as a single batch dimension, which lets it compute all timesteps (and all batch samples) at once. The output is then reshaped to the initial dimensions, (B,T,C,H,W) in my code.
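
To make the point concrete, here is a minimal sketch of the merge, apply, restore pattern described above. It is not SpikingJelly's implementation: plain nested lists stand in for tensors so it runs without PyTorch, and `merge_apply_restore`, `transpose01`, and `scale` are hypothetical names. It assumes the wrapped layer is stateless and treats every sample independently.

```python
# Nested lists stand in for tensors: x[a][b] is one sample (a list of floats).

def merge_apply_restore(x_seq, layer):
    """Merge the two leading axes, apply a stateless per-sample layer, restore the axes."""
    d0, d1 = len(x_seq), len(x_seq[0])
    flat = [x_seq[i][j] for i in range(d0) for j in range(d1)]  # one "batch" of d0 * d1 samples
    out = [layer(sample) for sample in flat]                     # every sample handled alike
    return [out[i * d1:(i + 1) * d1] for i in range(d0)]         # back to [d0][d1]

def transpose01(x):
    """Swap the two leading axes of a nested list."""
    return [list(col) for col in zip(*x)]

def scale(sample):
    """Hypothetical stateless layer: doubles every value of one sample."""
    return [2.0 * v for v in sample]

B, T = 2, 3
x_bt = [[[float(10 * b + t)] for t in range(T)] for b in range(B)]  # [B][T][features]

y_bt = merge_apply_restore(x_bt, scale)               # input fed as [B, T, ...]
y_tb = merge_apply_restore(transpose01(x_bt), scale)  # input fed as [T, B, ...]

# Same values either way; only the axis order of the result differs.
assert transpose01(y_tb) == y_bt
```

The wrapped layer never sees which of the two leading axes was time and which was batch, so the order only matters for code that later indexes or reduces over a specific axis.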

SoikatHasanAhmed commented 1 year ago

Thank you for your kind response. I tried both ways and would appreciate your feedback on the different test results I got (after 20 epochs, batch size 32). TIA

Without changing any code, the test results with the VGG-11 backbone:


Accumulating evaluation results...
DONE (t=13.11s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.235
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.475
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.205
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.112
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.276
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.090
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.217
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.405
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.421
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.454
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.218
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'val_AP_IoU=.5': 0.474841445684433,
 'val_AP_IoU=.5:.05:.95': 0.23526059091091156,
 'val_AP_IoU=.75': 0.2049378901720047,
 'val_AP_large': 0.08983469754457474,
 'val_AP_medium': 0.27555549144744873,
 'val_AP_small': 0.1122187077999115,
 'val_AR_det=1': 0.2173333764076233,
 'val_AR_det=10': 0.405019611120224,
 'val_AR_det=100': 0.42143917083740234,
 'val_AR_large': 0.21824853122234344,
 'val_AR_medium': 0.45443394780158997,
 'val_AR_small': 0.3043917119503021}
--------------------------------------------------------------------------------

After permuting the input shape to [ T, B, C, H, W]:

Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.003
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.040
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.045
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.035
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'val_AP_IoU=.5': 0.001488955575041473,
 'val_AP_IoU=.5:.05:.95': 0.0003149840049445629,
 'val_AP_IoU=.75': 3.0108682040008716e-05,
 'val_AP_large': 0.00043100581387989223,
 'val_AP_medium': 0.0007049509440548718,
 'val_AP_small': 0.00012032297672703862,
 'val_AR_det=1': 0.0029605242889374495,
 'val_AR_det=10': 0.021433377638459206,
 'val_AR_det=100': 0.03991447389125824,
 'val_AR_large': 0.03532028570771217,
 'val_AR_medium': 0.04526446387171745,
 'val_AR_small': 0.022240184247493744}

loiccordone commented 1 year ago

Hello, nice results! When switching the dimensions you also have to modify some code: https://github.com/loiccordone/object-detection-with-spiking-neural-networks/blob/118acf3069a4a2f03d76432b005ef7a1c44ac47e/models/detection_backbone.py#L44-L48

You have to sum along the first dimension (dim=0) instead of dim=1. Other modifications may be needed elsewhere; tell me if you still encounter problems!
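
As an illustration of why the summation axis has to follow the layout, here is a small sketch (not the repository's code). It uses nested lists in place of tensors, the helper `sum_over_axis` is hypothetical, and the numbers are made up. With a [B, T] layout the time axis is axis 1; after permuting to [T, B] it becomes axis 0.

```python
def sum_over_axis(x, axis):
    """Sum a two-level nested list over axis 0 or axis 1."""
    if axis == 0:
        return [sum(col) for col in zip(*x)]
    return [sum(row) for row in x]

# Made-up per-timestep values for B=2 samples and T=3 timesteps.
x_bt = [[1, 2, 3], [10, 20, 30]]          # layout [B][T]
x_tb = [list(col) for col in zip(*x_bt)]  # layout [T][B] after permuting

per_sample_bt = sum_over_axis(x_bt, axis=1)  # [6, 60]: sums over time
per_sample_tb = sum_over_axis(x_tb, axis=0)  # [6, 60]: same reduction, new axis index
wrong_axis = sum_over_axis(x_tb, axis=1)     # [11, 22, 33]: sums over the batch instead

assert per_sample_bt == per_sample_tb
assert wrong_axis != per_sample_tb
```

Keeping the old axis index after permuting mixes different samples together rather than accumulating over timesteps, which would be consistent with the near-zero AP reported above.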

SoikatHasanAhmed commented 1 year ago

Thanks, I missed it. I will test it and let you know.

SoikatHasanAhmed commented 1 year ago

I got similar results. Thanks!

loiccordone / object-detection-with-spiking-neural-networks

spiking_vgg.py [ line 132] Input Shape #17