huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
https://huggingface.co/docs/timm
Apache License 2.0

Model request: CSPNet #174

Closed · ternaus closed this 4 years ago

ternaus commented 4 years ago

https://openaccess.thecvf.com/content_CVPRW_2020/papers/w28/Wang_CSPNet_A_New_Backbone_That_Can_Enhance_Learning_Capability_of_CVPRW_2020_paper.pdf

The authors, as usual, claim that their models are faster, lighter, more accurate.

It would be nice to add them to the repo.

rwightman commented 4 years ago

@ternaus what aspect of CSPNet specifically? When I did a quick pass through the paper, it looked like it would be most relevant to add support for variants of DenseNet, ResNet, and ResNeXt as described in the paper by providing modified blocks for those networks; there was also an enhanced variant of PeleeNet described.

ternaus commented 4 years ago

I think adding the support for ResNet and ResNeXt would be great.

bonlime commented 4 years ago

@rwightman I've inspected the original darknet configs for CSPResNet (using Netron), and it looks like simply changing the block is not enough. It also requires modification of the main stem path.

rwightman commented 4 years ago

I have a few ideas that might allow this to be cleanly integrated; supporting the needed shortcut/route definitions without hacking the base model is a challenge. I'm letting them rattle around while finishing off a few other things.

rwightman commented 4 years ago

Found a bit of time during the weekend rain.

This is my first crack at it, currently on the 'features' branch as that's where the current dev effort is (so it supports feature extraction for object detection/segmentation out of the box with features_only=True).

https://github.com/rwightman/pytorch-image-models/blob/features/timm/models/cspnet.py
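
As a rough usage sketch, assuming the features branch ends up behaving like create_model with features_only=True (the cspdarknet53 model name here is just illustrative, not necessarily the registered name on that branch):

```python
import torch
import timm

# Sketch: build a CSPNet backbone that returns intermediate feature maps
# instead of classification logits (features_only=True). Model name is an
# assumption for illustration.
model = timm.create_model('cspdarknet53', pretrained=False, features_only=True)
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 256, 256))

# feats is a list of feature maps at decreasing spatial resolution,
# suitable as inputs to detection/segmentation heads.
for f in feats:
    print(f.shape)
```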

I'm handling the 1x1 convs between the cross-stage downsample and the cross path / blocks path differently here: I'm doing one expansion conv, while the darknet cfgs have separate convs in each path. The official cfgs also have a narrower 1x1 in the block path of the first stage, which makes for annoying special casing that I've skipped; it also results in a 'partial' shortcut for the first res block (x = x + shortcut), where shortcut would be half the width of x. Not sure if any of that has a significant impact. The param count is very close; perhaps FLOPS are a bit higher in mine, but that could be offset, in PyTorch at least, by reducing the number of separate conv layer executions. See https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/18
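
Roughly, the single-expansion-conv idea looks like this simplified sketch (placeholder names and layers, not the actual cspnet.py code): one 1x1 expansion after the cross-stage downsample, then a channel split into the cross path and the blocks path, concat, and transition.

```python
import torch
import torch.nn as nn

class CspStageSketch(nn.Module):
    """Illustrative CSP stage: one shared expansion conv, then a channel split
    into a cross (bypass) path and a blocks path. Simplified sketch only;
    block_fn and conv details are placeholders."""
    def __init__(self, in_chs, out_chs, num_blocks, block_fn):
        super().__init__()
        self.downsample = nn.Sequential(
            nn.Conv2d(in_chs, out_chs, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_chs),
            nn.ReLU(inplace=True),
        )
        # single 1x1 expansion shared by both paths
        # (vs. separate 1x1 convs per path in the original darknet cfgs)
        self.expand = nn.Sequential(
            nn.Conv2d(out_chs, out_chs, 1, bias=False),
            nn.BatchNorm2d(out_chs),
            nn.ReLU(inplace=True),
        )
        block_chs = out_chs // 2
        self.blocks = nn.Sequential(*[block_fn(block_chs) for _ in range(num_blocks)])
        self.transition = nn.Sequential(
            nn.Conv2d(out_chs, out_chs, 1, bias=False),
            nn.BatchNorm2d(out_chs),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.expand(self.downsample(x))
        xs, xb = x.chunk(2, dim=1)      # cross path, blocks path
        xb = self.blocks(xb)
        return self.transition(torch.cat([xs, xb], dim=1))

# tiny placeholder block just to make the sketch runnable
def _block(chs):
    return nn.Sequential(
        nn.Conv2d(chs, chs, 3, padding=1, bias=False),
        nn.BatchNorm2d(chs),
        nn.ReLU(inplace=True),
    )

stage = CspStageSketch(64, 128, num_blocks=2, block_fn=_block)
out = stage(torch.randn(1, 64, 56, 56))  # -> (1, 128, 28, 28)
```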

rwightman commented 4 years ago

Oh yeah, and it would have been possible to fold this into my existing ResNet if I defined separate stage types, but it was adding extra noise to an already busy file (resnet). I kept it as a separate impl that covers the major CSP variants and DarkNet instead.

bonlime commented 4 years ago

@rwightman Have you verified your implementation of CSPDarknet/CSPResNet by trying to actually train it?

rwightman commented 4 years ago

@bonlime yes, 80.06 cspdarknet53 @ 256x256, 80.04 cspresnext50 @ 224x224 (forgot to change res to 256), training a cspresnet50 @ 256x256 right now and will probably leave the rest for now.

bonlime commented 4 years ago

80.04 cspresnext50 @ 224x224 sounds good, and 80.06 cspdarknet53 @ 256x256 also sounds nice. I guess it's already with all the hacks like antialiasing, smoothing, etc.? Could you share a full config?

I've been looking through the implementation and found that DarkNet uses an activation in both convs https://github.com/rwightman/pytorch-image-models/blob/08016e839d73a745be5cdac86ee825bd877defff/timm/models/csp.py#L201 while many later networks avoid an activation before the residual (the MobileNetV2 paper even has some justification for why they chose a linear bottleneck). I don't know what your end goal is - to be as close to the papers as possible, or to have the best architectures. If the second, it may be beneficial to remove the activation after the last norm.
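
To illustrate the difference I mean, here's a rough sketch (not the repo's code): the darknet-style block keeps the activation after the second conv-bn, while the alternative drops it so the residual add sees a linear output.

```python
import torch
import torch.nn as nn

def conv_bn(in_chs, out_chs, k, act=True):
    """Conv + BN with an optional trailing activation (sketch helper)."""
    layers = [
        nn.Conv2d(in_chs, out_chs, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_chs),
    ]
    if act:
        layers.append(nn.LeakyReLU(0.1, inplace=True))
    return nn.Sequential(*layers)

class DarkBlockSketch(nn.Module):
    """Darknet-style residual block: activation after BOTH convs."""
    def __init__(self, chs, mid_chs):
        super().__init__()
        self.conv1 = conv_bn(chs, mid_chs, 1, act=True)
        self.conv2 = conv_bn(mid_chs, chs, 3, act=True)  # act right before the add

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class LinearOutBlockSketch(nn.Module):
    """Variant with the last activation removed, so the residual add is linear
    (in the spirit of MobileNetV2's linear bottleneck)."""
    def __init__(self, chs, mid_chs):
        super().__init__()
        self.conv1 = conv_bn(chs, mid_chs, 1, act=True)
        self.conv2 = conv_bn(mid_chs, chs, 3, act=False)  # no act before the add

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

x = torch.randn(1, 64, 32, 32)
print(DarkBlockSketch(64, 32)(x).shape, LinearOutBlockSketch(64, 32)(x).shape)
```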

rwightman commented 4 years ago

@bonlime the csdarknet53.cfg that I was using as a reference from https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/cfg/csdarknet53.cfg has activations after the 3x3 in the blocks. Perhaps other versions of non-CS darknet don't? I probably won't bother to change the block for the non-CS darknet.

Hparams are based on some recent experiments, still tweaking. No significant architecture-changing tricks like antialiasing; just RandAugment, mixup, drop path (stochastic depth), and random erasing as the core. Obviously producing good results, as these are well above the reference impl and also above some recent Mish-based experiments I saw.

rwightman commented 4 years ago

merged to master