keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Add Video Swin Transformer #2369

Closed innat closed 8 months ago

innat commented 9 months ago

What does this PR do?

Fixes https://github.com/keras-team/keras-cv/issues/2262

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

innat commented 9 months ago

@divyashreepathihalli @tirthasheshpatel This PR is ready to take some initial feedback.

I have a few questions.

tirthasheshpatel commented 8 months ago

Thanks for the PR @innat! This is really great work!

  • What does kaggle_handle mean? Is it a weight file or a folder? If so, how does this config contain the weight path? How can I test the model with a local path? Setting "kaggle_handle": "/usr/video_swin_weights" didn't work for me.

kaggle_handle is the link to the Kaggle model card. It contains the weights in .weights.h5 format and the config for the pretrained model as config.json. For example, the link kaggle://keras/efficientnetv2/keras/efficientnetv2_s/2 points to this Kaggle model card.
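To make the shape of such a handle concrete, the example link above can be split into its parts. This is only a minimal sketch, not the loader KerasCV uses: parse_kaggle_handle is a hypothetical helper, and the field names (owner, model, framework, variation, version) are assumed from Kaggle's model naming.

```python
from typing import NamedTuple


class KaggleHandle(NamedTuple):
    # Field names are assumptions based on Kaggle's model naming.
    owner: str
    model: str
    framework: str
    variation: str
    version: int


def parse_kaggle_handle(handle: str) -> KaggleHandle:
    """Hypothetical helper: split a kaggle:// handle into its five parts."""
    prefix = "kaggle://"
    if not handle.startswith(prefix):
        raise ValueError(f"Expected a handle starting with {prefix!r}, got {handle!r}")
    parts = handle[len(prefix):].split("/")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 path segments, got {len(parts)} in {handle!r}")
    owner, model, framework, variation, version = parts
    return KaggleHandle(owner, model, framework, variation, int(version))


handle = parse_kaggle_handle("kaggle://keras/efficientnetv2/keras/efficientnetv2_s/2")
print(handle)
```

In this sketch a plain filesystem path such as /usr/video_swin_weights is rejected because it lacks the kaggle:// prefix; how the library itself treats such a value is not shown here.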

To test the model locally, I usually just initialize the pretrained model and call load_weights to load the pretrained weights. You can do this in tests to make sure they pass. Once you have the weights ready, feel free to ping me, @divyashreepathihalli, or @mattdangerw and we can get the weights uploaded to Kaggle.

  • How should I hand over the weight files (which were ported from the official release)? Should they be in .weights.h5 or .keras format?

It should be a .weights.h5 file and can be generated using the .save_weights method.
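The save_weights / load_weights round trip described above can be sketched as follows. This is an illustration under assumptions: build_model is a hypothetical stand-in for the real Video Swin model (a tiny dense network, so it runs quickly), and the file name demo.weights.h5 is arbitrary apart from its .weights.h5 suffix.

```python
import os
import tempfile

import keras
import numpy as np


def build_model():
    # Hypothetical stand-in for the real model; both sides of the
    # round trip must be built with the same architecture.
    return keras.Sequential(
        [
            keras.Input(shape=(8,)),
            keras.layers.Dense(4, activation="relu"),
            keras.layers.Dense(2),
        ]
    )


# Save the weights of one model instance to a .weights.h5 file.
source = build_model()
path = os.path.join(tempfile.mkdtemp(), "demo.weights.h5")
source.save_weights(path)

# Build a fresh instance (with different random weights) and load the file.
restored = build_model()
restored.load_weights(path)

# After loading, both instances should produce the same outputs.
x = np.random.rand(3, 8).astype("float32")
before = source.predict(x, verbose=0)
after = restored.predict(x, verbose=0)
np.testing.assert_allclose(before, after, rtol=1e-6, atol=1e-6)
```

The same pattern applies to a pretrained model: instantiate it with the matching configuration, then call load_weights on the local file.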

innat commented 8 months ago

Once you have the weights ready, feel free to ping me, @divyashreepathihalli, or @mattdangerw and we can get the weights uploaded to Kaggle.

@mattdangerw Here are the weight files for the three variants of the Swin model.

Benchmark dataset: Kinetics 400

Benchmark dataset: Kinetics 400 - Pretrained: ImageNet 22K

Benchmark dataset: Kinetics 600

Benchmark dataset: Something Something V2

To compare with official torch model, check this gist.

innat commented 8 months ago

@keras-team A gentle reminder about this PR. I would like to complete it within this month (if possible) after addressing the reviews. Otherwise, please let me know if there are other higher-priority items, and I will close this PR for the time being. Thanks.

divyashreepathihalli commented 8 months ago

LGTM! Thanks for the PR @innat. Could you also please provide an example usage snippet for the Kaggle model page?

Also, if you want to run the GPU tests locally, you can run pytest <file name> --run_large
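A flag like --run_large is usually wired up in conftest.py. The following is a sketch of that common pytest pattern, not necessarily the repository's actual conftest: it assumes large tests are tagged with a marker named large, and the skip reason text is made up for illustration.

```python
import pytest


def pytest_addoption(parser):
    # Register the command-line flag; it is off unless passed explicitly.
    parser.addoption(
        "--run_large",
        action="store_true",
        default=False,
        help="run tests marked as large",
    )


def pytest_collection_modifyitems(config, items):
    # With the flag set, leave every collected test untouched.
    if config.getoption("--run_large"):
        return
    # Otherwise attach a skip marker to each test tagged `large`
    # (the marker name is an assumption of this sketch).
    skip_large = pytest.mark.skip(reason="need --run_large option to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)
```

A test would then opt in with @pytest.mark.large, and it shows up as skipped (s) in the progress line unless --run_large is given.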

innat commented 8 months ago

@divyashreepathihalli Thanks.

I've written plenty of examples of how to use this model with different backends (link). For the Kaggle model page, could you please specify what kind of demonstration is needed, and how much?

I've tested on GPU. Everything seems to be set.

!pytest .../keras_cv/layers/video_swin_layers_test.py --run_large
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.3.0
rootdir: /kaggle/working/keras-cv
configfile: setup.cfg
plugins: anyio-4.2.0, typeguard-4.1.5
collected 9 items                                                              

keras_cv/layers/video_swin_layers_test.py s..s.....                      [100%]

========================= 7 passed, 2 skipped in 1.52s =========================

!pytest ...//keras-cv/keras_cv/models/backbones/video_swin/ --run_large
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.3.0
rootdir: /kaggle/working/keras-cv
configfile: setup.cfg
plugins: anyio-4.2.0, typeguard-4.1.5
collected 16 items                                                             

keras_cv/models/backbones/video_swin/video_swin_backbone_presets_test.py s [  6%]
.sss..ss                                                                 [ 56%]
keras_cv/models/backbones/video_swin/video_swin_backbone_test.py s.sssss [100%]

======================== 4 passed, 12 skipped in 6.30s =========================

!pytest .../keras-cv/keras_cv/models/classification/video_classifier_test.py --run_large
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.0.1, pluggy-1.3.0
rootdir: /kaggle/working/keras-cv
configfile: setup.cfg
plugins: anyio-4.2.0, typeguard-4.1.5
collected 8 items                                                              

keras_cv/models/classification/video_classifier_test.py s.......         [100%]

=================== 7 passed, 1 skipped in 221.61s (0:03:41) ===================
innat commented 8 months ago

@divyashreepathihalli Thanks for checking, and sorry for the inconvenience. I had only checked with the TensorFlow backend. I've broken down the key error. Let me know if I've missed anything.

The JAX test is failing because of this; I think it's a bug in the JAX backend.

innat commented 8 months ago

About the weights, please check this.

weight identifier

a. kinetics400-tiny
b. kinetics400-small
c. kinetics400-base
d. kinetics400-base-imgnet22k
f. kinetics600-base-imgnet22k
g. something-something-v2-base

Each has its backbone and classifier part.

comparison

I showed a comparison between kcv and torchvision in this gist. Note, however, that torchvision offers only a, b, c, and d from the weight identifiers above, so only those are shown in the gist.

innat commented 8 months ago

I think all the tests pass, but the run is still failing because of this: https://github.com/keras-team/keras-cv/pull/2401

innat commented 8 months ago

Summarizing the weight check.

Backbones (tolerance 1e-4)

Classifier (tolerance 1e-5)

notebook-1 for kinetics-400 (tiny, small, base, base-imagenet22k)
notebook-2 for kinetics-600 (base-imagenet22k), something-something-v2

@tirthasheshpatel @divyashreepathihalli Could you please verify the weights used in the above notebooks? I will remove these notebooks from the Kaggle workspace afterward.

Note: in notebook-1, the torchvision library is used to load the Video Swin API and the PyTorch weights it offers, whereas in notebook-2, the raw official code and weights are loaded.
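A check of this kind can be sketched as below. This is not the notebooks' code: the arrays are random placeholders rather than real model outputs, outputs_match is a hypothetical helper, and treating the stated tolerances (1e-4 for backbones, 1e-5 for classifiers) as absolute tolerances is an assumption.

```python
import numpy as np


def outputs_match(reference, candidate, atol):
    """Hypothetical helper: compare two outputs by maximum absolute difference."""
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return max_abs_diff <= atol, max_abs_diff


# Placeholder data standing in for the outputs of the original torch model
# and of the ported Keras model on the same input.
rng = np.random.default_rng(0)
torch_like = rng.normal(size=(1, 400)).astype("float32")
noise = rng.uniform(-5e-6, 5e-6, size=torch_like.shape).astype("float32")
keras_like = torch_like + noise

ok_backbone, diff = outputs_match(torch_like, keras_like, atol=1e-4)
ok_classifier, _ = outputs_match(torch_like, keras_like, atol=1e-5)
print(f"max abs diff: {diff:.2e}")
print(f"within 1e-4: {ok_backbone}, within 1e-5: {ok_classifier}")
```

Reporting the maximum difference alongside the pass/fail result makes it easier to see how much margin a ported weight file has.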

innat commented 8 months ago

ONNX

I noticed that others have also tried to export this model to ONNX format but failed and reported it to the official repo (tickets). So I tried it with this implementation on the torch backend, and it works as expected.

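import torch

# VideoClassifier, `backbone`, and `num_classes` are assumed to be defined earlier.
model = VideoClassifier(
    backbone=backbone,
    num_classes=num_classes,
    activation=None,
    pooling='avg',
)
model.eval()  # switch to inference mode before exporting
batch_size = 1

# Input to the model: (batch, frames, height, width, channels)
x = torch.randn(batch_size, 32, 224, 224, 3, requires_grad=True)
torch_out = model(x)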

Exporting, following the official torch guideline.

torch.onnx.export(
    model,  # model being run
    x,  # model input (or a tuple for multiple inputs)
    "vswin.onnx",  # output file
    export_params=True,  # store the trained weights in the exported file
    opset_version=10,
    do_constant_folding=True,
    input_names=["input"],  # the model's input names
    output_names=["output"],  # the model's output names
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)

import numpy as np
import onnx
import onnxruntime

def to_numpy(tensor):
    if tensor.requires_grad:
        tensor = tensor.detach()
    tensor = tensor.cpu()
    numpy_array = tensor.numpy()
    return numpy_array

onnx_model = onnx.load("vswin.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(
    "vswin.onnx", providers=["CPUExecutionProvider"]
)

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

Logit checking.

np.testing.assert_allclose(
    to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05
)

innat commented 8 months ago

Let's move the video_swin layers into the model folder itself. Everything else LGTM!

Sorry, could you please elaborate? Do you want this file relocated to here? If so, wouldn't that be an anti-pattern relative to the current standard? I mean, aren't all of the layers supposed to be in this directory?

divyashreepathihalli commented 8 months ago

Let's move the video_swin layers into the model folder itself. Everything else LGTM!

Sorry, could you please elaborate? Do you want this file relocated to here? If so, wouldn't that be an anti-pattern relative to the current standard? I mean, aren't all of the layers supposed to be in this directory?

Nope! All model-specific layers should be inside the model folder. Only generic layers go under the layers folder. The move locations linked are correct.

innat commented 8 months ago

I think the test is failing because of another issue.

divyashreepathihalli commented 8 months ago

Thank you for this awesome contribution!!!