Structure of ViViT-b

mx-mark / VideoTransformer-pytorch

PyTorch implementation of a collection of scalable Video Transformer benchmarks.

272 stars, 34 forks

Open nullhty opened 1 year ago

nullhty commented 1 year ago

What is the structure of the ViViT-b model you published? I can't read it with the default parameters.

mx-mark commented 1 year ago

@nullhty The model structure has two parts: the first is a spatial-only transformer and the second is a temporal-only transformer.
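
For illustration, here is a minimal sketch of that two-stage layout in PyTorch. It is not the repository's actual code: the class name `FactorisedEncoderSketch`, the layer sizes, and the mean pooling between stages are all assumptions made to keep the example short, and patch embedding and positional embeddings are left out. It only shows how a spatial-only transformer can feed a temporal-only transformer.

```python
import torch
import torch.nn as nn


class FactorisedEncoderSketch(nn.Module):
    """Hypothetical two-stage encoder: spatial-only first, temporal-only second."""

    def __init__(self, dim=64, spatial_depth=2, temporal_depth=2,
                 heads=4, num_classes=10):
        super().__init__()

        def make_encoder(depth):
            # Standard PyTorch encoder stack; sizes here are illustrative only.
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads,
                dim_feedforward=dim * 4, batch_first=True,
            )
            return nn.TransformerEncoder(layer, num_layers=depth)

        self.spatial = make_encoder(spatial_depth)    # part 1: within a frame
        self.temporal = make_encoder(temporal_depth)  # part 2: across frames
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        # tokens: (batch, frames, patches, dim), assumed already patch-embedded
        b, t, n, d = tokens.shape

        # Spatial-only: fold frames into the batch so attention only
        # sees the patches of a single frame.
        x = self.spatial(tokens.reshape(b * t, n, d))

        # Assumption: mean-pool patches to get one vector per frame
        # (a class token is a common alternative).
        x = x.mean(dim=1).reshape(b, t, d)

        # Temporal-only: attention runs across the per-frame vectors.
        x = self.temporal(x)

        # Pool over time and classify.
        return self.head(x.mean(dim=1))


model = FactorisedEncoderSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 4, 9, 64))  # 2 clips, 4 frames, 9 patches
print(logits.shape)  # torch.Size([2, 10])
```

In a layout like this, the two stages have separate depths, so a checkpoint can only be loaded into a model built with matching settings for both parts. That may be relevant when the default parameters do not match a published checkpoint.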