simon-ging / coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Apache License 2.0

How can I extract the COOT features for my Chinese caption datasets? #40

Open will-wiki opened 3 years ago

will-wiki commented 3 years ago

Thanks in advance!

will-wiki commented 3 years ago

How can I get a pretrained model like "provided_models/yc2_100m_coot.pth" for my Chinese caption dataset? I have extracted the COOT features of YouCook2 following "Extract your own embeddings" in the README and got the same results as the paper. However, when I extract the YouCook2 COOT features without "provided_models/yc2_100m_coot.pth", the captioning results are worse than those of the original MART model.

The retrieval results with "provided_models/yc2_100m_coot.pth":
INFO Saved embeddings to experiments/retrieval/paper2020/yc2_100m_coot_valset1/embeddings/embeddings_0.h5
INFO Retriev | R@1 | R@5 | R@10 | R@50 | MeanR | MedR | Sum
INFO vid | 0.810 | 0.958 | 0.978 | 0.996 | 1.0 | 2.2 | 2.764
INFO par | 0.783 | 0.963 | 0.978 | 0.996 | 1.0 | 2.3 | 2.742
INFO cli | 0.159 | 0.395 | 0.513 | 0.782 | 10.0 | 74.4 | 1.335
INFO sen | 0.169 | 0.406 | 0.525 | 0.780 | 9.0 | 73.2 | 1.355
INFO Loss 0.04828 (Contr: 0.03885, CC: 0.00943)
Retrieval: vidpar (457) in 0.042s, clisen (3492) in 2.100s, total 6.557s, forward 0.220s

Without "provided_models/yc2_100m_coot.pth":
INFO Saved embeddings to experiments/retrieval/paper2020/yc2_100m_coot_valset1/embeddings/embeddings_0.h5
INFO Retriev | R@1 | R@5 | R@10 | R@50 | MeanR | MedR | Sum
INFO vid | 0.042 | 0.149 | 0.225 | 0.584 | 39.0 | 57.1 | 0.775
INFO par | 0.035 | 0.179 | 0.291 | 0.759 | 22.0 | 37.0 | 0.974
INFO cli | 0.000 | 0.001 | 0.003 | 0.017 | 1391.0 | 1488.7 | 0.018
INFO sen | 0.000 | 0.001 | 0.003 | 0.015 | 1425.0 | 1538.9 | 0.017
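To help read the two logs above, here is a minimal sketch of how recall@K and median/mean rank are typically computed from a query-by-candidate similarity matrix. It is not COOT's own evaluation code: it assumes that query i's ground-truth match is candidate i and that ties are broken in the ground truth's favor. The helper `chance_recall_at_k` (a hypothetical name) gives the expected R@K of a random ranking, K/N, which is a useful floor: with the 3492 clip/sentence pairs reported in the log, chance R@50 is about 0.014.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def rank_of_ground_truth(sim_row: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground-truth candidate when sorted by descending similarity."""
    gt_score = sim_row[gt_index]
    # Count candidates scoring strictly higher; ties are resolved in the ground truth's favor.
    return 1 + sum(1 for s in sim_row if s > gt_score)


def retrieval_metrics(sim: List[List[float]], ks=(1, 5, 10, 50)) -> Dict[str, float]:
    """sim[i][j] is the similarity of query i to candidate j; assumes query i matches candidate i."""
    ranks = [rank_of_ground_truth(row, i) for i, row in enumerate(sim)]
    metrics = {f"R@{k}": sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["MedR"] = median(ranks)
    metrics["MeanR"] = mean(ranks)
    return metrics


def chance_recall_at_k(k: int, n: int) -> float:
    """Expected R@K for a uniformly random ranking over n candidates."""
    return min(k, n) / n
```

For example, `retrieval_metrics([[0.9, 0.1, 0.3], [0.2, 0.4, 0.8], [0.5, 0.6, 0.7]], ks=(1, 2))` gives ground-truth ranks 1, 2 and 1, so R@1 is 2/3, R@2 is 1.0 and MedR is 1.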

simon-ging commented 3 years ago

The pretrained models only understand English; for Chinese you will have to train everything from scratch: train retrieval, extract features, train captioning.
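The three steps form a strict chain: the retrieval checkpoint (the role "provided_models/yc2_100m_coot.pth" plays for English) is needed to extract features, and those features are what the captioning model trains on. The sketch below only encodes that ordering. All stage and artifact names are hypothetical placeholders, not COOT script names or paths, and "video_features" assumes precomputed video inputs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Stage:
    name: str
    needs: List[str] = field(default_factory=list)  # artifacts this stage consumes
    produces: str = ""                               # artifact this stage writes


# Hypothetical stage/artifact names illustrating the order described above.
PIPELINE = [
    Stage("train_retrieval", ["chinese_captions", "video_features"], "retrieval_checkpoint"),
    Stage("extract_features", ["retrieval_checkpoint", "chinese_captions", "video_features"], "coot_embeddings"),
    Stage("train_captioning", ["coot_embeddings", "chinese_captions"], "captioning_model"),
]


def check_order(stages: List[Stage], available: Iterable[str]) -> List[str]:
    """Return stage names in run order; raise if a stage needs an artifact not yet produced."""
    have = set(available)
    order = []
    for stage in stages:
        missing = [a for a in stage.needs if a not in have]
        if missing:
            raise ValueError(f"{stage.name} is missing {missing}")
        have.add(stage.produces)
        order.append(stage.name)
    return order
```

Skipping the retrieval stage (for example, starting from `PIPELINE[1:]`) raises an error because no Chinese retrieval checkpoint exists yet.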

will-wiki commented 3 years ago


Thank you very much for your reply! So you mean that "provided_models/yc2_100m_coot.pth" is produced by the retrieval training step?

simon-ging commented 3 years ago

Yes, that is correct.