lda_0.1.pt and other files - GitHub issues

Liu-arch commented 10 months ago

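Hello, I am very interested in your work and have two questions.
1. How do you obtain the classifier (the training process)? That is, how is the lda_0.1.pt file produced via "Transferring visual statistic knowledge" (LDA), and how are the classes_features obtained via "Transferring textual semantic knowledge"?
2. The related .pt files, distilbert-base-k400.pt and lda_0.1.pt, are not provided.
I look forward to your reply.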

whwu95 commented 10 months ago

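Transferring visual statistic knowledge: For the Kinetics-400 experiments, we sample 60 videos from each category, roughly 10% of the training data. These videos are fed directly to CLIP's visual encoder to obtain video embeddings. From these embeddings and their corresponding labels, we compute the LDA coefficients with LDA and use them as the classifier.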

Transferring textual semantic knowledge: We use BERT to extract text embeddings directly from the category names and use them as the classifier.
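A minimal sketch of the LDA step described above, to make "use the LDA coefficients as the classifier" concrete. It is not the repository's code and does not reproduce lda_0.1.pt: random synthetic vectors stand in for CLIP video embeddings, the function name `fit_lda_classifier` and the small `shrinkage` term are assumptions added for illustration, and only 5 classes are used instead of 400.

```python
import numpy as np

def fit_lda_classifier(embeddings, labels, num_classes, shrinkage=1e-3):
    """Fit LDA with a shared covariance and return per-class linear weights and biases."""
    dim = embeddings.shape[1]
    # Per-class mean embedding, shape (num_classes, dim).
    means = np.stack([embeddings[labels == k].mean(axis=0) for k in range(num_classes)])
    # Pooled within-class covariance, shared by all classes.
    centered = embeddings - means[labels]
    cov = centered.T @ centered / len(embeddings)
    cov += shrinkage * np.eye(dim)  # assumption: small ridge term for numerical stability
    # LDA coefficients: row k is cov^{-1} @ mean_k.
    weights = np.linalg.solve(cov, means.T).T
    priors = np.bincount(labels, minlength=num_classes) / len(labels)
    bias = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return weights, bias

# Synthetic stand-in for "60 sampled videos per class" (hypothetical data).
rng = np.random.default_rng(0)
num_classes, per_class, dim = 5, 60, 16
centers = rng.normal(scale=4.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
embeddings = centers[labels] + rng.normal(size=(len(labels), dim))

weights, bias = fit_lda_classifier(embeddings, labels, num_classes)
logits = embeddings @ weights.T + bias  # the coefficients act as a linear classifier
accuracy = float((logits.argmax(axis=1) == labels).mean())
```

In this sketch `weights` has one row per class, so it can be used in place of the weight matrix of a linear classification head applied to video embeddings.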

Liu-arch commented 10 months ago

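Thank you very much for your reply! I also have a question about your BIKE paper. You say "Don't change the num_sample, just set it to 1". However, the number of views is 1x1 on UCF and HMDB, while for the other datasets, including Kinetics400, it is listed as 4x3. How should the training code be modified once num_sample changes? The data then changes from a single tensor to a list. The code in question is:
images = images.view((-1,config.data.num_segments,3)+images.size()[-2:]) # bt 3 h w
b,t,c,h,w = images.size()
images= images.view(-1,c,h,w)
and
image_embedding, cls_embedding, text_embedding, logit_scale = model(images, texts, return_token=True)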

whwu95 commented 10 months ago

The 4x3 views here exist only at test time. Just append the relevant flags to the run command, e.g. --test_crops 3 --test_clips 4; this has nothing to do with num_sample in the config.

sh scripts/run_test.sh configs/k400/k400_train_rgb_vitb-32-f8.yaml exp/k400/ViT-B/32/f8/last_model.pt --test_crops 3 --test_clips 4
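For readers unsure what "4x3 views" means at test time, here is a rough sketch of one common way to combine them: each video yields 4 clips x 3 crops = 12 views, and the per-view scores are averaged into one prediction per video. This is an assumption for illustration only; the helper `aggregate_views` is hypothetical, the scores are random, and the repository may aggregate differently (for example over softmax scores rather than raw logits).

```python
import numpy as np

def aggregate_views(view_logits, num_clips, num_crops):
    """Average per-view scores into one score vector per video."""
    num_views = num_clips * num_crops
    num_classes = view_logits.shape[-1]
    # Views of the same video are assumed to be stored contiguously.
    per_video = view_logits.reshape(-1, num_views, num_classes)
    return per_video.mean(axis=1)

rng = np.random.default_rng(0)
num_videos, num_clips, num_crops, num_classes = 2, 4, 3, 400
# Hypothetical model output: one row of class scores per view.
view_logits = rng.normal(size=(num_videos * num_clips * num_crops, num_classes))
video_logits = aggregate_views(view_logits, num_clips, num_crops)
predictions = video_logits.argmax(axis=1)  # one predicted class per video
```

Under this scheme the training loop is untouched: only the evaluation code sees the extra view dimension.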

Liu-arch commented 9 months ago

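Hello, thank you very much for your earlier reply! I still have a few questions that puzzle me, and I hope you can help.

1. In the BIKE paper, why does exps/k400/VIT-B/32/8f/video_attributes_log.txt show only 30 epochs of training and a test accuracy of only 70.05, while further down it reports the following?
video_labels== torch.Size([19800]) sentence_label=== torch.Size([19800]) tensor(19800, device='cuda:0') a==0.7 b==0.3 top1==81.46969604492188 top5==95.8131332397461
Roughly how many epochs of training does it take to reach 81.46? And what do a==0.7 and b==0.3 mean?
2. What are the differences and connections between the frozen label encoder and the category encoder? Table 6(a) shows that the frozen label encoder improves the results. Also, what is "(technical) Transf"?
3. One more small question: is the baseline, i.e. plain CLIP, simply the following?
similarity = (vid_emb @ cls_emb.T).softmax(dim=-1) # torch.Size([32, 8, 32]); the first 32 is for videos and the second 32 is for texts, as expected
similarity = similarity.mean(dim = 1)
logit = similarity

Thank you for your answer.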
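The snippet in the question can be checked for shapes with a small standalone sketch. It only mirrors the computation as written in the question (per-frame similarity, softmax over classes, then a mean over frames) and is not a confirmation of what the paper's baseline does. Random unit vectors stand in for the CLIP video and text embeddings, and the embedding width of 512 is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
batch, frames, dim = 32, 8, 512  # dim=512 is an assumed embedding width
vid_emb = rng.normal(size=(batch, frames, dim))  # stand-in for per-frame video embeddings
cls_emb = rng.normal(size=(batch, dim))          # stand-in for text embeddings, one per sample
vid_emb /= np.linalg.norm(vid_emb, axis=-1, keepdims=True)
cls_emb /= np.linalg.norm(cls_emb, axis=-1, keepdims=True)

# Per-frame similarity to every text embedding, normalised over texts: (32, 8, 32).
similarity = softmax(vid_emb @ cls_emb.T, axis=-1)
# Average the per-frame distributions over the 8 frames: (32, 32).
logit = similarity.mean(axis=1)
```

Because each frame's scores already sum to 1 after the softmax, the frame-averaged `logit` rows are probabilities rather than raw logits, which matters if they are later passed to a loss that applies its own softmax.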

Liu-arch commented 9 months ago

Why can't I reach the 76.8 that you report for VITB32_8? Roughly how many epochs of training does this take? Thank you for your answer.

Liu-arch commented 9 months ago

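In the BIKE paper, why does exps/k400/VIT-B/32/8f/video_attributes_log.txt show only 30 epochs of training and a test accuracy of only 70.05, while further down it reports the following?
video_labels== torch.Size([19800]) sentence_label=== torch.Size([19800]) tensor(19800, device='cuda:0') a==0.7 b==0.3 top1==81.46969604492188 top5==95.8131332397461
Roughly how many epochs of training does it take to reach the 78.9 obtained before the attributes part is added?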

whwu95 / Text4Vis

lda_0.1.pt and other files #17