raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Some questions about ViT-B-DenseCLIP #6

Closed lixiangMindSpore closed 2 years ago

lixiangMindSpore commented 2 years ago

1. I would like to know how ViT-B-DenseCLIP performs compared with RN101-DenseCLIP. Can you share its results, and how can I train ViT-B-DenseCLIP on COCO or ADE20K?
2. Is ViT-B-DenseCLIP based on ViT-B-16.pt rather than ViT-B-32.pt?

raoyongming commented 2 years ago

Hi, thanks for your interest in our work.

Directly applying the ViT-B model to the detection task is difficult. Since the complexity of self-attention is O(H^2W^2), the large input images (e.g., 800x1200) used in detection lead to considerable GPU memory consumption. Therefore, we only tested the ViT-B-DenseCLIP model on semantic segmentation on ADE20K. The training config file and results are provided in the Segmentation section of the README.
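As a rough back-of-the-envelope illustration of that O(H^2W^2) cost, the sketch below estimates the memory taken by the attention maps alone for a ViT-B-sized backbone. The patch size, head count, layer count, and fp16 storage are assumed illustrative values, not numbers from the paper.

```python
# Rough estimate of attention-map memory for a ViT-B-style backbone
# (illustrative assumptions: patch 16, 12 heads, 12 layers, fp16 activations).
def attention_map_memory_gb(height, width, patch=16, heads=12, layers=12, bytes_per_el=2):
    tokens = (height // patch) * (width // patch)      # HW / patch^2 tokens
    per_head = tokens * tokens * bytes_per_el          # one (HW/p^2)^2 attention map
    total = per_head * heads * layers                  # all heads, all layers
    return total / 1024 ** 3

print(attention_map_memory_gb(800, 1200))  # ~3.8 GB for a detection-sized input
print(attention_map_memory_gb(512, 512))   # ~0.3 GB for a typical segmentation crop
```

This counts only the attention maps; other activations, gradients, and optimizer states come on top, which is why detection-scale inputs are the bottleneck.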

lixiangMindSpore commented 2 years ago

Hello, I have a few questions:

1. The README does not explain how to train ViT-B-DenseCLIP; the name only appears in the table. How is ViT-B-DenseCLIP trained?
2. When I use the ViT-B-DenseCLIP model with the example program provided by CLIP, I get the error shown in Figure 1; the CLIP example code is shown in Figure 2.

(Figures 1 and 2: screenshots not included.)

lixiangMindSpore commented 2 years ago

Hello, a couple of questions:

1. The README describes how to train RN50-CLIP and RN101-CLIP, but there is no information on training ViT-B-DenseCLIP; the name only appears in the table. How is ViT-B-DenseCLIP trained?
2. When I use the ViT-B-DenseCLIP model with the example program provided by CLIP, I get the error shown in Figure 1 (the CLIP example code is shown in Figure 2). If I replace the path on line 6 of that code with ViT-B-32.pt, it works fine.

(Figures 1 and 2: screenshots not included.)

raoyongming commented 2 years ago
  1. We provide config files for all the settings. To train a model, just run bash dist_train.sh configs/<config>.py 8, where <config>.py is the config name listed in the table.
  2. Our model is designed for detection and segmentation tasks and consists of several components (backbone, text encoder, decoder, etc.), so it cannot be used directly with the CLIP example code; the checkpoint-inspection sketch below illustrates the difference.
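One concrete way to see why the CLIP demo cannot load these weights is to look at the released checkpoint itself: an mmseg/mmdet-style checkpoint usually wraps the weights under a state_dict with several top-level sub-modules, whereas clip.load expects a plain CLIP state dict. The path below is a placeholder and the sub-module names are whatever the file contains; this is a diagnostic sketch, not part of the official repo.

```python
# Inspect what a DenseCLIP checkpoint actually contains (path is a placeholder).
import torch
from collections import Counter

ckpt = torch.load("path/to/denseclip_checkpoint.pth", map_location="cpu")
# mmcv-style checkpoints usually wrap the weights under the "state_dict" key.
state_dict = ckpt.get("state_dict", ckpt)

# Count parameters per top-level module: a full detection/segmentation model has
# several sub-modules (backbone, text encoder, decode head, ...), which is why
# clip.load() on this file fails.
prefixes = Counter(key.split(".")[0] for key in state_dict)
print(prefixes)
```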
lixiangMindSpore commented 2 years ago

How should we actually use your ViT-B-DenseCLIP model? Is there a simple demo, like the example program that CLIP provides?

raoyongming commented 2 years ago

Our code is built on mmseg, which already provides many testing and visualization tools. For example, you can test our model with bash dist_test.sh configs/<config>.py /path/to/checkpoint 8 --eval mIoU --aug-test; you can also get visualizations by adjusting the arguments. Please refer to the mmseg documentation for details.
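For readers who want something closer to the CLIP-style demo, mmseg's high-level inference API can in principle play that role. The sketch below is illustrative only: the config name and checkpoint path are placeholders standing in for an entry from the README table, and it assumes the repo's custom DenseCLIP modules are importable (e.g., running from the segmentation directory) so that they are registered with mmseg.

```python
# Minimal mmseg-style inference demo (illustrative sketch, not an official script).
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot

import denseclip  # noqa: F401  (assumed module name; registers the custom models with mmseg)

config_file = "configs/some_vit_b_denseclip_config.py"   # placeholder: use a config from the table
checkpoint_file = "path/to/checkpoint.pth"                # placeholder: the matching checkpoint

model = init_segmentor(config_file, checkpoint_file, device="cuda:0")
result = inference_segmentor(model, "demo.jpg")           # per-pixel class predictions
show_result_pyplot(model, "demo.jpg", result)             # overlay the segmentation on the image
```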