raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Some questions about ViT-B-DenseCLIP #6

Closed lixiangMindSpore closed 2 years ago

lixiangMindSpore commented 2 years ago

1. I would like to know how ViT-B-DenseCLIP performs compared with RN101-DenseCLIP. Can you share its results, and how can I train ViT-B-DenseCLIP on COCO or ADE20K?
2. Is ViT-B-DenseCLIP based on ViT-B-16.pt rather than ViT-B-32.pt?

raoyongming commented 2 years ago

Hi, thanks for your interest in our work.

Directly applying the ViT-B model to the detection task is difficult. Since the complexity of self-attention is O(H^2W^2), the large input images (e.g., 800x1200) used in detection lead to considerable GPU memory consumption. Therefore, we only tested the ViT-B-DenseCLIP model on semantic segmentation on ADE20K. The training config file and results are provided in the Segmentation section of the README.
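As a rough back-of-the-envelope illustration of that O(H^2W^2) cost, the sketch below estimates the memory taken by the attention maps alone for a ViT-B-sized backbone. The patch size, head count, layer count, and fp16 storage are assumed illustrative values, not numbers from the paper.

```python
# Rough estimate of attention-map memory for a ViT-B-style backbone
# (illustrative assumptions: patch 16, 12 heads, 12 layers, fp16 activations).
def attention_map_memory_gb(height, width, patch=16, heads=12, layers=12, bytes_per_el=2):
    tokens = (height // patch) * (width // patch)      # HW / patch^2 tokens
    per_head = tokens * tokens * bytes_per_el          # one (HW/p^2)^2 attention map
    total = per_head * heads * layers                  # all heads, all layers
    return total / 1024 ** 3

print(attention_map_memory_gb(800, 1200))  # ~3.8 GB for a detection-sized input
print(attention_map_memory_gb(512, 512))   # ~0.3 GB for a typical segmentation crop
```

This counts only the attention maps; other activations, gradients, and optimizer states come on top, which is why detection-scale inputs are the bottleneck.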

lixiangMindSpore commented 2 years ago

Hello, I have a few questions:

1. The README does not explain how to train ViT-B-DenseCLIP; the name only appears in the table. How is ViT-B-DenseCLIP trained?
2. When I use the ViT-B-DenseCLIP model with the example program provided by CLIP, I get the error shown in Figure 1; the CLIP example code is shown in Figure 2.

(Figures 1 and 2: screenshots not included.)

lixiangMindSpore commented 2 years ago

Hello, a couple of questions:

1. The README describes how to train RN50-CLIP and RN101-CLIP, but there is no information on training ViT-B-DenseCLIP; the name only appears in the table. How is ViT-B-DenseCLIP trained?
2. When I use the ViT-B-DenseCLIP model with the example program provided by CLIP, I get the error shown in Figure 1 (the CLIP example code is shown in Figure 2). If I replace the path on line 6 of that code with ViT-B-32.pt, it works fine.

(Figures 1 and 2: screenshots not included.)

raoyongming commented 2 years ago
  1. We provide config files for all the settings. To train a model, just run bash dist_train.sh configs/<config>.py 8, where <config>.py is the config name listed in the table.
  2. Our model is designed for detection and segmentation tasks and consists of several components (backbone, text encoder, decoder, etc.), so it cannot be used directly with the CLIP example code; the checkpoint-inspection sketch below illustrates the difference.
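One concrete way to see why the CLIP demo cannot load these weights is to look at the released checkpoint itself: an mmseg/mmdet-style checkpoint usually wraps the weights under a state_dict with several top-level sub-modules, whereas clip.load expects a plain CLIP state dict. The path below is a placeholder and the sub-module names are whatever the file contains; this is a diagnostic sketch, not part of the official repo.

```python
# Inspect what a DenseCLIP checkpoint actually contains (path is a placeholder).
import torch
from collections import Counter

ckpt = torch.load("path/to/denseclip_checkpoint.pth", map_location="cpu")
# mmcv-style checkpoints usually wrap the weights under the "state_dict" key.
state_dict = ckpt.get("state_dict", ckpt)

# Count parameters per top-level module: a full detection/segmentation model has
# several sub-modules (backbone, text encoder, decode head, ...), which is why
# clip.load() on this file fails.
prefixes = Counter(key.split(".")[0] for key in state_dict)
print(prefixes)
```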
lixiangMindSpore commented 2 years ago

How should we actually use your ViT-B-DenseCLIP model? Is there a simple demo, like the example program that CLIP provides?

raoyongming commented 2 years ago

Our code is built on mmseg, which already provides many testing and visualization tools. For example, you can test our model with bash dist_test.sh configs/<config>.py /path/to/checkpoint 8 --eval mIoU --aug-test; you can also get visualizations by adjusting the arguments. Please refer to the mmseg documentation for details.
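For readers who want something closer to the CLIP-style demo, mmseg's high-level inference API can in principle play that role. The sketch below is illustrative only: the config name and checkpoint path are placeholders standing in for an entry from the README table, and it assumes the repo's custom DenseCLIP modules are importable (e.g., running from the segmentation directory) so that they are registered with mmseg.

```python
# Minimal mmseg-style inference demo (illustrative sketch, not an official script).
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot

import denseclip  # noqa: F401  (assumed module name; registers the custom models with mmseg)

config_file = "configs/some_vit_b_denseclip_config.py"   # placeholder: use a config from the table
checkpoint_file = "path/to/checkpoint.pth"                # placeholder: the matching checkpoint

model = init_segmentor(config_file, checkpoint_file, device="cuda:0")
result = inference_segmentor(model, "demo.jpg")           # per-pixel class predictions
show_result_pyplot(model, "demo.jpg", result)             # overlay the segmentation on the image
```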