No gradients during backpropagation

guanhdrmq commented 9 months ago

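Hello. When I backpropagate to cross_modal_image_layers and cross_modal_text_layers, why are there no gradients? When I enable gradients in BERT's modeling_bert, none show up. How can I get the gradients of each layer, for example the feature map and gradient of the last layer of cross_modal_image_layers? Thanks.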

GoGoJoestar commented 9 months ago

Could you describe exactly what you did when you say you "enable gradients in BERT's modeling_bert"?

guanhdrmq commented 9 months ago

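The gradient problem is solved. Another question: how do I get the length of the image feature input? If the image is 384×384 and the patch size is 16×16, the number of patches should be 576. How do I get the length of the image input features? Thanks.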

GoGoJoestar commented 9 months ago

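Suppose the output of the vision model (ViT) has shape [batch_size, vision_length, hidden_size]. The second dimension, vision_length, is the length of the image features: one whole-image feature concatenated with one feature per patch. So vision_length = 1 + number of patches. For a 384×384 image with 16×16 patches, the number of patches is (384 / 16)^2 = 576, and vision_length = 1 + 576 = 577.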
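The arithmetic above can be written as a small helper. This is only a sketch of the formula from this reply; the function name `vision_length` is made up for illustration and is not part of the VLE code.

```python
def vision_length(image_size: int, patch_size: int) -> int:
    """Sequence length of the ViT output: 1 whole-image token + one token per patch."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return 1 + patches_per_side ** 2


print(vision_length(384, 16))  # 577
print(vision_length(576, 16))  # 1297
```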

guanhdrmq commented 9 months ago

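OK. How can I get the q and k values of the image cross attention and the text cross attention? The underlying calls all go to the modeling_bert code of huggingface's BERT, and the fusion part is not reimplemented in the VLE code. Thanks.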

GoGoJoestar commented 9 months ago

We did not modify the internals of cross attention. To get the query and key inside it, consider overriding huggingface's BertAttention and the related classes in models/VLE/modeling_vle.py.
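As an alternative to overriding the classes (a different technique from the one suggested above), PyTorch forward hooks can capture the outputs of the `query` and `key` linear layers, which huggingface's BertSelfAttention exposes as submodules. The sketch below uses a tiny stand-in module, `TinySelfAttention`, which is hypothetical and only mimics those two attribute names; where the real attention modules sit inside cross_modal_image_layers is an assumption you would need to check. Note that the hooks see the projections before they are reshaped into heads.

```python
import torch
from torch import nn


class TinySelfAttention(nn.Module):
    """Stand-in with `query` / `key` Linear submodules, like BertSelfAttention."""

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, encoder_hidden_states):
        q = self.query(hidden_states)          # queries come from one modality
        k = self.key(encoder_hidden_states)    # keys come from the other modality
        scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1)


captured = {}


def save_output(name):
    # Forward hooks receive (module, inputs, output); store the output by name.
    def hook(module, inputs, output):
        captured[name] = output
    return hook


attn = TinySelfAttention()
handles = [
    attn.query.register_forward_hook(save_output("query")),
    attn.key.register_forward_hook(save_output("key")),
]

text = torch.randn(1, 5, 8)     # [batch, text_length, hidden]
image = torch.randn(1, 577, 8)  # [batch, vision_length, hidden]
probs = attn(text, image)

for handle in handles:
    handle.remove()  # detach the hooks once the values are captured
```

With the real model, the same two `register_forward_hook` calls would be attached to the `query` and `key` submodules of whichever attention layer you want to inspect.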

guanhdrmq commented 8 months ago

Thanks. One more question: how do I get the features of the last vision layer, that is, the hidden_states?

GoGoJoestar commented 8 months ago

The output of VLEModel includes the final vision features. See the code below:

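model = VLEModel.from_pretrained(model_name)
# Assumes `inputs` is the dict of keyword tensors prepared by the processor
model_outputs = model(**inputs)

# Final image representation
model_outputs.image_embeds

# Final text representation
model_outputs.text_embeds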

guanhdrmq commented 8 months ago

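OK, thanks. One more question: within the huggingface framework, can the VLE model be split across two GPUs (two 2080 Ti cards)? Or can it use some memory-sharing setting, such as device_map or offloading to CPU memory? We have not succeeded so far, although it runs on a single 3060, and just barely on a single 4080. Thanks.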

GoGoJoestar commented 8 months ago

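We have not tried this. You could try manually specifying in device_map which card each module of the model is placed on. Note that using device_map may conflict with distributed training.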

device_map={
 "vision_model": 0,
 "text_model": 0,
 "text_projection_layer": 1,
 "image_projection_layer": 1,
 "token_type_embeddings": 1,
 "cross_modal_image_layers": 1,
 "cross_modal_text_layers": 1,
 "cross_modal_image_pooler": 1,
 "cross_modal_text_pooler": 1
}

guanhdrmq commented 8 months ago

Hello, this is our code. We can only comment out model.to(device); the model cannot be loaded onto the two 2080 Ti cards and only runs on the CPU. We only do inference, not training. How should we solve this? Thanks.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if __name__ == "__main__":
    from VLE import VLEForVQA, VLEProcessor, VLEForVQAPipeline

device_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1
}

model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")

vle_processor = VLEProcessor.from_pretrained(
    "./pretrained/vle-base-for-vqa",
    num_labels=len(config.id2label),
    id2label=config.id2label,
    label2id=config.label2id,
    output_hidden_states=True
)
vqa_pipeline = VLEForVQAPipeline(model=model, device_map=device_map, vle_processor=vle_processor)
# vqa_pipeline = VLEForVQAPipeline(model=model, device=device, vle_processor=vle_processor)
# model.to(device)
model.eval()

dataset = VQADataset(questions=questions[:5000], annotations=annotations[:5000],)
test_dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

# Counters
correct = 0.0
total = 0

for image, text, labels in test_dataloader:
    image = image.squeeze(0)
    # image = image.to(device)
    image = tensor_to_pil(image)
    inputs = {"image": image, "question": text[0]}
    vqa_answers = vqa_pipeline(**inputs, top_k=5)

    _, _, logits, answer_list = vqa_answers
    top_answer = answer_list[0]['answer']
    print("prediction answer:", top_answer)

    true_label_index = torch.argmax(labels)
    true_label = config_idandlabel["answer_candidates"][true_label_index]
    if top_answer == true_label:
        correct = correct + 1
    total = total + 1
    print("total==================", total)
    if total % 100 == 0:
        print("total:{}".format(total))

acc = correct / total
print("acc:{:.4f}".format(acc))

GoGoJoestar commented 8 months ago

With the VLEForVQA model, the module names in device_map need to be adjusted:

device_map = {
    "vle.vision_model": 0,
    "vle.text_model": 0,
    "vle.text_projection_layer": 1,
    "vle.image_projection_layer": 1,
    "vle.token_type_embeddings": 1,
    "vle.cross_modal_image_layers": 1,
    "vle.cross_modal_text_layers": 1,
    "vle.cross_modal_image_pooler": 1,
    "vle.cross_modal_text_pooler": 1,
    "vqa_classifier": 1,
}

Pass the device_map argument when loading the model:

model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa",device_map=device_map)

At this point the model is already distributed across the two cards.

As for the Pipeline, it does not seem to support multiple cards (device_map); it only accepts device. For example, the following line passes device=0:

vqa_pipeline = VLEForVQAPipeline(model=model, device=0, vle_processor=vle_processor)

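This moves the model from cards 0 and 1 back onto card 0. For multi-card use we recommend not using the Pipeline; instead, reimplement the flow in your own code, following the processing logic in VLEForVQAPipeline.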
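The renaming from the VLEModel map to the VLEForVQA map in this reply is mechanical: every module gains a `vle.` prefix and the `vqa_classifier` head is added. A minimal sketch of that step follows; the helper name `prefix_device_map` is hypothetical and not part of VLE or transformers.

```python
def prefix_device_map(base_map, prefix="vle.", extra=None):
    """Return a new device_map with `prefix` on every key, plus any extra entries."""
    prefixed = {prefix + name: device for name, device in base_map.items()}
    prefixed.update(extra or {})
    return prefixed


# Module names as listed for VLEModel earlier in this thread.
vle_model_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1,
}

# VLEForVQA wraps the VLE model as `vle` and adds a classifier head.
vqa_map = prefix_device_map(vle_model_map, extra={"vqa_classifier": 1})
print(vqa_map)
```

The resulting dictionary is what would be passed as device_map to VLEForVQA.from_pretrained.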

guanhdrmq commented 8 months ago

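Hello. Using model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa",device_map=device_map) does use both 2080 Ti cards, but with batch_size=1 the GPU memory fills up after only 4 samples and an out of memory error is raised. Is this caused by our current hardware? If not, are there any other solutions? Thanks.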

GoGoJoestar commented 8 months ago

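You can try the following and see whether they reduce GPU memory usage:

1. Adjust the device_map allocation so that memory use is more even across the cards.
2. Use torch.no_grad.
3. Reduce the image size of the model and the processor: change image_size in the model's config.json, and crop_size and size in preprocessor_config.json.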

guanhdrmq commented 8 months ago

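Hello. 1. We tried option 1 yesterday without success; could you provide another device_map dictionary? 2. We need gradients, so we cannot use torch.no_grad; we have already modified the underlying BERT code to enable gradient recording. 3. We tried this, but the model requires 576×576 input images, and changing to 384×384 raises an error. Could you try the resize on your side? Thanks.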

GoGoJoestar commented 8 months ago

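1. You can adjust which modules go on card 0 and card 1 according to the actual memory usage on the two cards, for example by also setting vle.vision_model to 1 (you may need to change the device of the image input accordingly). We do not have 2080 Ti cards, so we cannot give more specific settings.
2. Needing gradients does increase memory usage a lot. Do you need all of the gradients? Have you set requires_grad=False on the modules that do not need gradients?
3. After changing the sizes in config.json and preprocessor_config.json in the model directory, the model does run for us. What exactly is the error?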
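Point 2 can be illustrated with a toy model. This is a sketch, not VLE code: `TinyTwoTower` is a hypothetical stand-in whose attribute names only echo the VLE module names. It shows that freezing a module with requires_grad_(False) stops gradients from being stored for its parameters, while gradients still flow through it to an input that requires grad.

```python
import torch
from torch import nn


class TinyTwoTower(nn.Module):
    """Hypothetical stand-in: a 'vision' block followed by a 'cross-modal' block."""

    def __init__(self):
        super().__init__()
        self.vision_model = nn.Linear(4, 4)
        self.cross_modal_image_layers = nn.Linear(4, 4)

    def forward(self, x):
        return self.cross_modal_image_layers(self.vision_model(x))


model = TinyTwoTower()

# Freeze the block whose parameter gradients are not needed.
model.vision_model.requires_grad_(False)

x = torch.randn(2, 4, requires_grad=True)  # e.g. an input we want gradients for
model(x).sum().backward()

print(model.vision_model.weight.grad)                       # None: frozen
print(model.cross_modal_image_layers.weight.grad is None)   # False: still trained
print(x.grad is None)                                       # False: input gradient still flows
```

Frozen parameters do not accumulate .grad tensors, which is where part of the memory saving comes from; how much it helps on the real model depends on which modules are frozen.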

guanhdrmq commented 8 months ago

If the image size is changed to 384*384, the dimensions no longer match:

Traceback (most recent call last):
  File "D:\WorkSpace\workspace\multimodal_robustness\vle_vqav2_image.py", line 171, in <module>
    model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")
  File "C:\Users\Admin.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3307, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\Admin.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3756, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for VLEForVQA:
    size mismatch for vle.vision_model.vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([1297, 768]) from checkpoint, the shape in current model is torch.Size([577, 768]).
    You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

GoGoJoestar commented 8 months ago

Try adjusting the position_embedding weights after loading the model, using the extend_position_embedding method in models/VLE/modeling_vle.py. See the code below, or lines 68~76 of examples/VQA/vqav2_train_module.py.

import torch
from torch import nn
# extend_position_embedding is defined in models/VLE/modeling_vle.py;
# adjust this import path to match your project layout.
from models.VLE.modeling_vle import extend_position_embedding

patch_size = model.config.vision_config.patch_size
position_length_after = (model.config.vision_config.image_size//model.config.vision_config.patch_size)**2 + 1
position_embed_dim = model.vle.vision_model.vision_model.embeddings.position_embedding.embedding_dim

new_state_dict = extend_position_embedding(model.state_dict(), patch_size, model.config.vision_config.image_size)
model.vle.vision_model.vision_model.embeddings.position_embedding = nn.Embedding(position_length_after, position_embed_dim, device=model.vle.vision_model.vision_model.embeddings.position_embedding.weight.device)
model.vle.vision_model.vision_model.embeddings.register_buffer("position_ids", torch.arange(position_length_after, device=model.vle.vision_model.vision_model.embeddings.position_ids.device).expand((1, -1)))
model.load_state_dict(new_state_dict)
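The two shapes in the traceback above can be checked by hand: a position embedding with N rows corresponds to 1 whole-image token plus N - 1 patches. The sketch below inverts that relation to recover the image size each shape implies; the helper name `image_size_for` is hypothetical, and a square image with square patches is assumed.

```python
import math


def image_size_for(position_count: int, patch_size: int) -> int:
    """Image side length implied by a position embedding with `position_count` rows."""
    patches = position_count - 1          # drop the whole-image token
    side = math.isqrt(patches)            # patches per side, assuming a square grid
    if side * side != patches:
        raise ValueError("position_count - 1 is not a perfect square")
    return side * patch_size


print(image_size_for(1297, 16))  # 576: the checkpoint was saved for 576x576 images
print(image_size_for(577, 16))   # 384: the model built from the edited config
```

This matches the error: the checkpoint holds 1297 positions while the edited config builds a model with 577, which is why the position embedding has to be resized before the weights can be loaded.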

guanhdrmq commented 8 months ago

Two more questions: 1. The source code uses DeBERTa-v2 from huggingface. 2. Can multi-GPU inference be done on A100s? Thanks.

GoGoJoestar commented 8 months ago

1. I did not understand your question.
2. Multi-GPU inference can be done on A100s.

iflytek / VLE

No gradients during backpropagation #8