htyao89 / Textual-based_Class-aware_prompt_tuning

MIT License

Question about TextEncoder in TCP #2

Open 766O opened 4 months ago

766O commented 4 months ago
import torch
import torch.nn as nn


class TextEncoder(nn.Module):
    def __init__(self, clip_model, device):
        super().__init__()
        self.transformer = clip_model.transformer
        self.positional_embedding = clip_model.positional_embedding
        self.ln_final = clip_model.ln_final
        self.text_projection = clip_model.text_projection
        self.dtype = clip_model.dtype

    def forward(self, prompts, class_feature, weight, tokenized_prompts, flag=False):
        x = prompts + self.positional_embedding.type(self.dtype)
        x = x.permute(1, 0, 2)  # NLD -> LND
        if flag:
            # Standard CLIP path: the transformer consumes a plain tensor.
            x = self.transformer(x)
        else:
            # Modified path: the resblocks consume a 4-element list that
            # threads class_feature, weight, and a layer counter through them.
            counter = 0
            outputs = self.transformer.resblocks([x, class_feature, weight, counter])
            x = outputs[0]

        x = x.permute(1, 0, 2)  # LND -> NLD
        x = self.ln_final(x).type(self.dtype)
        # Take the features at the EOT token (highest token id) and project.
        x = x[torch.arange(x.shape[0]), tokenized_prompts.argmax(dim=-1)] @ self.text_projection
        return x

I enjoyed reading this great paper. However, when I tried this method on another dataset, I ran into an error.

The class_feature is fed into the text encoder depending on the value of the flag variable; since flag is set to False, the else branch is executed directly. Additionally, the method is described as inserting class_feature starting from the l-th layer, but the actual layer index is not specified anywhere in the code. Furthermore, the else branch raises an error because the input to the resblocks is a list, not a tensor:

*** AttributeError: 'list' object has no attribute 'dtype'
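For reference, here is a minimal sketch of where that .dtype access happens, assuming OpenAI's stock CLIP ResidualAttentionBlock with the causal attention mask set (the class below is an illustrative stand-in, not the repository's code):

```python
import torch
import torch.nn as nn


class StockResidualBlock(nn.Module):
    """Illustrative stand-in for OpenAI CLIP's ResidualAttentionBlock."""

    def __init__(self, d_model=512, n_head=8, attn_mask=None):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.attn_mask = attn_mask

    def forward(self, x):
        # The causal mask is cast with x.dtype. If x is the list
        # [x, class_feature, weight, counter] instead of a tensor, this is
        # where "AttributeError: 'list' object has no attribute 'dtype'" occurs.
        mask = None
        if self.attn_mask is not None:
            mask = self.attn_mask.to(dtype=x.dtype, device=x.device)
        return x + self.attn(x, x, x, need_weights=False, attn_mask=mask)[0]
```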

Can you tell me the cause and solution for this?

htyao89 commented 4 months ago

For the detailed code, please see ./trainers/clip_text/model.py.

[screenshot of the code in model.py]

Moreover, for the standard transformer the input is a tensor, while for the proposed method the input is a list of length 4, [x, class_feature, weight, counter] (L263). Could you please share the full log of the error "AttributeError: 'list' object has no attribute 'dtype'"?

We will revise the code after the NeurIPS 2024 deadline, exposing the insertion layers as a parameter.
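In the meantime, here is a rough sketch of what such a list-consuming block could look like, with the insertion layer exposed as a parameter. ClassAwareBlock, insert_layer, and the injection rule below are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn


class ClassAwareBlock(nn.Module):
    """Hypothetical residual block that consumes [x, class_feature, weight, counter]."""

    def __init__(self, d_model=512, n_head=8, insert_layer=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.ln_1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln_2 = nn.LayerNorm(d_model)
        self.insert_layer = insert_layer  # which layer injects class_feature

    def forward(self, inputs):
        # Unpack the 4-element list instead of a bare tensor.
        # x: (L, N, D); class_feature: (N, D); weight: (D, D).
        x, class_feature, weight, counter = inputs
        if counter == self.insert_layer:
            # Inject the class-aware feature into the prompt tokens
            # (the exact injection rule in TCP may differ).
            x = x + (class_feature @ weight).unsqueeze(0)
        y = self.ln_1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        x = x + self.mlp(self.ln_2(x))
        # Return the same list shape so nn.Sequential can chain blocks.
        return [x, class_feature, weight, counter + 1]
```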

766O commented 4 months ago

Thank you very much for the quick and kind response. The answer was very helpful in solving the problem!