luogen1996 / LLaVA-HR

LLaVA-HR: High-Resolution Large Language-Vision Assistant
Apache License 2.0

the training parameters of your single branch convnext encoder #4

Open yuecao0119 opened 6 months ago

yuecao0119 commented 6 months ago

Hello, your work is great.

Could you share the training parameters of your single-branch ConvNeXt encoder? I don't fully understand the following part of the code.

def feature_select(self, image_forward_outs):
    # select_layer > 100 keeps the feature maps of the last four stages;
    # otherwise only the final feature map is returned.
    if self.select_layer > 100:
        image_features = image_forward_outs[-4:]
    else:
        image_features = image_forward_outs[-1]
    return image_features
luogen1996 commented 6 months ago

The image_features = image_forward_outs[-4:] branch is not actually used. We directly select the last layer of ConvNeXt to extract visual features. We will revise the code soon.
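For readers following along, here is a minimal sketch of what "select the last layer of ConvNeXt" could look like with a standalone backbone. It uses timm's generic feature-extractor interface; the model name convnext_large and the 1024x1024 input are illustrative assumptions, not the authors' exact encoder or resolution.

    import torch
    import timm

    # Assumption: a timm ConvNeXt backbone, not the authors' exact setup.
    # features_only=True makes the forward pass return one feature map per stage.
    encoder = timm.create_model('convnext_large', pretrained=False, features_only=True)

    image = torch.randn(1, 3, 1024, 1024)   # high-resolution input (illustrative size)
    stage_outputs = encoder(image)           # list of feature maps, one per stage
    visual_features = stage_outputs[-1]      # last stage only, e.g. shape (1, 1536, 32, 32)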

yuecao0119 commented 6 months ago

Thank you for your answer. How should the single-branch ConvNeXt in your paper be trained? I tried to train a single-branch ConvNeXt with your code, but the loss in the pretraining stage did not look good. [screenshot of pretraining loss]

luogen1996 commented 6 months ago

Your loss actually looks good. Single-branch LLaVA-HR performs worse; see our paper. [attached screenshot]