I think stage 1 training, i.e. vision-language representation learning with the three objectives described in the article, is not yet implemented. Am I right?
The fact that `load_pretrained: False` in `pretrain_stage1.yaml` is not implemented is only part of the problem. An LLM is connected, so what runs is really the generative learning from stage 2. Is there currently any way to properly perform stage 1 vision-language representation bootstrapping?