Closed — aliencaocao closed this issue 3 months ago
Looking at the linked config file https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py
Why is it using models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt? Isn't that the largest EVA model?
And what do you mean by the "6Million" backbone?
We only use the text tower of models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt as the language model. The ViT-Ti backbone is imported in this line.
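In other words, the large checkpoint only contributes its text encoder, while the vision side is a separate ~6M-parameter ViT-Ti. A minimal sketch of what "using only the text tower" of a CLIP-style checkpoint amounts to (the key prefixes and the toy state dict below are illustrative assumptions, not the actual APE or EVA-CLIP layout):

```python
import torch

def split_text_tower(state_dict):
    """Keep only text-encoder weights (here assumed to use a 'text.' prefix)."""
    return {k: v for k, v in state_dict.items() if k.startswith("text.")}

# Toy stand-in for a full CLIP state dict; real key names in
# EVA02_CLIP_E_psz14_plus_s9B.pt may differ.
toy = {
    "visual.blocks.0.attn.qkv.weight": torch.zeros(3, 3),
    "text.transformer.blocks.0.attn.qkv.weight": torch.zeros(3, 3),
    "text.token_embedding.weight": torch.zeros(10, 3),
}

text_only = split_text_tower(toy)
# The 'visual.' weights are discarded; the detector pairs the retained
# text tower with a small ViT-Ti image backbone loaded elsewhere.
print(sorted(text_only))
```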
I see, thanks