shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
https://arxiv.org/abs/2312.02153
Apache License 2.0

APE-Ti 6M backbone but using EVA-02 9B? #45

Closed by aliencaocao 3 months ago

aliencaocao commented 4 months ago

Looking at the config file linked here: https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py

Why is it using models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt? Isn't that the largest EVA model?

And what does the "6M" backbone refer to?

shenyunhang commented 3 months ago

We only use the text tower from models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt as the language model. The ViT-Ti backbone is imported in this line.
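
For context, here is a minimal sketch of what that split looks like in practice, assuming the standard CLIP checkpoint layout where image-tower weights are prefixed with `visual.`. The helper names (`load_text_tower_only`, `build_vit_tiny`) are hypothetical placeholders, not APE's actual API:

```python
import torch

def load_text_tower_only(checkpoint_path: str) -> dict:
    """Keep only the text-encoder weights from a full CLIP checkpoint.

    The 9B-parameter EVA02-CLIP-E+ file contains both an image tower and
    a text tower; dropping the image tower means the large checkpoint
    contributes only its (much smaller) language model.
    """
    full = torch.load(checkpoint_path, map_location="cpu")
    # Some checkpoints nest the state dict under a "module" key.
    state = full.get("module", full)
    # CLIP-style checkpoints conventionally prefix image-tower weights
    # with "visual."; everything else belongs to the text tower.
    return {k: v for k, v in state.items() if not k.startswith("visual.")}

# Hypothetical usage mirroring the pattern described above:
# text_state = load_text_tower_only(
#     "models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt")
# language_model.load_state_dict(text_state, strict=False)
# vision_backbone = build_vit_tiny()  # the ~6M-parameter ViT-Ti backbone
```

So the "6M" figure refers only to the ViT-Ti vision backbone; the big EVA-CLIP checkpoint is loaded solely for its text tower.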

aliencaocao commented 3 months ago

I see, thanks