luogen1996 / RepAdapter

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

Have you tried a bigger hidden embed dim? #3

Closed · leexinhao closed this 1 year ago

leexinhao commented 1 year ago


I notice that in your paper a bigger embed dim doesn't necessarily work better, but the one you use (16) is still very small compared to other work (AdaptFormer uses 64, AIM uses 256). As I understand it, a larger dimension only increases the cost of training, not of inference, thanks to structural re-parameterization, so trying a larger embed dim might yield better performance without any loss of efficiency.
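To make that premise concrete, here is a minimal PyTorch sketch of the merging trick, assuming a plain activation-free adapter y = x + up(down(x)) folded into the projection layer that follows it. The function and variable names are hypothetical, and RepAdapter's actual group-wise formulation differs in detail, but the linear-algebra argument is the same:

```python
import torch

torch.set_default_dtype(torch.float64)  # double precision so the check below is exact

def merge_adapter(proj_w, proj_b, down_w, down_b, up_w, up_b):
    """Fold an activation-free adapter y = x + up(down(x)) into the
    following projection z = proj_w @ y + proj_b (names are hypothetical)."""
    d = proj_w.shape[1]
    # The adapter is a single affine map y = A @ x + c; the hidden dim h
    # disappears once the product up_w @ down_w is formed.
    A = torch.eye(d) + up_w @ down_w      # (d, d)
    c = up_w @ down_b + up_b              # (d,)
    # Compose with the projection: z = (proj_w @ A) @ x + (proj_w @ c + proj_b)
    merged_w = proj_w @ A                 # same shape as the original proj_w
    merged_b = proj_w @ c + proj_b
    return merged_w, merged_b

# Sanity check: the merged layer matches adapter + projection for any h.
d, h, d_out = 768, 128, 768               # h can be made arbitrarily large
x = torch.randn(d)
proj_w, proj_b = torch.randn(d_out, d), torch.randn(d_out)
down_w, down_b = torch.randn(h, d), torch.randn(h)
up_w, up_b = torch.randn(d, h), torch.randn(d)

y = x + up_w @ (down_w @ x + down_b) + up_b
z_ref = proj_w @ y + proj_b
mw, mb = merge_adapter(proj_w, proj_b, down_w, down_b, up_w, up_b)
assert torch.allclose(mw @ x + mb, z_ref)
```

Since `merged_w` has exactly the shape of the original `proj_w`, inference parameters and FLOPs are independent of the hidden dim `h`; only the training cost grows with `h`.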

luogen1996 commented 1 year ago


The dimension size seems to depend on the task and dataset. On VTAB-1K, larger dimensions (>8) degrade performance. On video classification, we use a larger dimension (16) and achieve better results than with smaller ones (2 and 8).

leexinhao commented 1 year ago


Have you ever tried a larger dim like 64 or 128?

luogen1996 commented 1 year ago


No. I think a dim of 64 or 128 would perform worse than 8 on VTAB-1K, but probably better on video classification.

leexinhao commented 1 year ago

Thanks for your reply and nice work!