You use the 0-th vector for `desc_emb`. This is label-wise attention enhanced with textual label descriptions. If you do it this way, you may also add a regularization term to the objective function, as in Eq. 7 of the CAML paper.
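To make the suggestion concrete, here is a minimal NumPy sketch of label-wise attention plus a description-based penalty in the spirit of that regularizer. The function names, the `lam` weight, and the exact form of the penalty (mean L2 distance over the positive labels) are my assumptions for illustration; please check them against the paper and your own code before relying on them.

```python
import numpy as np

def label_attention(H, U):
    """Label-wise attention.

    H: (n_tokens, d) token representations.
    U: (n_labels, d) label vectors (e.g. built from label descriptions).
    Returns (n_labels, d): one attended document vector per label.
    """
    scores = U @ H.T                                     # (n_labels, n_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over tokens
    return alpha @ H

def description_penalty(label_params, desc_emb, y, lam=0.01):
    """Hypothetical regularizer: lam times the mean L2 distance between each
    positive label's parameters and its description embedding."""
    pos = np.asarray(y).astype(bool)
    if not pos.any():
        return 0.0  # no positive labels, nothing to regularize
    dist = np.linalg.norm(desc_emb[pos] - label_params[pos], axis=1)
    return lam * float(dist.mean())
```

The penalty would be added to the usual binary cross-entropy loss, so labels whose parameters drift far from their description embeddings are pulled back.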
In my experience, random initialization can sometimes achieve better performance. You may try a setting where randomly initialized vectors are used as the hidden label representations. This would be part of the ablation study.
https://github.com/quan-possible/med-text/blob/163c7866ecc77bf0d3a4f3c951994af2c8d5c87c/project/hoc.py#L132
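For the ablation, the two settings could be switched with something like the sketch below. `init_label_vectors`, its arguments, and the 0.02 standard deviation are hypothetical choices for illustration and are not taken from `hoc.py`.

```python
import numpy as np

def init_label_vectors(n_labels, dim, desc_emb=None, seed=0):
    """Return (n_labels, dim) initial label vectors.

    If desc_emb is given, start from the description embeddings;
    otherwise draw small random vectors (the ablation setting).
    """
    if desc_emb is not None:
        desc_emb = np.asarray(desc_emb, dtype=float)
        if desc_emb.shape != (n_labels, dim):
            raise ValueError("desc_emb must have shape (n_labels, dim)")
        return desc_emb.copy()
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.02, size=(n_labels, dim))
```

Running both settings with everything else held fixed gives a direct comparison of description-based and random label representations.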