gaosh opened this issue 3 weeks ago
@gaosh Thanks for pointing this out. Can you clarify what you mean? Currently, within the Hypernetwork, I see that there is already a linear layer after the GRU - https://github.com/sidhantls/adaptive-rank-selection-svd/blob/0963edebd85be5f23adeb556292e957d054c448d/utils/adaptive_rank_selection.py#L56. This linear layer is different for each hypernetwork. Do you mean there has to be an additional linear layer after this one?
The hypernetwork architecture in the code follows this structure: GRU -> Layer Norm -> Activation -> Linear. This aligns with Table A.1 in Appendix B. Could you clarify what is missing? That would be very helpful.
Hello, I checked the code again. It seems the current implementation assigns a separate hypernetwork to each low-rank linear layer, as in this line: https://github.com/sidhantls/adaptive-rank-selection-svd/blob/0963edebd85be5f23adeb556292e957d054c448d/utils/adaptive_rank_selection.py#L112. In the paper, we use a single hypernetwork for all low-rank linear layers. Suppose you have L linear weight matrices; the input self.z for the hypernetwork will then have shape (1, L, input_size), where L is the sequence length for the GRU. We then have L Linear layers applied to the outputs of the GRU. I hope this clarifies the implementation of the hypernetwork. You can find an example in another project of mine here: https://github.com/xidongwu/AutoTrainOnce/blob/main/imgnet_models/hypernet.py#L81
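To make the shape description above concrete, here is a minimal PyTorch sketch of this layout: a single learnable input `z` of shape `(1, L, input_size)` fed through one shared Bi-GRU → LayerNorm → GELU trunk, followed by one Linear head per low-rank layer. All class and parameter names (`SharedHypernetwork`, `max_rank`, etc.) are my own illustrative choices, not identifiers from either repository, and the hyperparameters are placeholders.

```python
import torch
import torch.nn as nn


class SharedHypernetwork(nn.Module):
    """One hypernetwork shared across all L low-rank layers (sketch).

    A learnable input z of shape (1, L, input_size) passes through a
    shared Bi-GRU -> LayerNorm -> GELU trunk; position i of the GRU
    output sequence is then mapped through layer i's own Linear head.
    """

    def __init__(self, num_layers: int, input_size: int = 64,
                 hidden_size: int = 64, max_rank: int = 32):
        super().__init__()
        # Learnable hypernetwork input; num_layers (L) is the GRU sequence length.
        self.z = nn.Parameter(torch.randn(1, num_layers, input_size))
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True,
                          bidirectional=True)
        self.norm = nn.LayerNorm(2 * hidden_size)  # Bi-GRU doubles the width
        self.act = nn.GELU()
        # A separate Linear head for each low-rank layer.
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden_size, max_rank) for _ in range(num_layers)
        )

    def forward(self):
        out, _ = self.gru(self.z)        # (1, L, 2 * hidden_size)
        out = self.act(self.norm(out))
        # Apply layer i's head to sequence position i.
        return [head(out[:, i]) for i, head in enumerate(self.heads)]
```

Because the trunk is shared, rank decisions for all layers are produced jointly from one GRU pass, while the per-layer heads still let each layer's output diverge.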
Ah I see, I understand now. I had misinterpreted this.
"A single hypernetwork for all low-rank layers": the paper defines the hypernetwork as Bi-GRU → LayerNorm → GeLU → Linear. However, what you mean is that only the Bi-GRU → LayerNorm → GeLU portion of the hypernetwork is shared across all layers, and the Linear head is unique to each low-rank layer?
Thanks for sharing this; it helps ensure the results are reproduced accurately. I'll update the repo with this implementation.
Thanks for the feedback. I updated the implementation in this branch:
I found another difference from the original implementation. The hypernetwork at https://github.com/sidhantls/adaptive-rank-selection-svd/blob/main/utils/adaptive_rank_selection.py#L35 should have a separate Linear layer for each low-rank layer; this produces larger rank differences across layers after training.