phelps-matthew / FeatherMap

Implementation of "Structured Multi-Hashing for Model Compression" (CVPR 2020)
MIT License

Working with uniformly initialized Linear layers #14

Open varun19299 opened 3 years ago

varun19299 commented 3 years ago

Hi, I'm trying to get FeatherMap working with SIREN, which uses uniformly initialized Linear layers (with varying scales).

Does FeatherMap only support fan-in at the moment?
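
For reference, the initialization I'm referring to looks roughly like this (a minimal sketch of SIREN's scheme; omega_0 is the frequency factor, typically 30):

    import math
    import torch
    import torch.nn as nn

    def siren_init_(layer: nn.Linear, is_first: bool = False, omega_0: float = 30.0) -> None:
        # SIREN: first layer ~ U(-1/fan_in, 1/fan_in); hidden layers
        # ~ U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0).
        fan_in = layer.weight.size(1)
        bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega_0
        with torch.no_grad():
            layer.weight.uniform_(-bound, bound)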

phelps-matthew commented 3 years ago

If you take a look at the __norm_V method within class FeatherNet:

    def __norm_V(self) -> None:
        """Normalize global weight matrix. Currently implemented only for uniform
        intializations"""
        # sigma = M**(-1/4); bound follows from uniform dist.
        bound = sqrt(12) / 2 * (self._size_m ** (-1 / 4))
        torch.nn.init.uniform_(self._V1, -bound, bound)
        torch.nn.init.uniform_(self._V2, -bound, bound)

one can see that, at present, only uniform initializations are supported.
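
For the curious: the sqrt(12)/2 factor just matches the standard deviation of a uniform distribution. U(-a, a) has variance a**2 / 3, so hitting a target std of sigma = M**(-1/4) requires a = sqrt(3) * sigma = (sqrt(12)/2) * sigma, which is the bound above.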

phelps-matthew commented 3 years ago

Yes, it also looks like only fan-in is supported, as seen in __unregister_params.
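
(For a Linear layer, fan-in is just the input dimension of the weight; a quick illustration:)

    import torch.nn as nn

    layer = nn.Linear(in_features=256, out_features=128)
    # Linear weights have shape (out_features, in_features):
    fan_in = layer.weight.size(1)   # 256
    fan_out = layer.weight.size(0)  # 128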

varun19299 commented 3 years ago

Thanks! I got it to work by suitably modifying the scaler to reflect SIREN's initialisation. It's not perfect, but it works.
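
Roughly, the change was in this spirit (an illustrative sketch, not the exact code — the attribute names _V1, _V2, _size_m come from the snippet above, and the omega_0 rescale is just one example of a SIREN-flavored bound):

    import torch
    from math import sqrt

    def siren_like_norm_V(net, omega_0: float = 30.0) -> None:
        # Re-initialize the global weight factors with a SIREN-flavored bound
        # instead of the default sqrt(12)/2 * M**(-1/4). Illustrative only:
        # a proper fix needs per-layer scales, not one global bound.
        bound = sqrt(12) / 2 * (net._size_m ** (-1 / 4)) / omega_0
        torch.nn.init.uniform_(net._V1, -bound, bound)
        torch.nn.init.uniform_(net._V2, -bound, bound)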

Seems like the low-rank decomposition, even with no compression, impacts training dynamics (performance drop). Not surprising, since the task isn't classification and performance is measured by PSNR.

phelps-matthew commented 3 years ago

Yeah, just setting compression = 1.0 (which still expresses the weights as a matrix product, albeit one of the same size as the uncompressed weights) will cause a drop in performance. I think this is due to an inefficient structuring of the weights (they become nonlinearly coupled).
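
Something like this isolates that effect (a sketch — the import path and kwarg name are from memory and may differ):

    import torch.nn as nn
    # Hypothetical import path/kwarg -- check the README for the actual API:
    from feathermap import FeatherNet

    base = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    # compress=1.0: no parameter reduction, but weights are still generated
    # from the product V1 @ V2, so any performance drop comes from the
    # factored parameterization itself.
    net = FeatherNet(base, compress=1.0)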

After experiments and characterization, I've found that the FeatherMap package is really designed to shine at very deep compressions. It's in this range that you get the largest delta_size/delta_performance.