
# 🕊 FeatherMap

## What is FeatherMap?

FeatherMap is a tool for compressing deep neural networks. Centered on computer vision models, it implements the Google Research paper Structured Multi-Hashing for Model Compression (CVPR 2020). Packaged as a Python library, FeatherMap takes a user-defined PyTorch model and compresses it to a desired factor without modifying the underlying architecture. With its simple API, FeatherMap can easily be applied across a broad array of models.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
  - [Training](#training)
  - [Deployment](#deployment)
- [Results](#results)
- [What is Structured Multi-Hashing?](#what-is-structured-multi-hashing)

## Installation
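
A typical source install (a sketch, assuming a standard Python package layout; the exact steps may differ):

```bash
git clone https://github.com/phelps-matthew/FeatherMap.git
cd FeatherMap
pip install -e .
```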

## Usage

Wrap your base model in `FeatherNet` with the desired compression factor, then train it like any other PyTorch model:

```python
base_model = ResNet34()
model = FeatherNet(base_model, compress=0.10)
...
# Forward pass
y = model(x)
loss = criterion(y, target)
...
# Backward and optimize
loss.backward()
optimizer.step()
...
```

See `feathermap/models/` for a zoo of available CV models to compress.
### Training
Models are trained on CIFAR-10 using `feathermap/train.py` (defaults to training ResNet-34). List the available argument options with the `--help` flag.
```bash
python train.py --compress 0.1
```

### Deployment

Upon defining your FeatherNet model, switch to deploy mode to calculate weights on the fly (see What is Structured Multi-Hashing?).

```python
base_model = ResNet34()
model = FeatherNet(base_model, compress=0.10)
model.deploy()
```

## Results

Below are results for a ResNet-34 architecture, trained and tested on CIFAR-10. Latency was benchmarked on CPU (AWS c5a.8xlarge), iterating over 30k images with a batch size of 100. For context, ResNet-34 can be compressed to 2% of its original size while still achieving over 90% accuracy (about a 5-point drop relative to the base model) and incurring only a 4% increase in inference time.

## What is Structured Multi-Hashing?

There are two main concepts behind structured multi-hashing. The first concept is to take the weights of each layer, flatten them, and tile them into a single square matrix. This global weight matrix represents the weights of the entire network.
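
As a rough illustration of the tiling step (a minimal sketch, not the repository's actual code; `tile_weights` is a hypothetical helper):

```python
import math
import torch

def tile_weights(model: torch.nn.Module) -> torch.Tensor:
    """Flatten every layer's weights and tile them into one square matrix."""
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    n = math.ceil(math.sqrt(flat.numel()))  # side length of the square
    padded = torch.zeros(n * n)
    padded[: flat.numel()] = flat           # pad the remainder with zeros
    return padded.view(n, n)                # the global weight matrix
```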

The second concept is purely linear algebra: if we matrix-multiply a pair of columns (an n x 2 matrix) by a pair of rows (a 2 x n matrix), we obtain an n x n square matrix.
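
In code, with a toy size of n = 5:

```python
import torch

V = torch.randn(5, 2)  # a pair of columns (n x 2)
U = torch.randn(2, 5)  # a pair of rows (2 x n)
W = V @ U              # an (n x n) square matrix built from only 4n numbers
print(W.shape)         # torch.Size([5, 5])
```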

Putting these two ideas together, we can implement structured multi-hashing! Here's how it works:

  1. Let the tunable parameters describing the entire network be a pair of columns (n x 2) and a pair of rows (2 x n)
  2. Matrix-multiply the columns and rows to obtain a square matrix of size (n x n)
  3. Map each element of that matrix to the corresponding element of the global weight matrix (see the sketch below)
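
A minimal end-to-end sketch of these three steps (hypothetical standalone code; in the package this is handled inside the `FeatherNet` wrapper):

```python
import torch

n = 1000  # side length of the global weight matrix

# Step 1: the only tunable parameters are a pair of columns and a pair of rows.
V = torch.randn(n, 2, requires_grad=True)  # (n x 2)
U = torch.randn(2, n, requires_grad=True)  # (2 x n)

# Step 2: matrix-multiply them to obtain an (n x n) square matrix.
W = V @ U

# Step 3: each element of W maps one-to-one onto the global weight matrix,
# i.e. back onto the individual layers' weights.

# Only V and U are trained: 2n + 2n = 4n parameters stand in for n^2 weights.
print(V.numel() + U.numel())  # 4000
print(W.numel())              # 1000000
```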

Putting it all together, we have the full structured multi-hashing process.

This mapping effectively reduces the number of tunable parameters from n^2 to 4n, achieving the desired compression!

Additional Remarks: