zuowang opened this issue 6 years ago
Note that the code accelerates inference at the cost of some extra training effort.
After we learn which ISS components can be removed, we just need to throw those zeros away and initialize a small LSTM with the learned nonzero weights for inference. It makes no sense to keep the zeros, and this is the advantage of the method.
Could you please tell me how to throw those zeros away? Thanks a lot!
When one ISS component is all zeros, the hidden size of the LSTM is reduced by one. You just need to create a new LSTM with a smaller size and initialize its weights with the nonzero values.
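For concreteness, here is a minimal NumPy sketch of that shrinking step. It assumes TensorFlow's `BasicLSTMCell` weight layout (a `[input_size + hidden_size, 4 * hidden_size]` kernel with the four gates concatenated along the second axis) and checks only the LSTM's own kernel and bias; in the full ISS definition, the group for hidden unit k also covers the weights in the following layer that consume h_k. The function names and the `tol` threshold are illustrative, not part of the repo:

```python
import numpy as np

def find_zero_iss(kernel, bias, input_size, hidden_size, tol=0.0):
    """Indices of hidden units whose whole ISS group is zero.

    kernel: [input_size + hidden_size, 4 * hidden_size] array,
            TensorFlow BasicLSTMCell layout (gates concatenated on axis 1).
    bias:   [4 * hidden_size] array.
    """
    zero_units = []
    for k in range(hidden_size):
        gate_cols = [g * hidden_size + k for g in range(4)]  # unit k in all four gates
        incoming = kernel[:, gate_cols]                      # weights feeding unit k
        outgoing = kernel[input_size + k, :]                 # recurrent weights leaving h_k
        if (np.all(np.abs(incoming) <= tol)
                and np.all(np.abs(outgoing) <= tol)
                and np.all(np.abs(bias[gate_cols]) <= tol)):
            zero_units.append(k)
    return zero_units

def shrink(kernel, bias, input_size, hidden_size, zero_units):
    """Drop the zero ISS groups, returning a smaller dense kernel and bias."""
    keep = [k for k in range(hidden_size) if k not in set(zero_units)]
    rows = list(range(input_size)) + [input_size + k for k in keep]
    cols = [g * hidden_size + k for g in range(4) for k in keep]
    return kernel[np.ix_(rows, cols)], bias[cols]

# Illustrative usage (sizes here match the large PTB model in the paper):
# zero_units = find_zero_iss(kernel, bias, input_size=1500, hidden_size=1500)
# small_kernel, small_bias = shrink(kernel, bias, 1500, 1500, zero_units)
```

The shrunk kernel and bias can then be loaded into a freshly constructed LSTM of size `len(keep)`; the rows of the next layer (e.g., the softmax weights) that correspond to removed units would have to be dropped in the same way.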
Hello wenwei: So this means that when we finish training the first time, we have to inspect the parameters manually to see whether each ISS component is all zeros, and then build a new, smaller LSTM based on what we observed. Is this correct?
sort of
I am quite confused. After reading your paper, I thought the speedup was for the training phase and that you used the trained model itself for inference. But from your response to this issue, it sounds like you take the nonzero part of the weights and use them as a pre-trained model to initialize a smaller model. Your code doesn't show how to save and load this pre-trained model, right? And what do you do with the newly initialized model? Did you fine-tune it? If so, with what parameters? Thanks a lot.
Here are the commands I used to train the `structure_grouplasso` model and the `from_scratch` model. The speed of the `from_scratch` model is 392 wps, but the speed of the `structure_grouplasso` model is 2 wps. I also have a question about the paper: why does the `ISS` method have a speed similar to the `direct design` method? The `ISS` method creates a large sparse model in which the zero weights also consume CPU. Thanks a lot!