wenwei202 / iss-rnns

Sparse Recurrent Neural Networks -- Pruning Connections and Hidden Sizes (TensorFlow)
Apache License 2.0
73 stars 21 forks source link

The speed problem #3

Open zuowang opened 6 years ago

zuowang commented 6 years ago

Here are the commands I used to train the structure_grouplasso model and the from_scratch model. The speed of the from_scratch model is 392 wps, but the speed of the structure_grouplasso model is only 2 wps. I also have a question about the paper: why does the ISS method have a speed similar to the direct-design method? The ISS method creates a large sparse model in which the zero weights still consume CPU.

Thanks a lot!

python ptb_word_lm.py --model sparselarge --data_path simple-examples/data/ --config_file structure_grouplasso.json

python ptb_word_lm_heter.py --model large --data_path simple-examples/data/ --hidden_size1 373 --hidden_size2 315 --config_file from_scratch.json
python ptb_word_lm.py --model validtestlarge --data_path simple-examples/data/ --display_weights True --config_file structure_grouplasso.json --restore_path /tmp/2017-11-24___01-55-55

python ptb_word_lm_heter.py --model validtestlarge --data_path simple-examples/data/ --display_weights True --hidden_size1 373 --hidden_size2 315 --config_file from_scratch.json --restore_path /tmp/2017-11-23___10-33-44
wenwei202 commented 6 years ago

Note that the code targets inference acceleration, obtained by investing some extra training effort.

After we learn which ISS components can be removed, we just throw those zeros away and initialize a small LSTM with the learned nonzero weights for inference. It makes no sense to keep the zeros, and this is the advantage of the method.

zuowang commented 6 years ago

Could you please tell me how to throw away those zeros? Thanks a lot!

wenwei202 commented 6 years ago

When one ISS component is all zeros, it means the hidden size of the LSTM is reduced by one. You just need to create a new LSTM with a smaller size and initialize its weights with the nonzero values.
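A minimal NumPy sketch of this shrinking step, assuming the TensorFlow `BasicLSTMCell` kernel layout (`[input_size + hidden_size, 4 * hidden_size]`, with the four gate blocks concatenated along the last axis); `shrink_kernel` is a hypothetical helper, not part of this repo:

```python
import numpy as np

def shrink_kernel(kernel, hidden_size, removable):
    """Drop the rows/columns of removable ISS components and return the
    kernel of the smaller LSTM (hypothetical helper, layout assumed)."""
    input_size = kernel.shape[0] - hidden_size
    keep = [k for k in range(hidden_size) if k not in removable]
    # keep all input rows, plus the recurrent rows of surviving components
    rows = list(range(input_size)) + [input_size + k for k in keep]
    # within each of the 4 gate blocks, keep the surviving columns
    cols = [g * hidden_size + k for g in range(4) for k in keep]
    return kernel[np.ix_(rows, cols)]
```

For example, a hidden size of 4 with one all-zero component yields a kernel for a 3-unit LSTM; the weights connecting that LSTM to the surrounding layers would need the same treatment.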

RyanTang1 commented 6 years ago

Hello wenwei: So this means that when we finish training the first time, we have to inspect the parameters manually to see whether an ISS component is all zeros, and then build a new, smaller LSTM based on what we observed. Is this correct?

wenwei202 commented 6 years ago

sort of
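The manual inspection described above could also be automated with a small check; a sketch assuming the same `BasicLSTMCell` kernel layout as above (the function name and tolerance are my own, not from the repo):

```python
import numpy as np

def find_zero_iss(kernel, hidden_size, tol=1e-8):
    """Return indices of ISS components whose weights are all (near) zero.

    kernel: [input_size + hidden_size, 4 * hidden_size] LSTM weight matrix,
    the four gate blocks concatenated along the last axis (assumed layout).
    Component k covers recurrent row k and column k of each gate block.
    """
    input_size = kernel.shape[0] - hidden_size
    removable = []
    for k in range(hidden_size):
        row = kernel[input_size + k, :]                       # recurrent row k
        cols = kernel[:, [g * hidden_size + k for g in range(4)]]  # gate cols k
        if np.all(np.abs(row) < tol) and np.all(np.abs(cols) < tol):
            removable.append(k)
    return removable
```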

ShangwuYao commented 6 years ago

I am quite confused. After reading your paper, I thought the speedup applied to the training phase, and that you use the trained model itself for inference. But from your response to this issue, it seems you take the nonzero part of the weights and use them as a pre-trained model to initialize a smaller model? Also, your code doesn't show how to save and load this pre-trained model, right? And how do you handle the newly initialized model: did you fine-tune it? If so, with what parameters? Thanks a lot.