ucbrise / actnn

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
MIT License

Does this work for any model with ReLU as the activation function? #34

Open kailashg26 opened 2 years ago

kailashg26 commented 2 years ago

Hello, I'm trying to use ActNN with MADDPG (a multi-agent RL algorithm). The model has just 3 layers with ReLU activations. Can you let us know whether this mechanism will still give good results with smaller models?

Thank you.

Link of maddpg https://github.com/marlbenchmark/off-policy/tree/release/offpolicy/algorithms/maddpg

merrymercy commented 2 years ago

It should work with the ReLU activation function, but we haven't tested any RL tasks. Did you run into memory issues even with such a small model?

kailashg26 commented 2 years ago

I mean, when I train MADDPG, almost 10 GB of memory gets used, so I wanted to try some compression. It would be a great help if you could share any insights on how to test ActNN with MADDPG.

Thanks

merrymercy commented 2 years ago

You can follow the usage instructions and replace the layers in your model with ActNN layers. Start with a higher bit width and check whether the lossy compression hurts the reward.
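The "start with higher bits" advice is easy to reason about with a toy uniform quantizer. The sketch below is *not* ActNN's actual compressor (ActNN uses per-group quantization with custom CUDA kernels); it is a minimal stand-in showing why a higher bit width gives a smaller worst-case reconstruction error on saved activations:

```python
# Minimal sketch of uniform quantize/dequantize -- NOT ActNN's compressor.
# It only illustrates the bit-width vs. precision trade-off.

def quantize_dequantize(x, bits):
    """Quantize a list of floats to `bits` bits per value, then reconstruct."""
    lo, hi = min(x), max(x)
    steps = (1 << bits) - 1                    # e.g. 2 bits -> 4 levels, 3 steps
    scale = (hi - lo) / steps if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in x]

activations = [-0.2, 0.03, 0.5, 0.77, 1.0]     # pretend saved layer activations
for bits in (2, 4, 8):
    rec = quantize_dequantize(activations, bits)
    err = max(abs(a - b) for a, b in zip(activations, rec))
    print(f"{bits}-bit max reconstruction error: {err:.4f}")
```

If the reward curve is unchanged at 8 bits but degrades at 2, the lossy compression (and not some other change) is the likely cause.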

kailashg26 commented 2 years ago

Thank you. I'll look into it and post here if I have any doubts.