Dear @wenwei202,
I found that training is much slower with group lasso than without it, and I expect you have seen the same. I dug into your code and found that this line appears to be responsible for the slowdown (about 30%). I replaced the dynamic device query, which turned out to be time-consuming, with the CAFFE_CUDA_NUM_THREADS macro. Training speed is now comparable to training without group lasso.
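For anyone wanting to try the same change, here is a rough host-side sketch of the idea: compute the launch configuration from a compile-time constant instead of asking the device on every call. The names `kNumThreads` and `GetBlocks` are hypothetical stand-ins for Caffe's `CAFFE_CUDA_NUM_THREADS` and `CAFFE_GET_BLOCKS`, and the value 512 is an assumption, so check `common.hpp` in your own checkout.

```cpp
// Sketch only, not the actual patch.
// kNumThreads is a hypothetical stand-in for CAFFE_CUDA_NUM_THREADS;
// 512 is an assumed value, not read from any device.
constexpr int kNumThreads = 512;

// Number of blocks needed to cover n elements, rounded up.
// Hypothetical stand-in with the same shape as Caffe's CAFFE_GET_BLOCKS.
constexpr int GetBlocks(int n) {
  return (n + kNumThreads - 1) / kNumThreads;
}

// Evaluated at compile time, so nothing is queried at run time.
static_assert(GetBlocks(513) == 2, "513 elements need two blocks of 512");
```

A kernel launch would then follow the usual Caffe pattern, `kernel<<<CAFFE_GET_BLOCKS(n), CAFFE_CUDA_NUM_THREADS>>>(...)`, with no per-call device query on the host side.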
Hope this is helpful. Thank you.