Hi
I see gradient clipping in the non-parallel version of the LSTM code (i.e. bilstm-layer.h).
However, I cannot find the corresponding part in the parallel version of the LSTM (i.e. bilstm-parallel-layer.h).
Training seems fine in almost all cases (I tried several architectures of different sizes on swbd), but I wonder whether there is a reason you did not include clipping in the parallel version.
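To be concrete about what I mean by clipping, here is a minimal sketch. It assumes simple element-wise clamping of each gradient value to the range [-clip, clip]. The function name `ClipGradient` and the use of `std::vector<float>` are hypothetical, chosen for illustration only; this is not the code from bilstm-layer.h.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not taken from bilstm-layer.h):
// clamps every gradient element into [-clip, clip], in place.
void ClipGradient(std::vector<float>& grad, float clip) {
  assert(clip > 0.0f);  // a non-positive threshold would make no sense here
  for (float& g : grad) {
    g = std::min(clip, std::max(-clip, g));
  }
}
```

In this sketch, values already inside the range are left untouched and only outliers are pulled back to the boundary, which is why training can look fine without clipping as long as the gradients rarely become that large.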