santi-pdp / quasi-rnn

Quasi-RNN for language modeling

Performance #2

Open zhang-jian opened 7 years ago

zhang-jian commented 7 years ago

Hi, using PTB data, the PPLs I got are: Training: 91.31, Valid: 124.51, Test: 124.12.

However, the PPLs reported in the paper are much lower (Table 2, https://arxiv.org/pdf/1611.01576.pdf). Just wondering if you have investigated this?

Thanks,

Jian

santi-pdp commented 7 years ago

Hi,

I have not delved into it, and actually left this project a bit behind. I should try the sentiment analysis task and kernelize the QRNN layer in CUDA to actually take advantage of the 16x speedup, because this QRNN just uses the built-in convolutions and RNN pooling, which will perform worse than the paper's version.
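For reference, the "RNN pooling" mentioned above is the fo-pooling recurrence from Bradbury et al. (2016): the convolution precomputes all gate activations in parallel, and only this elementwise loop is sequential (it is exactly this loop that the custom CUDA kernel fuses). A minimal NumPy sketch, with gate names `Z`, `F`, `O` following the paper; the function name and shapes are illustrative, not this repo's API:

```python
import numpy as np

def qrnn_fo_pool(Z, F, O, c0=None):
    """fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t.

    Z, F, O: (T, H) gate activations, precomputed by the convolution.
    Returns the (T, H) sequence of hidden states.
    """
    T, H = Z.shape
    c = np.zeros(H) if c0 is None else c0
    hs = np.empty((T, H))
    for t in range(T):
        # Elementwise only: no matrix multiply inside the sequential loop.
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        hs[t] = O[t] * c
    return hs
```

Because each timestep is purely elementwise, the H hidden units are independent inside the loop, which is what makes a fused CUDA kernel so effective compared to an LSTM step.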

Regards

saxenauts commented 7 years ago

@santi-pdp Hey, I have the same problem. I want to kernelize the QRNN to reproduce the results, but I don't know how. Could you share a tutorial on how to do this and then implement it in TensorFlow? I'd like to work on it.

Thanks

santi-pdp commented 7 years ago

Kernelizing it means implementing the actual CUDA kernel to get the 16x speed boost. I'd really like to see a compiled version of the kernel working with some TF/PyTorch API wrapper. Bradbury et al. released the kernel in a blog post, where it appears in the cuda.raw(...) snippet. Do you mean you need a tutorial on CUDA?

saxenauts commented 7 years ago

Yeah, the kernel script is there, but I need a good tutorial on how to bind it to TensorFlow. Just like you, I don't have time to learn all the basics. There is a loose tutorial on adding a TensorFlow op; I'll look into it. I am searching for better tutorials, so if you happen to find any, please let me know.

lsq357 commented 6 years ago

@santi-pdp @zhang-jian Can you tell me how long training takes for the same number of epochs/steps and the same number of layers with LSTM versus QRNN? And how many times faster than the LSTM is the QRNN during training with this source code? Thanks!

santi-pdp commented 6 years ago

There supposedly won't be the original paper's speedup (16x), since I didn't write the custom CUDA kernel. I haven't checked the speed difference between QRNN and LSTM in this project, sorry; it was made for learning purposes. You can find a PyTorch implementation from the paper's authors, with the CUDA kernel included, at https://github.com/salesforce/pytorch-qrnn
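Even without the fused kernel, the structural reason for the QRNN's speed edge can be seen in a rough, hypothetical NumPy sketch (not this repo's code, and with activation details simplified): the LSTM needs matrix multiplies inside its sequential loop, while the QRNN does one big multiply up front and keeps only elementwise work in the loop.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 200, 128, 128          # sequence length, input dim, hidden dim
x = rng.standard_normal((T, D))

sig = lambda a: 1.0 / (1.0 + np.exp(-a))

# LSTM-style step: two matmuls *inside* the sequential loop.
Wl = rng.standard_normal((D, 4 * H)) * 0.01
Ul = rng.standard_normal((H, 4 * H)) * 0.01

def lstm_like(x):
    h, c = np.zeros(H), np.zeros(H)
    for t in range(T):
        gates = x[t] @ Wl + h @ Ul            # sequential bottleneck
        i, f, o, g = np.split(sig(gates), 4)  # gate activations (simplified)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# QRNN-style: one big matmul up front; the loop is elementwise only.
Wq = rng.standard_normal((D, 3 * H)) * 0.01

def qrnn_like(x):
    G = x @ Wq                                # all timesteps at once
    Z = np.tanh(G[:, :H])
    F, O = sig(G[:, H:2 * H]), sig(G[:, 2 * H:])
    c = np.zeros(H)
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]    # elementwise fo-pooling
    return O[-1] * c

for fn in (lstm_like, qrnn_like):
    t0 = time.perf_counter()
    fn(x)
    print(fn.__name__, time.perf_counter() - t0, "s")
```

Exact timings depend on the machine and BLAS backend, so no particular speedup number is implied here; the point is only where the sequential dependency sits in each model.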