lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License

hyperbolic cosine based estimator #73

Open gaganbahga opened 3 years ago

gaganbahga commented 3 years ago

Hi @lucidrains, thanks a lot for providing this implementation. The authors of the paper propose two estimators based on Positive Random Features: one built from exponential functions (referred to as SM+), and another built from hyperbolic cosine (referred to as SMhyp+). As far as I can tell, the jax/TF implementation, and therefore this repository as well, only implements the exponential one. In the paper, the authors state: "Furthermore, the hyperbolic estimator provides additional accuracy improvements that are strictly better than those from SM+_2m(x, y) with twice as many random features." So it seems the cosh-based estimator should have been the default choice, yet it is not. Would you happen to have more insight into this?

Also, does the ortho_scaling=1 option switch on the regularized softmax kernel (SMREG)? Is it recommended anywhere? The authors do mention ortho_scaling = 0.0 as the default hyperparameter choice.
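For reference, here is a minimal NumPy sketch (not this repository's API) of how I understand the two estimators of the softmax kernel exp(x·y). The SM+ feature map is exp(w·x - ||x||²/2)/√m; the SMhyp+ map concatenates exp(w·x) and exp(-w·x), giving 2m cosh-style features from m Gaussian draws. Function names and the toy dimensions are my own, purely for illustration:

```python
import numpy as np

def softmax_kernel_exp(x, y, w):
    # SM+ estimator: phi(v) = exp(w @ v - ||v||^2 / 2) / sqrt(m)
    # Unbiased for exp(x . y) when rows of w are drawn from N(0, I_d).
    m = w.shape[0]
    phi = lambda v: np.exp(w @ v - (v @ v) / 2) / np.sqrt(m)
    return phi(x) @ phi(y)

def softmax_kernel_cosh(x, y, w):
    # SMhyp+ estimator: phi(v) stacks exp(w @ v) and exp(-w @ v),
    # i.e. a cosh-based feature map with 2m features from m draws.
    m = w.shape[0]
    def phi(v):
        u = w @ v
        return np.exp(-(v @ v) / 2) * np.concatenate([np.exp(u), np.exp(-u)]) / np.sqrt(2 * m)
    return phi(x) @ phi(y)

# Compare both against the exact softmax kernel value on toy vectors.
rng = np.random.default_rng(0)
d, m = 8, 4096
x = rng.normal(size=d) * 0.2
y = rng.normal(size=d) * 0.2
w = rng.normal(size=(m, d))  # i.i.d. Gaussian projections (no orthogonalization here)
exact = np.exp(x @ y)
est_exp = softmax_kernel_exp(x, y, w)
est_cosh = softmax_kernel_cosh(x, y, w)
```

Both estimators are unbiased, since E[exp(w·(x+y))] = exp(||x+y||²/2) for Gaussian w; the claim in the paper is about the cosh variant's lower variance at matched feature count.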