Retrieval accuracy different from official JAX/FLAX implementation

mlpen / Nystromformer

Apache License 2.0

Hi, sorry for the late response. We asked the authors of LRA about this issue (https://github.com/google-research/long-range-arena/issues/18), but it has not been fully resolved. We suspect that a difference in hyper-parameters is one of the reasons. However, when I checked the latest version of the LRA repo a few minutes ago, it was still unclear exactly which hyper-parameters were used and how the baselines were compared in the original paper. We used the data processing code from the LRA repo and rewrote only the model and training implementation. So we do not have a definitive answer yet.

Retrieval accuracy different from official JAX/FLAX implementation #11