The option s_decay acts somewhat like a weight-decay term and has empirically been helpful for smaller datasets. We use a default of 0.01 in all our experiments. For larger datasets, smaller values (even 0.0) often worked as well.
What does small and large dataset mean in this context?