Closed by irinakhismatullina 5 years ago
As far as I know it doesn't affect performance significantly, but this way the sampling distribution is fairer, i.e. more correct. That said, dividing by the number of tokens only makes sense for certain data formats, such as the one used by default, so it should probably be made a function argument. WDYT?
Let's put it in the config.
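
For illustration only, here is a rough sketch of what a config-controlled switch could look like. Nothing here is taken from the actual code: `SamplingConfig`, `normalize_by_tokens`, and `sampling_weights` are hypothetical names, and the sketch assumes each sample has a raw score and a token count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SamplingConfig:
    # Hypothetical option name, not the project's real config key.
    # When True, each raw score is divided by the sample's token count.
    normalize_by_tokens: bool = True


def sampling_weights(
    scores: List[float], token_counts: List[int], config: SamplingConfig
) -> List[float]:
    """Turn raw per-sample scores into a probability distribution."""
    if len(scores) != len(token_counts):
        raise ValueError("scores and token_counts must have the same length")

    if config.normalize_by_tokens:
        # Guard against empty samples to avoid division by zero.
        raw = [s / max(n, 1) for s, n in zip(scores, token_counts)]
    else:
        raw = list(scores)

    total = sum(raw)
    if total == 0:
        raise ValueError("cannot normalize: all weights are zero")
    return [w / total for w in raw]
```

With the flag on, longer samples stop dominating purely because of their length. With it off, the raw scores are used as they are, which may suit data formats where the token count is not meaningful.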