src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews
GNU Affero General Public License v3.0

Small change in fasttext training data generation #784

Closed irinakhismatullina closed 5 years ago

irinakhismatullina commented 5 years ago

AFAIK this doesn't affect performance significantly, but it makes the sampling distribution fairer, i.e. more correct. Dividing by the number of tokens only makes sense for certain data formats, such as the one used by default, so it should probably be made a function argument. WDYT?

zurk commented 5 years ago

Let's put it in the config.
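
For illustration only, here is a minimal sketch of what a config-controlled division by token count might look like. It assumes each training item is a space-separated string of tokens with an associated frequency. The names `sampling_weights`, `normalize_by_tokens`, and the `config` dict are hypothetical and are not taken from the style-analyzer codebase.

```python
from typing import Dict


def sampling_weights(
    frequencies: Dict[str, int],
    normalize_by_tokens: bool = True,
) -> Dict[str, float]:
    """Hypothetical helper: turn item frequencies into sampling probabilities.

    When normalize_by_tokens is True, each frequency is divided by the number
    of space-separated tokens in the item, so multi-token items are not
    over-represented. This only makes sense for formats where an item is a
    space-separated token string (an assumption of this sketch).
    """
    weights: Dict[str, float] = {}
    for item, freq in frequencies.items():
        n_tokens = len(item.split())
        # Fall back to the raw frequency if the item has no tokens.
        weights[item] = freq / n_tokens if normalize_by_tokens and n_tokens else float(freq)
    total = sum(weights.values())
    if total == 0:
        return {}
    # Normalize so the weights sum to 1 and can be used as probabilities.
    return {item: w / total for item, w in weights.items()}


# Hypothetical config entry controlling the behaviour.
config = {"normalize_by_tokens": True}
probabilities = sampling_weights(
    {"get value": 4, "index": 2},
    normalize_by_tokens=config["normalize_by_tokens"],
)
```

Keeping the flag in the config rather than hard-coding the division would let other data formats switch it off without touching the sampling code.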