src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews
GNU Affero General Public License v3.0

Change fasttext training config #772

Closed irinakhismatullina closed 5 years ago

irinakhismatullina commented 5 years ago
  1. Change in the model parameters.
  2. No corruption of training data. Corruption sometimes improves the quality, but not dramatically. Without it the model is much smaller.

zurk commented 5 years ago

Can you give some insight into why you changed this number?
I believe the new parameters improve quality in one of the reports?

irinakhismatullina commented 5 years ago

The most important change is corrupt=False. As I said above, this change significantly reduces the model's size with almost no drop in quality. The other parameter changes improve quality in this setup compared with the default configuration in which only corrupt is set to False.

I can't promise that it's the best configuration, but it's the one I was happy with after a series of experiments, and the one I used in all my training experiments and in the final model. So the values are mostly kept exactly the same so that my results can be reproduced.
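
For readers unfamiliar with the corrupt option, here is a rough sketch of what corrupting training data could look like. It is not the code from this PR: the function names, the adjacent-character swap, and the typo_prob and seed parameters are all assumptions made for illustration. One plausible reason for the size difference is that injected typos add many distinct tokens to the training text, but the thread does not confirm this.

```python
import random


def corrupt_token(token: str, rng: random.Random) -> str:
    """Swap two adjacent characters to imitate a typo (hypothetical scheme)."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    return token[:i] + token[i + 1] + token[i] + token[i + 2:]


def prepare_training_lines(lines, corrupt=False, typo_prob=0.3, seed=0):
    """Return training lines, optionally with randomly injected typos.

    `typo_prob` and `seed` are illustrative parameters, not project settings.
    """
    if not corrupt:
        # corrupt=False: the training text is used unchanged.
        return list(lines)
    rng = random.Random(seed)
    result = []
    for line in lines:
        tokens = [
            corrupt_token(tok, rng) if rng.random() < typo_prob else tok
            for tok in line.split()
        ]
        result.append(" ".join(tokens))
    return result


def vocabulary(lines):
    """Collect the set of distinct whitespace-separated tokens."""
    return {tok for line in lines for tok in line.split()}
```

Comparing len(vocabulary(prepare_training_lines(lines, corrupt=True))) with the uncorrupted value on a real corpus would show how many extra distinct tokens a corruption scheme like this introduces.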