src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews
GNU Affero General Public License v3.0

Refactor training in typos #775

Closed: irinakhismatullina closed this 5 years ago

irinakhismatullina commented 5 years ago

Change the training dataset generation scheme.

Instead of keeping for training only the identifiers whose tokens are inside the vocabulary, we now keep a given number of the most frequent identifiers.

This still removes much of the noise, and it makes the training set less blindly dependent on the vocabulary.
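A minimal sketch of the new selection rule, assuming the training data can be treated as a flat list of identifier occurrences. The helpers `select_frequent_identifiers` and `build_train_set` are hypothetical and do not reflect the actual code in `lookout/style/typos/preparation.py`; they only illustrate "keep the N most frequent identifiers" as opposed to filtering by vocabulary membership.

```python
from collections import Counter
from typing import Iterable, List, Set


def select_frequent_identifiers(identifiers: Iterable[str], n: int) -> Set[str]:
    """Hypothetical helper: return the set of the n most frequent identifiers."""
    counts = Counter(identifiers)
    # most_common(n) yields (identifier, count) pairs sorted by descending count;
    # ties keep the order in which identifiers were first seen.
    return {identifier for identifier, _ in counts.most_common(n)}


def build_train_set(identifiers: List[str], n: int) -> List[str]:
    """Hypothetical helper: keep only occurrences of the top-n identifiers,
    instead of checking every token against a vocabulary."""
    frequent = select_frequent_identifiers(identifiers, n)
    return [identifier for identifier in identifiers if identifier in frequent]


if __name__ == "__main__":
    corpus = ["get_value", "foo", "get_value", "set_name",
              "get_value", "set_name", "tmp_x"]
    # Rare identifiers ("foo", "tmp_x") are dropped as likely noise.
    print(build_train_set(corpus, 2))
```

Frequency is a cheap proxy for "real" identifiers: rare names are more likely to be noise, and the cut-off no longer depends on whether every token happens to be in a fixed vocabulary.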

vmarkovtsev commented 5 years ago

@irinakhismatullina ./lookout/style/typos/preparation.py:17:1: F401 'lookout.style.typos.utils.filter_splits' imported but unused