Refactor training in typos

src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews

GNU Affero General Public License v3.0

Closed: irinakhismatullina closed this 5 years ago

irinakhismatullina commented 5 years ago

Change the training dataset generation scheme.

Instead of keeping only the identifiers with tokens inside the vocabulary for training, we now use a given number of the most frequent identifiers.

This way we still remove a lot of noise, and we depend less blindly on the vocabulary.
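
To illustrate the difference between the two selection schemes, here is a minimal sketch. It is not the code from this PR: the function names `filter_by_vocabulary` and `select_most_frequent` are hypothetical, identifiers are assumed to be already split into space-separated tokens, and the old scheme is assumed to require every token to be in the vocabulary.

```python
from collections import Counter
from typing import Iterable, List, Set


def filter_by_vocabulary(identifiers: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Old scheme (assumed): keep identifiers whose tokens are all in the vocabulary."""
    return [
        ident for ident in identifiers
        if all(token in vocabulary for token in ident.split())
    ]


def select_most_frequent(identifiers: Iterable[str], n: int) -> List[str]:
    """New scheme (sketch): keep the n most frequent distinct identifiers."""
    counts = Counter(identifiers)
    return [ident for ident, _ in counts.most_common(n)]


# Hypothetical toy data: identifiers pre-split into space-separated tokens.
identifiers = [
    "get value", "get value", "get value",
    "set valeu", "set valeu",
    "tmp x1",
]
vocabulary = {"get", "set", "value"}

by_vocabulary = filter_by_vocabulary(identifiers, vocabulary)
by_frequency = select_most_frequent(identifiers, 2)
```

In this toy data the frequency-based selection does not consult the vocabulary at all, so the result depends only on how often each identifier occurs.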

vmarkovtsev commented 5 years ago

@irinakhismatullina ./lookout/style/typos/preparation.py:17:1: F401 'lookout.style.typos.utils.filter_splits' imported but unused