src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews
GNU Affero General Public License v3.0

Add filtered dataset and code to do the filtering #763

Closed. irinakhismatullina closed this 5 years ago.

irinakhismatullina commented 5 years ago
  1. Filter out items where the token splits of the wrong and the correct identifiers are equal (i.e., the identifiers differ only in non-letter symbols or letter case).
  2. Filter out items where the wrong and the correct identifiers are equal at the lemma level.
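The two filtering criteria above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR. `split_identifier` is a hypothetical regex-based stand-in for the sourced-ml TokenParser: it only splits on non-letter symbols and case changes, without any stemming or extra splitting the real parser may do. The lemmatizer is passed in as a function, and the toy dictionary below is a placeholder for a real lemmatizer.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical, simplified splitter (NOT sourced-ml's TokenParser):
# matches either a run of capitals not followed by a lowercase letter
# (an acronym such as "HTTP") or an optional capital followed by
# lowercase letters. Digits and other non-letter symbols are dropped.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier into lowercase word tokens."""
    return [token.lower() for token in _TOKEN_RE.findall(identifier)]


def same_split(wrong: str, correct: str) -> bool:
    """Criterion 1: identifiers differ only in non-letter symbols or case."""
    return split_identifier(wrong) == split_identifier(correct)


def same_lemmas(wrong: str, correct: str,
                lemmatize: Callable[[str], str]) -> bool:
    """Criterion 2: identifiers are equal once every token is lemmatized."""
    return ([lemmatize(t) for t in split_identifier(wrong)] ==
            [lemmatize(t) for t in split_identifier(correct)])


def filter_pairs(pairs: Iterable[Tuple[str, str]],
                 lemmatize: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Keep only (wrong, correct) pairs that survive both filters."""
    return [(wrong, correct) for wrong, correct in pairs
            if not same_split(wrong, correct)
            and not same_lemmas(wrong, correct, lemmatize)]


# Toy lemmatizer: a placeholder dictionary, assumed for illustration only.
toy_lemmas = {"files": "file", "items": "item"}

if __name__ == "__main__":
    pairs = [("get_Name", "getName"),   # same split -> filtered out
             ("readFiles", "readFile"),  # same lemmas -> filtered out
             ("recieve", "receive")]     # a real typo -> kept
    print(filter_pairs(pairs, lambda t: toy_lemmas.get(t, t)))
    # prints [('recieve', 'receive')]
```

Equal token splits always imply equal lemma sequences, so the second check alone would also catch the first case. Keeping them separate mirrors the two steps listed above and makes it easy to count how many items each step removes.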

The results were obtained with the updated TokenParser, which is not included in the latest src-d/ml release. Therefore, to make the results reproducible by default, we need either to make a new release or to install the package from GitHub during setup.

irinakhismatullina commented 5 years ago

Btw, src-d/ml-core still has the old TokenParser, so it cannot be used right now.

zurk commented 5 years ago

Ok, I will update it.

zurk commented 5 years ago

PR with an update: https://github.com/src-d/ml-core/pull/13