src-d / style-analyzer

Lookout Style Analyzer: fixing code formatting and typos during code reviews
GNU Affero General Public License v3.0

Add notebook with twitter dataset analysis #695

Closed irinakhismatullina closed 5 years ago

irinakhismatullina commented 5 years ago

Very brief look at the dataset.

  1. Looking at its vocabulary and its intersection with our code identifiers dataset.
  2. Training and evaluating the TyposCorrector on this dataset, using our fastText model trained on identifiers, and printing the mistakes made by the model.
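
For readers without the notebook at hand, the vocabulary comparison in step 1 can be sketched roughly as below. This is only an illustration, not the notebook's code: the helper names `build_vocabulary` and `vocabulary_overlap`, the regex tokenization, and the toy inputs are all assumptions, and the real datasets are not reproduced here.

```python
import re
from collections import Counter


def build_vocabulary(texts):
    """Count lowercase alphabetic tokens across an iterable of strings.

    Hypothetical helper: splitting on non-letters also breaks snake_case
    identifiers into words, but it does not split camelCase.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def vocabulary_overlap(left, right):
    """Return the shared tokens and how much of `left` they cover.

    type_coverage:  share of distinct tokens in `left` also found in `right`.
    token_coverage: share of token occurrences in `left` also found in `right`.
    """
    shared = set(left) & set(right)
    type_coverage = len(shared) / len(left) if left else 0.0
    total = sum(left.values())
    token_coverage = sum(left[t] for t in shared) / total if total else 0.0
    return shared, type_coverage, token_coverage


# Toy stand-ins for the two datasets (made up for illustration).
tweets = ["please fix the lenght of this string", "get the user name"]
identifiers = ["get_user_name", "string_length", "fix_index"]

shared, type_cov, token_cov = vocabulary_overlap(
    build_vocabulary(tweets), build_vocabulary(identifiers)
)
print(sorted(shared))
print(f"type coverage: {type_cov:.2f}, token coverage: {token_cov:.2f}")
```

Tokens that appear only on the tweet side (here the misspelled `lenght`) are the ones a fastText model trained on identifiers would have to handle through subword information alone.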