src-d / ml

sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees

Create terms glossary for sourced.ml #271

Open zurk opened 6 years ago

zurk commented 6 years ago

We constantly confuse terms ourselves, so other developers surely struggle even more. The glossary does not need to be complete; I just want to have a start.

Here is the list of terms to explain in the first iteration:

  1. Bag-of-words
  2. Weighted bag-of-words
  3. Model
  4. Algorithm
  5. Transformer
  6. Document
  7. Features
    1. identifier
    2. token
    3. literal
    4. graphlet

Googleable terms we may comment on:

  1. quantization
  2. TF-IDF
  3. topic
  4. co-occurrence matrix
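
As a possible starting point for the first entries, here is a minimal sketch of a bag-of-words and a TF-IDF weighted bag-of-words. The function names `bag_of_words` and `tfidf_bags` are hypothetical and not part of sourced.ml. The weighting assumes raw counts for TF and `log(N / df)` for IDF, which is only one of several common variants.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


def tfidf_bags(documents):
    """Return one weighted bag-of-words per document.

    Assumed scheme: weight = count * log(N / df), where N is the number
    of documents and df is the number of documents containing the token.
    """
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    # Each bag contributes a token at most once, so this counts documents.
    df = Counter(token for bag in bags for token in bag)
    return [
        {token: count * math.log(n_docs / df[token]) for token, count in bag.items()}
        for bag in bags
    ]


# Two toy "documents", each a list of identifiers taken from one source file.
docs = [["read", "file", "read", "buffer"], ["write", "file"]]
weighted = tfidf_bags(docs)
```

In this toy input, `file` occurs in every document, so its weight is zero, while `read` is specific to the first document and keeps a positive weight.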

@src-d/machine-learning please take a look and add any confusing terms you remember.

r0mainK commented 6 years ago

If we're gonna define identifiers and tokens, might as well also add literals, graphlets and also ~quantification~ quantization. I think we could divide the glossary into:

vmarkovtsev commented 6 years ago

Linking to https://github.com/src-d/apollo/blob/master/doc/GLOSSARY.md

zurk commented 6 years ago

Thanks, @r0mainK, I updated the description.