src-d / ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees

Create terms glossary for #271

Open zurk opened 6 years ago

zurk commented 6 years ago

We constantly confuse these terms ourselves, so other developers can hardly be expected to keep them straight. I do not want the glossary to be complete yet, just to make a start.

Here is the list of terms to explain in the first iteration:

  1. Bag-of-words
  2. Weighted bag-of-words
  3. Model
  4. Algorithm
  5. Transformer
  6. Document
  7. Features
    1. identifier
    2. token
    3. literal
    4. graphlet

Googleable terms we may comment on:

  1. quantization
  2. TF-IDF
  3. topic
  4. co-occurrence matrix

@src-d/machine-learning please take a look and add any confusing terms you remember.

r0mainK commented 6 years ago

If we're gonna define identifiers and tokens, might as well also add literals, graphlets and ~quantification~ quantization. I think we could divide the glossary into:

vmarkovtsev commented 6 years ago

Linking to

zurk commented 6 years ago

Thanks, @r0mainK, I updated the description.