weka511 / nlp

My experiments with Natural Language Processing. I've created a few programs to try out concepts.
GNU General Public License v3.0

Are frequencies accurate when we create examples? #24

Closed weka511 closed 1 year ago

weka511 commented 1 year ago

I suspect that the examples generated early on are based on distorted probabilities, since only a few tokens have been read at that point.

n = 1   # Count examples
for sentence in extract_sentences(extract_tokens(read_text(file_names=docnames))):
    indices = vocabulary.parse(sentence)
    for word, context, y in word2vec.generate_examples([indices], tower):
        examples.writerow([word, context, y])
        n += 1
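
A minimal sketch of the suspected effect, assuming that token counts are updated as each sentence is parsed and that sampling probabilities are derived from those counts (the snippet above does not show how `vocabulary` or `tower` are built, so this is an assumption). The helper names and the toy corpus are hypothetical and are not part of this repository. It compares the frequency table as seen after each sentence with the table obtained by counting the whole corpus first.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw token counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def streaming_frequencies(sentences):
    """Yield the frequency table as it looks after each sentence (one-pass view)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
        yield relative_frequencies(counts)


def full_corpus_frequencies(sentences):
    """Count every token before use, so all examples see the same table (two-pass view)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return relative_frequencies(counts)


# Toy corpus, made up purely for illustration.
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "dog", "barked", "at", "the", "cat"],
]

final = full_corpus_frequencies(sentences)
for i, partial in enumerate(streaming_frequencies(sentences), start=1):
    print(f"after sentence {i}: P(cat) = {partial['cat']:.3f}, final = {final['cat']:.3f}")
```

If the real code behaves this way, one option would be a first pass over all sentences that only builds the counts, followed by a second pass that generates the examples from the completed table.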