To count tokens, use a word tokenizer in `wordview.text_analysis.core.do_txt_analysis`

Description

Currently, `wordview.text_analysis.core.do_txt_analysis` extracts tokens by splitting the text on spaces, so punctuation stays attached to words and the token count is inaccurate. Improve this by using a proper tokenizer, e.g. the NLTK word tokenizer.
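To make the difference concrete, here is a minimal sketch comparing the two counting approaches. It assumes the current code does something like `text.split(" ")`. The regex is only a hypothetical stand-in for a real word tokenizer so that the snippet runs without extra dependencies. It is not NLTK's algorithm, and the function names are made up for illustration.

```python
import re

# Hypothetical stand-in for a real word tokenizer: a token is either a word
# (optionally with internal apostrophes, e.g. "don't") or a single
# punctuation character. This is NOT how NLTK tokenizes.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def count_tokens_split(text):
    """Count tokens by splitting on a single space (assumed current behavior)."""
    return len(text.split(" "))


def count_tokens_regex(text):
    """Count tokens with the illustrative regex tokenizer."""
    return len(_TOKEN_RE.findall(text))


# Note the double space after "it!".
sample = "Great movie, loved it!  Would watch again."
print(count_tokens_split(sample))  # 8: includes one empty string, punctuation glued to words
print(count_tokens_regex(sample))  # 10: punctuation counted as separate tokens
```

Splitting on a space both produces empty tokens for repeated spaces and leaves punctuation glued to words, which is the kind of miscount a real tokenizer avoids.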

Solution:

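# Requires nltk (with its Punkt tokenizer data) and tqdm; df is the input DataFrame.
from nltk.tokenize import sent_tokenize, word_tokenize
from tqdm import tqdm

num_tokens = 0
for text in tqdm(df["review"]):
    try:
        # Split into sentences first, then tokenize each sentence into words.
        for sentence in sent_tokenize(text.lower()):
            num_tokens += len(word_tokenize(sentence))
    except Exception as e:
        # Skip entries that cannot be tokenized (e.g. non-string values).
        print("Processing entry --- %s --- led to exception: %s" % (text, e))
        continue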

meghdadFar / wordview

To count tokens, use a word tokenizer in `wordview.text_analysis.core.do_txt_analysis` #144
