Open vlandham opened 8 years ago
This could be a separate page, or integrated in some way with the quickstart guide.
@arnicas has provided a great starting point for common tasks below. This could be used for inspiration:
Some recipes:
Tokenizing
textkit text2words test_data/alice.txt | textkit filterpunc | more
textkit text2words test_data/alice.txt | textkit lowercase - | more
Stopwords
textkit text2words test_data/alice.txt | textkit filterpunc | textkit filterwords
textkit text2words test_data/alice.txt | textkit filterpunc | textkit filterwords --custom custom/stop.txt | more
textkit text2words test_data/alice.txt | textkit filterpunc | textkit filterwords --custom custom/stop.txt | textkit lowercase - | more
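For readers who want to see what a custom stopword pass does to a token stream, here is a rough sketch using only standard Unix tools. It is not textkit's implementation. It assumes tokens arrive one per line and that the stopword file also holds one word per line; both the sample tokens and the temporary stopword file are made up for illustration.

```shell
# Hypothetical stopword list, one word per line (stand-in for custom/stop.txt).
stop_file=$(mktemp)
printf '%s\n' the a of > "$stop_file"

# Hypothetical token stream, one token per line.
# -F: treat stopwords as fixed strings, -x: match whole lines only,
# -v: keep the lines that do NOT match, -f: read patterns from a file.
filtered=$(printf '%s\n' the cat of alice | grep -v -x -F -f "$stop_file")

rm -f "$stop_file"
printf '%s\n' "$filtered"
```

Because this match is case-sensitive, a capitalized token such as "The" would pass through unless the stream is lowercased before filtering. Whether textkit filterwords behaves the same way is not confirmed here.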
Word counting
textkit text2words test_data/alice.txt | textkit filterpunc | textkit filterwords --custom custom/stop.txt | textkit tokens2counts | more
Bigrams
[I'm a little weirded out not getting counts out of these topbigrams results, or some other measure.]
textkit text2words test_data/alice.txt | textkit lowercase - | textkit filterpunc | textkit filterwords --custom custom/stop.txt | textkit topbigrams | more
textkit text2words test_data/alice.txt | textkit filterpunc | textkit lowercase - | textkit filterwords --custom custom/stop.txt | textkit words2bigrams | more
textkit text2words test_data/alice.txt | textkit filterpunc | textkit lowercase - | textkit filterwords --custom custom/stop.txt | textkit words2bigrams | textkit tokens2counts | more
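As a cross-check on the bigram counts, the same idea can be sketched with standard Unix tools alone. This is not how textkit computes bigrams; it assumes a token stream with one token per line, and the sample tokens below are invented for the example.

```shell
# Hypothetical token stream, one token per line.
tokens='the
cat
sat
the
cat
ran'

# Pair each token with the one before it, then count identical pairs
# and list the most frequent pairs first.
bigram_counts=$(printf '%s\n' "$tokens" \
  | awk 'NR > 1 { print prev " " $0 } { prev = $0 }' \
  | sort | uniq -c | sort -rn)

printf '%s\n' "$bigram_counts"
```

The first line of output is the most frequent pair with its count in front, which is the kind of measure the note above was looking for.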
POS
textkit text2words test_data/alice.txt | textkit tokens2pos
textkit text2words test_data/alice.txt | textkit tokens2pos - | grep NNP
textkit text2words test_data/alice.txt | textkit tokens2pos - | grep NNP | textkit tokens2counts | more