Hello, @yooper
Based on the Ngram Statistics Package by Ted Pedersen and Satanjeev Banerjee, I implemented some features for the ngram functionality of this library.
I fixed the separator insertion when an ngram is created with a separator longer than one character.
I implemented a function that calculates the frequency of each ngram in an ngram array, along with the frequencies of its tokens. The counts follow Pedersen and Banerjee's package:
For bigrams, it calculates the frequency of the bigram as a whole and the frequencies of the left and right tokens in their respective positions.
For trigrams, it calculates the frequency of the trigram as a whole, the frequency of each token in its position, and the frequency of each token pair (first with second, first with third, and second with third), again in their respective positions.
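To make the counting scheme concrete, here is a minimal Python sketch of the idea. It is not the code in this PR and not the library's API: the names `make_ngrams` and `ngram_frequencies` and the keys of the returned dictionaries are hypothetical, chosen only for illustration.

```python
from collections import Counter


def make_ngrams(tokens, n):
    """Hypothetical helper: sliding-window ngrams as tuples of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(ngrams):
    """Count each ngram, each token per position, and (for trigrams)
    each token pair per pair of positions.

    Assumes every ngram is a tuple of the same length (2 or 3).
    """
    n = len(ngrams[0])
    whole = Counter(ngrams)
    # How often each token occurs in position i across all ngrams.
    by_position = [Counter(g[i] for g in ngrams) for i in range(n)]
    # How often each pair of tokens occurs in positions (i, j).
    by_pair = {
        (i, j): Counter((g[i], g[j]) for g in ngrams)
        for i in range(n)
        for j in range(i + 1, n)
    }

    result = {}
    for g, count in whole.items():
        entry = {"ngram": count}
        for i in range(n):
            entry[f"token{i + 1}"] = by_position[i][g[i]]
        if n == 3:
            for (i, j), counter in by_pair.items():
                entry[f"token{i + 1}_token{j + 1}"] = counter[(g[i], g[j])]
        result[g] = entry
    return result


tokens = "a b a b c".split()
bigram_freq = ngram_frequencies(make_ngrams(tokens, 2))
trigram_freq = ngram_frequencies(make_ngrams(tokens, 3))
```

For the bigram `("a", "b")` in this toy input, the sketch reports the bigram count, the number of bigrams with `a` on the left, and the number with `b` on the right. For trigrams it adds the three pairwise counts.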
Finally, I implemented calculations for statistical measures that determine the degree of association, also based on Pedersen and Banerjee's package.
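As an illustration of how such measures are derived from the bigram counts, the sketch below computes two common ones, the Dice coefficient and pointwise mutual information. This description does not list which measures the PR covers, so these two are an assumption picked as examples. The function names and the parameter names (`n11` for the bigram count, `n1p` for the left-token count, `np1` for the right-token count, `npp` for the total number of bigrams) are hypothetical.

```python
import math


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / (left count + right count)."""
    return 2 * n11 / (n1p + np1)


def pmi(n11, n1p, np1, npp):
    """Pointwise mutual information: log2(observed / expected), where the
    expected count assumes the two tokens occur independently."""
    expected = n1p * np1 / npp
    return math.log2(n11 / expected)


# Toy counts: the bigram occurs twice, its left token occurs twice on the
# left, its right token twice on the right, out of four bigrams in total.
dice_score = dice(2, 2, 2)
pmi_score = pmi(2, 2, 2, 4)
```

Both functions take only the counts, so they can be applied to the output of any frequency routine that reports the bigram count and the two positional token counts.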
A much more detailed description of Pedersen and Banerjee's package is given in their paper, available at: http://www.d.umn.edu/~tpederse/Pubs/cicling2003-2.pdf
Feel free to contact me in case of questions.