LinguList closed this issue 4 years ago.
Regarding `ngrams`, I'm not sure this is needed, considering that it's rather short to implement:
```python
def ngrams(l):
    # Yield every contiguous subsequence of l, longest first.
    for i in reversed(range(len(l))):
        for j in range(len(l) - i):
            yield l[j:j+i+1]
```

```python
>>> list(ngrams(list('abcdefg')))
[['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f'], ['b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e'], ['d', 'e', 'f'], ['e', 'f', 'g'], ['a', 'b'], ['b', 'c'], ['c', 'd'], ['d', 'e'], ['e', 'f'], ['f', 'g'], ['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g']]
```
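As a quick sanity check on the output above: a sequence of length n has n - k + 1 contiguous n-grams of length k, so summing over k = 1..n gives n(n+1)/2 in total, i.e. 28 for the seven letters of `'abcdefg'`. The small sketch below re-defines the generator so it stands alone, and confirms both the count and the longest-first ordering:

```python
def ngrams(l):
    # Yield every contiguous subsequence of l, longest first.
    for i in reversed(range(len(l))):
        for j in range(len(l) - i):
            yield l[j:j + i + 1]

seq = list('abcdefg')
result = list(ngrams(seq))
n = len(seq)

# Total number of contiguous n-grams versus the closed-form n(n+1)/2.
print(len(result), n * (n + 1) // 2)  # 28 28
# The first item is the whole sequence, the last is the final single element.
print(result[0], result[-1])
```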
`get_all_posngrams` seems a lot more powerful, so I'd rather not add such a function here.
I just thought some more about ngram functions. They are basically all easy to implement, including bigrams, trigrams, and the like, and they are not needed right now. It would rather be handy to have them in one place for developing new experiments and algorithms. If needed, I think one could add ngram functions in a dedicated ngram module of linse, since they are a specific kind of manipulation that one recognizes as something distinct.
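To illustrate how short fixed-size variants are, here is a minimal sketch of bigrams and trigrams. The names `fixed_ngrams`, `bigrams`, and `trigrams` are hypothetical and not part of linse; the sketch assumes the input supports slicing (a string, list, or tuple) and returns tuples without any boundary padding.

```python
def fixed_ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples (hypothetical helper)."""
    # zip over n staggered slices of seq; zip stops at the shortest slice,
    # so only complete n-grams are produced.
    return list(zip(*(seq[i:] for i in range(n))))

def bigrams(seq):
    # Hypothetical convenience wrapper for n = 2.
    return fixed_ngrams(seq, 2)

def trigrams(seq):
    # Hypothetical convenience wrapper for n = 3.
    return fixed_ngrams(seq, 3)

print(bigrams('abcd'))   # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(trigrams('abcd'))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

If boundary markers are wanted (common when modelling sound sequences), they would have to be added to the sequence before calling the helper; this sketch deliberately leaves that out.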
So in my opinion, we can drop this for the time being and mark this closed.
transform or manipulate: makes another sequence out of a given sequence. And maybe some of the ngram functions, but they are also rather specific, I think.