srbehera11 / verjinxer

Automatically exported from code.google.com/p/verjinxer

Use iterators for q-grams #2

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I would like to walk through q-grams of a byte[] or ByteBuffer by using an
iterator.

Version 1: The iterator always returns the q-gram code of the next q-gram,
or -1 if it contains non-alphabet symbols.
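A minimal sketch of Version 1, computing each code from scratch. It assumes the text is already encoded so that valid symbols are the bytes 0..asize-1, that any other byte counts as a non-alphabet symbol, and that asize^q fits in an int. The class name QGramCodeIterator is hypothetical and is not the existing QGramCoder API.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch of "Version 1": yields one q-gram code per start
 * position 0..n-q, or -1 if that q-gram contains a non-alphabet symbol.
 * Assumes valid symbols are the bytes 0..asize-1 and asize^q fits in an int.
 */
class QGramCodeIterator implements PrimitiveIterator.OfInt {
    private final byte[] text;
    private final int asize; // alphabet size
    private final int q;     // q-gram length
    private int pos = 0;     // start position of the next q-gram

    QGramCodeIterator(byte[] text, int asize, int q) {
        this.text = text;
        this.asize = asize;
        this.q = q;
    }

    @Override
    public boolean hasNext() {
        return pos + q <= text.length;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        int code = 0;
        // From scratch: read the q symbols as a base-asize number.
        for (int j = 0; j < q; j++) {
            int c = text[pos + j];
            if (c < 0 || c >= asize) {
                code = -1; // non-alphabet symbol inside this q-gram
                break;
            }
            code = code * asize + c;
        }
        pos++;
        return code;
    }
}
```

A caller would loop with `while (it.hasNext()) { int code = it.nextInt(); ... }` and treat `code < 0` as "no valid q-gram here". Implementing `PrimitiveIterator.OfInt` avoids boxing every code into an `Integer`.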

Version 2: The iterator returns pairs (i, code) for increasing i, such
that code is the q-gram code at position i. Positions with invalid
q-grams can be skipped. Positions with more than one q-gram (e.g., bis.) can
be output several times; the iterator handles these details internally.
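A minimal sketch of Version 2 under the same encoding assumptions (valid symbols are bytes 0..asize-1, asize^q fits in an int). The names PosCode and QGramPairIterator are hypothetical. The sketch only skips invalid positions. It does not emit several q-grams for the same position as bis. treatment would require.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical (position, q-gram code) pair. */
final class PosCode {
    final int pos;
    final int code;

    PosCode(int pos, int code) {
        this.pos = pos;
        this.code = code;
    }
}

/**
 * Sketch of "Version 2": yields (i, code) for increasing i and skips
 * positions whose q-gram contains a non-alphabet symbol.
 */
class QGramPairIterator implements Iterator<PosCode> {
    private final byte[] text;
    private final int asize; // alphabet size
    private final int q;     // q-gram length
    private int pos = 0;     // next start position to examine
    private PosCode pending; // next pair to return, or null when exhausted

    QGramPairIterator(byte[] text, int asize, int q) {
        this.text = text;
        this.asize = asize;
        this.q = q;
        this.pending = advance();
    }

    // Scan forward to the next start position whose q-gram is valid.
    private PosCode advance() {
        for (; pos + q <= text.length; pos++) {
            int code = 0;
            boolean valid = true;
            for (int j = 0; j < q && valid; j++) {
                int c = text[pos + j];
                if (c < 0 || c >= asize) {
                    valid = false;
                } else {
                    code = code * asize + c;
                }
            }
            if (valid) {
                return new PosCode(pos++, code); // resume after this position
            }
        }
        return null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public PosCode next() {
        if (pending == null) throw new NoSuchElementException();
        PosCode result = pending;
        pending = advance();
        return result;
    }
}
```

Looking one pair ahead keeps `hasNext()` cheap and side-effect free, even when many invalid positions in a row must be skipped.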

This will require major refactoring in QGramIndexer (and classes that use
q-grams).
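The following sketch shows one way an indexer could consume such an iterator. It builds a q-gram index by counting sort in two passes: it counts each code, turns the counts into bucket offsets with prefix sums, and then places each position into its bucket. The class QGramIndexSketch and its Supplier-based constructor are hypothetical and are not QGramIndexer's actual interface. The code stream is assumed to be in Version 1 form, with one code per position and -1 for invalid q-grams. A Supplier is used because each pass needs a fresh iterator.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

/** Hypothetical two-pass (counting sort) q-gram index built from a code iterator. */
class QGramIndexSketch {
    final int[] bucketStart; // occurrences of code c are occ[bucketStart[c] .. bucketStart[c+1]-1]
    final int[] occ;         // start positions, grouped by q-gram code

    QGramIndexSketch(Supplier<PrimitiveIterator.OfInt> codes, int numCodes) {
        // Pass 1: count occurrences of each valid code (shifted by one slot).
        bucketStart = new int[numCodes + 1];
        for (PrimitiveIterator.OfInt it = codes.get(); it.hasNext(); ) {
            int c = it.nextInt();
            if (c >= 0) bucketStart[c + 1]++;
        }
        // Prefix sums turn counts into bucket start offsets.
        for (int c = 0; c < numCodes; c++) {
            bucketStart[c + 1] += bucketStart[c];
        }
        occ = new int[bucketStart[numCodes]];
        // Pass 2: write each position into the next free slot of its bucket.
        int[] fill = Arrays.copyOf(bucketStart, numCodes);
        int pos = 0;
        for (PrimitiveIterator.OfInt it = codes.get(); it.hasNext(); pos++) {
            int c = it.nextInt();
            if (c >= 0) occ[fill[c]++] = pos;
        }
    }
}
```

For a text, the supplier could be `() -> new QGramCodeIterator(text, asize, q)`, taking the iterator from the Version 1 sketch, with `numCodes = asize^q`.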

Original issue reported on code.google.com by svenrahm...@gmail.com on 16 Feb 2008 at 11:31

GoogleCodeExporter commented 8 years ago
I have written simple iterators in QGramCoder. They do not update the q-gram
code incrementally; instead, they compute each code from scratch.

To do:
- QGramIndexer needs to be rewritten to use these iterators.
- The iterators need to efficiently update the q-gram code if possible.
- The iterators need to support bis. treatment. (MultiQGramCoder?)
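For the second to-do item, here is a minimal sketch of incremental code updating. When the window moves by one position, the sketch drops the leading symbol by taking the code modulo asize^(q-1), shifts by asize, and adds the new symbol. After a non-alphabet symbol it restarts the window. Each step then costs O(1) instead of O(q). As in the earlier sketches, it assumes that valid symbols are bytes 0..asize-1, that asize^q fits in an int, and that q >= 1. The class name RollingQGramIterator is hypothetical. It produces the same output as a from-scratch Version 1 iterator.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch: one code per start position (-1 if invalid),
 * updated in O(1) per step instead of recomputed from scratch.
 */
class RollingQGramIterator implements PrimitiveIterator.OfInt {
    private final byte[] text;
    private final int asize; // alphabet size
    private final int q;     // q-gram length (assumed >= 1)
    private final int high;  // asize^(q-1), the weight of the leading symbol
    private int end = 0;     // index of the next symbol to read
    private int code = 0;    // code of the last min(valid, q) symbols read
    private int valid = 0;   // consecutive valid symbols read, capped at q

    RollingQGramIterator(byte[] text, int asize, int q) {
        this.text = text;
        this.asize = asize;
        this.q = q;
        int h = 1;
        for (int j = 1; j < q; j++) h *= asize;
        this.high = h;
        // Preload the first q-1 symbols so each nextInt() reads exactly one.
        for (; end < q - 1 && end < text.length; end++) push(text[end]);
    }

    // Slide the window by one symbol, or reset it on a non-alphabet symbol.
    private void push(int c) {
        if (c < 0 || c >= asize) {
            code = 0;
            valid = 0;
            return;
        }
        // code % high drops the leading symbol only once the window holds q symbols.
        code = (code % high) * asize + c;
        valid = Math.min(valid + 1, q);
    }

    @Override
    public boolean hasNext() {
        return end < text.length && text.length >= q;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        push(text[end++]);
        return valid >= q ? code : -1;
    }
}
```

The `code % high` step is harmless while the window is still filling, because the code of fewer than q symbols is already below asize^(q-1).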

Unclear:
- How to integrate q-gram iteration with qseqfreq. This is not so important.

Original comment by svenrahm...@gmail.com on 16 Feb 2008 at 11:57