lexos.cutter.milestones is useful for other procedures, such as Rolling Windows. It might be a good idea to make milestones its own module (i.e. lexos.milestones) so that other modules can import it.
After some further consideration, lexos.milestones only really identifies patterns within individual doc tokens. It cannot tag patterns that span several tokens, such as "Chapter 1". That would require pattern matching to identify the span and then tagging every token in it with token._.is_milestone. But that might not be useful. It might be better to use an IOB scheme like the one spaCy uses for named entities.
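To make the IOB idea concrete, here is a minimal sketch in plain Python. It is not part of Lexos: the function name iob_tags and the predicate-based pattern format are assumptions made for illustration, and a list of strings stands in for a spaCy doc. The first token of each matched span is tagged B, the remaining tokens I, and everything else O.

```python
from typing import Callable, List, Sequence

# A pattern is a sequence of per-token tests (hypothetical format, for illustration).
Predicate = Callable[[str], bool]


def iob_tags(tokens: Sequence[str], pattern: Sequence[Predicate]) -> List[str]:
    """Tag tokens as B (begins a milestone span), I (inside one), or O (outside)."""
    tags = ["O"] * len(tokens)
    n = len(pattern)
    i = 0
    while n and i + n <= len(tokens):
        window = tokens[i:i + n]
        if all(test(tok) for test, tok in zip(pattern, window)):
            # First token of the span gets B, the rest get I.
            tags[i:i + n] = ["B"] + ["I"] * (n - 1)
            i += n  # skip past the matched span
        else:
            i += 1
    return tags


# Two-token pattern: the word "chapter" followed by a number.
chapter_pattern = [lambda t: t.lower() == "chapter", str.isdigit]

tokens = "Chapter 1 It was a dark night . Chapter 2 The rain fell .".split()
tags = iob_tags(tokens, chapter_pattern)
```

With tags like these, a procedure that only needs cut points can look for B, while one that needs the whole milestone span can read B together with the I tags that follow it.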
Regardless, it should be possible to identify spans and mark their start tokens as milestones. This approach works in my current experimental code for the Rolling Windows module. That code may be useful as a method added to the milestones.Milestones class.
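As a rough sketch of marking span starts, the snippet below uses spaCy's Matcher to find "Chapter N" spans and flags only the first token of each match. It is not the experimental Rolling Windows code: the function name mark_span_starts and the example pattern are assumptions for illustration, and it registers the is_milestone extension itself so that it runs on its own.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token

# Register the custom attribute; force=True avoids an error if it already exists.
Token.set_extension("is_milestone", default=False, force=True)

nlp = spacy.blank("en")  # tokenizer only, no trained model required
matcher = Matcher(nlp.vocab)
# Example pattern (an assumption): the word "chapter" followed by a number.
matcher.add("CHAPTER", [[{"LOWER": "chapter"}, {"IS_DIGIT": True}]])


def mark_span_starts(doc: Doc, matcher: Matcher) -> list:
    """Flag the first token of every matched span and return the start indices."""
    starts = []
    for _match_id, start, _end in matcher(doc):
        doc[start]._.is_milestone = True
        starts.append(start)
    return starts


doc = nlp("Chapter 1 It was a dark night. Chapter 2 The rain fell.")
starts = mark_span_starts(doc, matcher)
milestones = [token.text for token in doc if token._.is_milestone]
```

Because only the start token is flagged, code that already splits on token._.is_milestone should keep working without needing to know that the milestone covers more than one token.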