scottkleinman / lexos

Development repo for the Lexos API
MIT License

Make milestones its own module #27

Open · scottkleinman opened this issue 1 year ago

scottkleinman commented 1 year ago

The lexos.cutter.milestones module would also be useful for other procedures, such as Rolling Windows. It might be a good idea to make milestones its own module (i.e. lexos.milestones) that can be imported into other modules.

After some further consideration: lexos.milestones only really works on identifying patterns within individual doc tokens. It cannot tag multi-token span patterns such as "Chapter 1". That would require pattern matching to identify the span and then tagging every token in it with token._.is_milestone, which might not be useful. It might be better to use an IOB (inside-outside-beginning) scheme like the one spaCy uses for named entities.
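To make the IOB idea concrete, here is a minimal sketch in plain Python. It is not the lexos or spaCy API. Tokens are modelled as a list of strings instead of spaCy Doc tokens. The function name `iob_tag_spans` and the pattern format (one regular expression per token) are hypothetical. The first token of each matched span gets `B-MILESTONE`, the remaining tokens get `I-MILESTONE`, and everything else gets `O`.

```python
import re
from typing import List, Sequence


def iob_tag_spans(
    tokens: Sequence[str], pattern: Sequence[str], label: str = "MILESTONE"
) -> List[str]:
    """Return one IOB tag per token for spans matching `pattern`.

    Hypothetical format: `pattern` is a list of regexes, one per token;
    a span matches when every regex fully matches its token.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one regex")
    regexes = [re.compile(p) for p in pattern]
    n = len(regexes)
    tags = ["O"] * len(tokens)  # default: outside any milestone span
    i = 0
    while i <= len(tokens) - n:
        window = tokens[i:i + n]
        if all(rx.fullmatch(tok) for rx, tok in zip(regexes, window)):
            tags[i] = f"B-{label}"  # first token of the span
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"  # remaining tokens of the span
            i += n  # skip past the span so matches never overlap
        else:
            i += 1
    return tags


tokens = ["Chapter", "1", "It", "was", "a", "dark", "night", ".",
          "Chapter", "2", "The", "end"]
print(iob_tag_spans(tokens, ["Chapter", r"\d+"]))
```

Unlike a single boolean flag, the `B-`/`I-` distinction keeps the span boundaries recoverable. In spaCy itself the comparable information lives on each token's entity IOB attribute, so a real implementation might reuse that machinery instead of a custom extension.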

Regardless, it should be possible to identify spans and mark only their start tokens as milestones. This approach works in my current experimental code for the Rolling Windows module, and that code could be added as a method of the milestones.Milestones class.
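A rough sketch of the "mark the start token" approach follows. It assumes plain string tokens rather than spaCy tokens. The function names `mark_span_starts` and `split_at_milestones` are hypothetical and are not the actual methods of milestones.Milestones. The list of booleans stands in for the `token._.is_milestone` flag. The second function shows why start-only flags are enough for segmenting: each flagged token begins a new segment.

```python
import re
from typing import List, Sequence


def mark_span_starts(tokens: Sequence[str], pattern: Sequence[str]) -> List[bool]:
    """Flag the first token of each non-overlapping span matching `pattern`.

    Hypothetical format: one regex per token. The returned list plays the
    role of token._.is_milestone.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one regex")
    regexes = [re.compile(p) for p in pattern]
    n = len(regexes)
    is_milestone = [False] * len(tokens)
    i = 0
    while i <= len(tokens) - n:
        if all(rx.fullmatch(tok) for rx, tok in zip(regexes, tokens[i:i + n])):
            is_milestone[i] = True  # only the span's start token is marked
            i += n
        else:
            i += 1
    return is_milestone


def split_at_milestones(
    tokens: Sequence[str], is_milestone: Sequence[bool]
) -> List[List[str]]:
    """Start a new segment at every token flagged as a milestone."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok, flag in zip(tokens, is_milestone):
        if flag and current:
            segments.append(current)  # close the segment before the milestone
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


tokens = ["Preface", "text", "Chapter", "1", "Call", "me", "Ishmael",
          "Chapter", "2", "The", "Carpet-Bag"]
flags = mark_span_starts(tokens, ["Chapter", r"\d+"])
print(split_at_milestones(tokens, flags))
```

With this design, the text before the first milestone stays in its own leading segment. Whether that is the right behaviour for Rolling Windows is a separate decision.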