The `optimize_bounds` method of `TextData` can isolate a window within the input text that matches a single license. It would be nice to find multiple licenses within one file, for example when a project is dual-licensed.
My initial thought:
A new method that calls optimize_bounds repeatedly: store the result of each call, remove (or blank out) the matched text from the original, then run optimize_bounds again on what remains. Repeat until nothing in the remaining text can be identified with sufficient confidence (say, above 0.8).
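To make the loop concrete, here is a rough, language-agnostic sketch in Python. It is not the real API: `find_all_licenses`, the `(start, end, name, score)` tuple, and `toy_optimize_bounds` are all hypothetical stand-ins for whatever `optimize_bounds` actually returns. It blanks out matched lines instead of deleting them, so the line numbers of later matches still refer to the original text.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical result shape: (start_line, end_line_exclusive, license_name, score)
Match = Tuple[int, int, str, float]


def find_all_licenses(
    lines: List[str],
    optimize_bounds: Callable[[List[str]], Optional[Match]],
    threshold: float = 0.8,
    max_iters: int = 10,
) -> List[Match]:
    """Call optimize_bounds repeatedly, blanking out each match before retrying."""
    remaining = list(lines)  # work on a copy; the caller's text is untouched
    found: List[Match] = []
    for _ in range(max_iters):  # hard cap so a misbehaving matcher cannot loop forever
        result = optimize_bounds(remaining)
        if result is None:
            break
        start, end, _name, score = result
        if score < threshold or start >= end:
            break  # nothing confident enough is left
        found.append(result)
        # Blank rather than delete, so later line numbers stay valid.
        for i in range(start, end):
            remaining[i] = ""
    return found


# Toy stand-in for the real matcher (assumption: exact line comparison against
# tiny made-up templates; the real scoring would be fuzzy).
KNOWN = {
    "MIT": ["permission is hereby granted", "the software is provided as is"],
    "Apache-2.0": [
        "licensed under the apache license",
        "you may not use this file except in compliance",
    ],
}


def toy_optimize_bounds(lines: List[str]) -> Optional[Match]:
    """Return the best-scoring (window, license) pair, or None if no window fits."""
    best: Optional[Match] = None
    for name, template in KNOWN.items():
        n = len(template)
        for start in range(len(lines) - n + 1):
            window = [line.strip().lower() for line in lines[start:start + n]]
            score = sum(a == b for a, b in zip(window, template)) / n
            if best is None or score > best[3]:
                best = (start, start + n, name, score)
    return best
```

One open question with this approach is whether blanking is better than removing: blanking keeps offsets stable, but the blank gap might affect how the next window is scored.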