jpeddicord / askalono

A tool & library to detect open source licenses from texts
Apache License 2.0

Repeated optimization passes #25

Closed: jpeddicord closed this issue 6 years ago

jpeddicord commented 6 years ago

The optimize_bounds method of TextData can isolate a window within the input text that identifies a single license chunk. It'd be nice to find multiple licenses within one file, for example when a project is dual-licensed.

My initial thought: a new method that calls optimize_bounds repeatedly, storing the result of each call and removing (or blanking out) the matched text from the original. The next iteration then runs optimize_bounds again on what remains. Repeat until nothing left in the text matches above a confidence threshold (say, 0.8).
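
For illustration, here's a rough sketch of that loop. It does not use askalono's actual API: best_window is a hypothetical brute-force stand-in for optimize_bounds, the Found struct and find_all are made-up names, and scoring is a plain Dice coefficient over word sets rather than whatever TextData really does. It only shows the match, blank out, retry shape under those assumptions.

```rust
use std::collections::HashSet;

/// One detected region: template name, line range [start, end), and score.
/// (Hypothetical type for this sketch; not part of askalono.)
#[derive(Debug, Clone, PartialEq)]
pub struct Found {
    pub name: String,
    pub start: usize,
    pub end: usize,
    pub score: f32,
}

/// Lowercased word set for a slice of lines (toy normalization).
fn words(lines: &[String]) -> HashSet<String> {
    lines
        .iter()
        .flat_map(|l| l.split_whitespace())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Dice coefficient between two word sets, in 0.0..=1.0.
fn dice(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    let shared = a.intersection(b).count() as f32;
    2.0 * shared / (a.len() + b.len()) as f32
}

/// Stand-in for one bounds-optimizing pass: try every line window against
/// every template and return the best-scoring pair, if there are any lines.
fn best_window(lines: &[String], templates: &[(&str, &str)]) -> Option<Found> {
    let mut best: Option<Found> = None;
    for (name, body) in templates {
        let target = words(&[body.to_string()]);
        for start in 0..lines.len() {
            for end in (start + 1)..=lines.len() {
                let score = dice(&words(&lines[start..end]), &target);
                if best.as_ref().map_or(true, |b| score > b.score) {
                    best = Some(Found { name: name.to_string(), start, end, score });
                }
            }
        }
    }
    best
}

/// Repeated passes: find the best window, record it, blank it out, retry.
/// Stops once the best remaining match falls below `threshold`.
pub fn find_all(text: &str, templates: &[(&str, &str)], threshold: f32) -> Vec<Found> {
    let mut lines: Vec<String> = text.lines().map(String::from).collect();
    let mut found = Vec::new();
    while let Some(hit) = best_window(&lines, templates) {
        // The zero-score guard keeps the loop finite even for threshold <= 0.
        if hit.score < threshold || hit.score <= 0.0 {
            break;
        }
        // Blank the matched lines instead of removing them, so the line
        // numbers reported by later passes still refer to the original text.
        for line in &mut lines[hit.start..hit.end] {
            line.clear();
        }
        found.push(hit);
    }
    found
}
```

Calling find_all(text, &templates, 0.8) on a dual-licensed file would return one Found per license region, in the order the passes discovered them. Blanking rather than deleting is the design choice that keeps reported offsets meaningful against the original input; the brute-force window search is only there to keep the sketch self-contained and would be far too slow for real use.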

jpeddicord commented 6 years ago

This is starting to happen via "strategies": 05bd6ac3d3bd9f7221395e126764e893f5a26208

jpeddicord commented 6 years ago

Closing this as strategies have landed in master. Should make it out with the next release soon. \o/