kermitt2 / kish

Keeping It Simple is Hard

auto-code matching strings? #9

Open jameshowison opened 1 year ago

jameshowison commented 1 year ago

Might it make sense, when someone codes the name of a piece of software, to automatically locate all occurrences, mark them as draft annotations of software, and ask the user to validate those?

This is less important if the system has already auto-coded the document, since I think that is already a step. But in that case, if the annotator devalidates a candidate annotation, perhaps we could immediately present all the matching ones?

kermitt2 commented 1 year ago

The system (the machine learning tool doing the pre-annotation) does indeed have such auto-matching. I call it propagation: if a software name string is found somewhere, it is propagated to the other occurrences of the string in the same document, on the condition that the string is rare enough (above a tf/idf threshold).
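
A minimal Python sketch of the propagation idea described above, for illustration only. It is not the actual pre-annotation code: the function names, the sublinear term frequency, the IDF formula, and the `doc_freq` / `n_docs` corpus statistics are all assumptions made for the example.

```python
import math
import re


def find_occurrences(text, term):
    """Return (start, end) offsets of every whole-word occurrence of term."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return [m.span() for m in re.finditer(pattern, text)]


def tf_idf(term, text, doc_freq, n_docs):
    """Hypothetical rarity score: sublinear term frequency times IDF.

    doc_freq maps a string to the number of corpus documents containing it;
    n_docs is the corpus size. Both are assumed inputs.
    """
    count = len(find_occurrences(text, term))
    if count == 0:
        return 0.0
    tf = 1.0 + math.log(count)
    idf = math.log(n_docs / (1.0 + doc_freq.get(term, 0)))
    return tf * idf


def propagate(text, seed_span, doc_freq, n_docs, threshold):
    """Return draft spans for the other occurrences of the seed string,
    or an empty list if the string is not rare enough."""
    seed = tuple(seed_span)
    term = text[seed[0]:seed[1]]
    if tf_idf(term, text, doc_freq, n_docs) <= threshold:
        return []  # too common: propagating would create noisy drafts
    return [span for span in find_occurrences(text, term) if span != seed]


text = "We used GROBID to parse the PDFs. GROBID output was then cleaned."
doc_freq = {"GROBID": 2, "the": 990}  # made-up corpus statistics
start = text.index("GROBID")
print(propagate(text, (start, start + 6), doc_freq, 1000, 2.0))
```

With these made-up statistics, the second "GROBID" is returned as a draft span, while a frequent string such as "the" scores below the threshold and is not propagated.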

But I think it's a good idea to also have it based on human annotator action. For example, an optional propagation of human annotations as additional "pre-annotations", with a check box to trigger this propagation or not.

As a complement or an alternative to this, a search function in the current document would be very useful, and it could be launched from an existing annotation (for example with a button "search other occurrences of this string").
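
One possible shape for such a search, sketched in Python under assumptions: `search_occurrences`, its parameters, and the returned dictionary layout are hypothetical and do not come from the tool. It does a plain substring search, optionally case-insensitive, and returns a short context snippet for each hit so the annotator can review it.

```python
import re


def search_occurrences(text, annotation_span, context=20, ignore_case=True):
    """Find other occurrences of an annotated string, with surrounding context."""
    start, end = annotation_span
    query = text[start:end]
    flags = re.IGNORECASE if ignore_case else 0
    hits = []
    for m in re.finditer(re.escape(query), text, flags):
        if m.span() == (start, end):
            continue  # skip the annotation the search was launched from
        left = max(0, m.start() - context)
        right = min(len(text), m.end() + context)
        hits.append({"start": m.start(), "end": m.end(),
                     "snippet": text[left:right]})
    return hits


text = "Data were analysed with SPSS. The spss syntax files are available."
idx = text.index("SPSS")
for hit in search_occurrences(text, (idx, idx + 4)):
    print(hit["start"], hit["end"], repr(hit["snippet"]))
```

Unlike the automatic propagation, nothing is annotated here: the hits are only listed, and the annotator decides which ones to turn into annotations.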