usnistgov / nestor-tmp2

Quantifying tacit knowledge for investigatory analysis

Unknowns not being parsed by keyword extractor #5

Closed rtbs-dev closed 6 years ago

rtbs-dev commented 6 years ago

E.g., here are the three untagged work orders (tag-empty):

RawText Items Problem Solution eXcess Redundant UK_tok
631 unload automation not return unload, automation
2562 camshaft standstill gary camshaft
3185 disti water empty water fill disti, fill

"return" is definitely in the vocab list, as "U" (found via the <KeywordExtractor>.vocab() attr). So why isn't it showing up?

@saschaMoccozet maybe the .transform() method is broken again?

saschaMoccozet commented 6 years ago

Did it find it before we changed the code? (When the check was done twice.)

rtbs-dev commented 6 years ago

It turns out it was that .transform func, which wasn't even keeping things marked as "U", only things that were NOT in the vocab list to start with. I think we need two separate labels for those two types of unknowns?
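
To make the two kinds of unknown concrete, here is a minimal sketch in plain Python. It is not nestor's actual code: the tag_tokens function, the "OOV" label, and the toy vocab are all hypothetical. The only detail taken from this thread is that "return" is in the vocab as "U". The idea is that a token explicitly tagged "U" keeps that tag, while a token missing from the vocab gets its own separate label, so neither group is silently dropped.

```python
from collections import defaultdict

# Hypothetical label for tokens that never appeared in the vocab at all.
# "U" is kept for tokens that ARE in the vocab and were tagged as unknown.
OOV_LABEL = "OOV"


def tag_tokens(tokens, vocab, oov_label=OOV_LABEL):
    """Group tokens by their vocab tag; tokens missing from vocab get oov_label."""
    grouped = defaultdict(list)
    for tok in tokens:
        # dict.get falls back to the out-of-vocab label for missing tokens,
        # so in-vocab "U" tokens and out-of-vocab tokens stay distinguishable.
        grouped[vocab.get(tok, oov_label)].append(tok)
    return dict(grouped)


# Toy vocab for illustration only; just the "return" -> "U" entry comes from the issue.
toy_vocab = {"return": "U"}

tagged = tag_tokens("unload automation not return".split(), toy_vocab)
print(tagged)
```

With this split, a downstream column for "U" tokens and one for out-of-vocab tokens can be filled independently, which is one way to read the "two separate labels" suggestion.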

rtbs-dev commented 6 years ago

Solved with this commit