Text analysis should not pick a component if it is not sure

blcham commented 1 year ago

I see this imported data from text analysis within line 3:

5133941	5114164	http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/oxy-bottle	oxy bottle	0.5	oxy bottle; emergency cylinder	http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/low-pressure	low pressure	1		0.5	FALSE	the emergency operation cylinder was found lower pressure . pls charge the cylinder bottle .	the emergency operation cylinder was found lower pressure . pls charge the cylinder bottle .	There one more than 2 annotations within this workstep.

Those are the annotations:

the <span about="_:20a7-11" property="ddo:je-výskytem-termu" resource="http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/emergency-cylinder" typeof="ddo:výskyt-termu" score="0.5">emergency</span> operation <span about="_:20a7-12" property="ddo:je-výskytem-termu" resource="http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/emergency-cylinder" typeof="ddo:výskyt-termu" score="0.5">cylinder</span> was found <span about="_:20a7-14" property="ddo:je-výskytem-termu" resource="http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/low-pressure" typeof="ddo:výskyt-termu" score="1.0">lower pressure</span> .<br> pls charge the <span about="_:20a7-15" property="ddo:je-výskytem-termu" resource="http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/emergency-cylinder" typeof="ddo:výskyt-termu" score="0.5">cylinder</span> <span about="_:20a7-16" property="ddo:je-výskytem-termu" resource="http://onto.fel.cvut.cz/ontologies/slovnik/slovník-komponent-a-zádavd---markéta-dp-lower-case/pojem/oxy-bottle" typeof="ddo:výskyt-termu" score="0.5">bottle</span> .

I believe it is a mistake to select the component "oxy bottle", because it has the same score (0.5) as "emergency cylinder". We discussed this and decided that when we do not know which one to choose, we will not pick any component or failure.
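The rule described above can be sketched roughly as follows. This is only an illustration, not the project's actual script: the function name `pick_if_unambiguous` and the dict-of-scores input are assumptions, and the scores are the ones from the annotations above.

```python
from typing import Optional


def pick_if_unambiguous(scores: dict[str, float]) -> Optional[str]:
    """Return the top-scoring label, or None if the top score is shared.

    Hypothetical helper: `scores` maps a term label to its annotation score.
    """
    if not scores:
        return None
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    # Only commit to a label when exactly one candidate holds the best score.
    return winners[0] if len(winners) == 1 else None


# Scores taken from the annotations in this issue.
components = {"emergency cylinder": 0.5, "oxy bottle": 0.5}
failures = {"low pressure": 1.0}

print(pick_if_unambiguous(components))  # None (tie at 0.5)
print(pick_if_unambiguous(failures))    # low pressure
```

With this rule the row above would get no component at all, while the failure "low pressure" would still be selected.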

Matthew-Kulich commented 1 year ago

Yes, we discussed it, but I think this data was generated before that discussion. I will regenerate it once the text analysis has finished.

blcham commented 1 year ago

Moreover, I suggest changing the columns in the output data:

1) rename column MultipleComponents to FoundComponentLabels (change of semantics here: it should be filled in even if there is only one component)
2) rename column MultipleFailures to FoundFailureLabels (see explanation above)
3) add column FoundComponentsCount (count of the components above)
4) add column FoundFailuresCount (count of the failures above)
5) add column SelectedComponentLabels (if the scores are the same and we do not have a rule to select one component, we return multiple labels here)
6) add column SelectedFailureLabels (see explanation above)
7) add column SelectedComponentsCount
8) add column SelectedFailuresCount
9) rename column ComponentScore to SelectedComponentsScore
10) rename column FailureScore to SelectedFailuresScore
11) remove column ComponentLabel
12) remove column FailureLabel
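To make the proposed columns concrete, here is a minimal sketch of how one output row could be filled in. The helper `summarize`, the dict-of-scores input, and the "; " separator (borrowed from the existing MultipleComponents value) are assumptions for illustration only; only the column names come from the list above.

```python
def summarize(kind: str, scores: dict[str, float]) -> dict[str, object]:
    """Build the proposed Found*/Selected* columns for one kind of term.

    Hypothetical helper: `kind` is "Component" or "Failure", `scores` maps
    each found label to its score. All labels tied at the best score are
    returned as selected.
    """
    best = max(scores.values()) if scores else None
    selected = [label for label, score in scores.items() if score == best]
    return {
        f"Found{kind}Labels": "; ".join(scores),
        f"Found{kind}sCount": len(scores),
        f"Selected{kind}Labels": "; ".join(selected),
        f"Selected{kind}sCount": len(selected),
        f"Selected{kind}sScore": best,
    }


# Scores taken from the annotations in this issue.
row = {
    **summarize("Component", {"emergency cylinder": 0.5, "oxy bottle": 0.5}),
    **summarize("Failure", {"low pressure": 1.0}),
}

for column, value in row.items():
    print(f"{column}\t{value}")
```

For the example row, both components would appear in SelectedComponentLabels with SelectedComponentsCount equal to 2, which makes the tie visible in the data instead of hiding it behind a single label.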

Matthew-Kulich commented 1 year ago

Ok, should I change it now? Or commit changes and finish the script and change it later?

blcham commented 1 year ago

Finish the script, regenerate the data, and put it in the Google Sheet file as a new tab.

Matthew-Kulich commented 1 year ago

Done, it is in the new tab Regenerated raw data here

kbss-cvut / aircraft-maintenance-planning-system

Text analysis should not pick a component if it is not sure #177