presciencelabs / tabitha-editor

"results" and "sexing" are not found #37

Open longrunningprocess opened 6 months ago

longrunningprocess commented 6 months ago

reported by Richard

longrunningprocess commented 6 months ago

both of these words are in the Ontology but not in the Lexicon... more research is needed to determine if there are other words like this and what process is needed to get all of them.

longrunningprocess commented 6 months ago

result might just be a delay in data getting into the English project.

sex, however, might be a deeper issue related to the Analyzer's data and the English project... "sex" becomes "sleeps with" so I need to figure out if our source for word forms, i.e., inflections, will ever be complete when coming from the English project.

longrunningprocess commented 6 months ago

After a conversation with Tod about this, it turns out that a combination of word forms from both the English project AND the Analyzer project would be needed to have a comprehensive list of word forms.

I ran the verbs just as an exercise and here are the diffs. < shows what's in the English project, > shows what's in the Analyzer project.

diff verbs_from_english.csv verbs_from_analyzer.csv

8d7
< affect,Verb,|affected|affected|affecting|affects|
23d21
< assist,Verb,|assisted|assisted|assisting|assists|
41c39
< betray,Verb,|betrayed|betrayed|betraying|betrays|
---
> betray,Verb,|betrayed|betrayed|betraying|betraies|
67c65
< cancel,Verb,|cancelled|cancelled|cancelling|cancels|
---
> cancel,Verb,|canceled|canceled|canceling|cancels|
72d69
< cast,Verb,|cast|cast|casting|casts|
135d131
< devour,Verb,|devoured|devoured|devouring|devours|
152d147
< dwell,Verb,|dwelled|dwelled|dwelling|dwells|
156d150
< embalm,Verb,|embalmed|embalmed|embalming|embalms|
191d184
< forsake,Verb,|forsook|forsaken|forsaking|forsakes|
199a193
> give-attention,Verb,|give-attentioned|give-attentioned|give-attentioning|give-attentions|
203a198
> goodbye,Verb,|goodbied|goodbied|goodbying|goodbyes|
210d204
< hamstring,Verb,|hamstrung|hamstrung|hamstringing|hamstrings|
212a207
> harm,Verb,|harmed|harmed|harming|harms|
220d214
< hire,Verb,|hired|hired|hiring|hires|
283d276
< meditate,Verb,|meditated|meditated|meditating|meditates|
289d281
< mock,Verb,|mocked|mocked|mocking|mocks|
293d284
< multiply,Verb,|multiplied|multiplied|multiplying|multiplies|
316a308
> pity,Verb,|pitied|pitied|pitying|pities|
333a326
> press,Verb,|pressed|pressed|pressing|presses|
338d330
< prohesy,Verb,|prohesied|prohesied|prohesying|prohesies|
339a332
> prophesy,Verb,|prophesied|prophesied|prophesying|prophesies|
402d394
< scout,Verb,|scouted|scouted|scouting|scouts|
404d395
< seal,Verb,|sealed|sealed|sealing|seals|
406a398
> seek-attention,Verb,|seek-attentioned|seek-attentioned|seek-attentioning|seek-attentions|
413a406
> sex,Verb,|sexed|sexed|sexing|sexes|
422d414
< sigh,Verb,|sighed|sighed|sighing|sighs|
475d466
< terrify,Verb,|terrified|terrified|terrifying|terrifies|
483d473
< torture,Verb,|tortured|tortured|torturing|tortures|
486d475
< trample,Verb,|trampled|trampled|trampling|tramples|
499d487
< urge,Verb,|urged|urged|urging|urges|
503d490
< vomit,Verb,|vomited|vomited|vomiting|vomits|
523d509
< wither,Verb,|withered|withered|withering|withers|
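
The diff above mixes three kinds of differences: rows only in the English project, rows only in the Analyzer project, and rows where the two disagree on the forms. Here is a minimal sketch of how those could be separated, assuming each row has the shape `stem,Verb,|form|form|form|form|` seen in the diff. The function names and the sample rows chosen are only illustrative, not part of the existing extraction process.

```python
def parse_row(row):
    """Split 'stem,Verb,|a|b|c|d|' into (stem, part_of_speech, [forms])."""
    stem, part_of_speech, forms = row.strip().split(",", 2)
    return stem, part_of_speech, [form for form in forms.split("|") if form]


def compare(english_rows, analyzer_rows):
    """Classify stems as English-only, Analyzer-only, or conflicting."""
    english = {stem: forms for stem, _, forms in map(parse_row, english_rows)}
    analyzer = {stem: forms for stem, _, forms in map(parse_row, analyzer_rows)}
    shared = english.keys() & analyzer.keys()
    return {
        "english_only": sorted(english.keys() - analyzer.keys()),
        "analyzer_only": sorted(analyzer.keys() - english.keys()),
        "conflicting": sorted(s for s in shared if english[s] != analyzer[s]),
    }


# A few rows copied from the diff above.
english_rows = [
    "betray,Verb,|betrayed|betrayed|betraying|betrays|",
    "cancel,Verb,|cancelled|cancelled|cancelling|cancels|",
    "hamstring,Verb,|hamstrung|hamstrung|hamstringing|hamstrings|",
]
analyzer_rows = [
    "betray,Verb,|betrayed|betrayed|betraying|betraies|",
    "cancel,Verb,|canceled|canceled|canceling|cancels|",
    "sex,Verb,|sexed|sexed|sexing|sexes|",
]

report = compare(english_rows, analyzer_rows)
```

With these sample rows, `hamstring` lands in `english_only`, `sex` in `analyzer_only`, and `betray` and `cancel` in `conflicting`.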

Next steps

This process needs to be updated to include the Analyzer project as well: https://github.com/presciencelabs/tabitha-editor/tree/main/database/inflections#extracting-word-forms-from-tbta

craigp-atw commented 6 months ago

I think the solution would be to get Tod to update the Analyzer with the missing/incorrect forms, then solely use the inflections from the Analyzer project. No need to draw from the English as well.

longrunningprocess commented 6 months ago

I spoke with Tod and he made the point that there will always be the possibility of words in the Analyzer that are not in English, e.g., sex (slept with). It still seems to me we should build a merge option between the two.

longrunningprocess commented 6 months ago

As for the words that are in the English project but not in the Analyzer, Tod had this to say:

Many of the red ones will be moved into the Analyzer fairly soon, but not all of them. For example, 'hamstring' is complex and should never occur in our analysis. We always explicate 'hamstring' with 'X cuts the muscles that are in Y's legs'. Some of the other red ones are also complex and shouldn't occur in the analysis, so they don't need to be in the analyzer. However, I occasionally make a new Analyzer from the English project, so at that time the complex words like 'hamstring', etc. will be in the Analyzer even though they don't need to be. So eventually, yes, all of the red ones will be in the Analyzer even though they don't need to be.

Between the words not necessarily belonging in the Analyzer and the delay in getting them from the English project into the Analyzer, I think we should stick with a merge process between the two.
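
To make the merge idea concrete, here is a rough sketch, assuming both sources have already been loaded into `{stem: [forms]}` mappings. The name `merge_word_forms` is hypothetical. How to resolve conflicting rows (e.g. `betrays` vs `betraies`) has not been decided in this thread, so the sketch simply keeps every form from either source.

```python
def merge_word_forms(english, analyzer):
    """Union two {stem: [forms]} mappings, keeping every form from either source."""
    merged = {}
    for source in (english, analyzer):
        for stem, forms in source.items():
            known = merged.setdefault(stem, [])
            for form in forms:
                # Skip forms already recorded for this stem.
                if form not in known:
                    known.append(form)
    return merged


# Sample data taken from the verb diff in this issue.
english = {
    "cancel": ["cancelled", "cancelled", "cancelling", "cancels"],
    "hamstring": ["hamstrung", "hamstrung", "hamstringing", "hamstrings"],
}
analyzer = {
    "cancel": ["canceled", "canceled", "canceling", "cancels"],
    "sex": ["sexed", "sexed", "sexing", "sexes"],
}

merged = merge_word_forms(english, analyzer)
```

Note that de-duplicating the forms this way drops the positional meaning of each slot (for example, an identical past and past participle collapse into one entry), which is fine for "is this word known?" lookups but not for anything that needs to know which slot a form came from.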

craigp-atw commented 5 months ago

'results' and 'sexing' were temporarily handled in #60 but leaving this open for the implementation of the merge process.

craigp-atw commented 3 months ago

Related to https://github.com/presciencelabs/tabitha-ontology/issues/35, we plan to relocate the inflection database to the targets app and provide an API to look up the inflections. Including both Analyzer.mdb and English.mdb will allow the editor to query both of them through the API and perform a merge-on-the-fly between the two.
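
A merge-on-the-fly lookup could look roughly like the sketch below. The API does not exist yet, so the two lookups here are in-memory stand-ins for queries against the Analyzer and English data, and `make_lookup` and `find_stems` are hypothetical names used only for illustration.

```python
def make_lookup(table):
    """Build a form -> stems lookup from a {stem: [forms]} mapping.

    Stand-in for an API call against one inflection source.
    """
    index = {}
    for stem, forms in table.items():
        for form in [stem, *forms]:
            index.setdefault(form, set()).add(stem)
    return lambda word: index.get(word.lower(), set())


def find_stems(word, lookups):
    """Query every source and merge the stems they report for the word."""
    stems = set()
    for lookup in lookups:
        stems.update(lookup(word))
    return sorted(stems)


# Sample data taken from the verb diff in this issue.
english_lookup = make_lookup({"hamstring": ["hamstrung", "hamstringing", "hamstrings"]})
analyzer_lookup = make_lookup({"sex": ["sexed", "sexing", "sexes"]})
sources = [english_lookup, analyzer_lookup]

sexing_stems = find_stems("sexing", sources)
hamstrung_stems = find_stems("hamstrung", sources)
unknown_stems = find_stems("frobnicate", sources)
```

Here `sexing` resolves through the Analyzer stand-in and `hamstrung` through the English one, while a word known to neither source yields an empty list.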