presciencelabs / tabitha-sources


Move data from Sample.Features_Source into the sources project #9

Open longrunningprocess opened 5 months ago

longrunningprocess commented 5 months ago

Formed from the pairing session in https://github.com/presciencelabs/tabitha-targets/issues/4#issuecomment-2125655542.

longrunningprocess commented 4 months ago
SELECT   SyntacticName AS part_of_speech,
         FeatureName,
         FeatureValues
FROM     Features_Source
         INNER JOIN SyntacticCategories
             ON SyntacticCategory = SyntacticCategories.ID
WHERE    FeatureName NOT LIKE 'Spare%'
ORDER BY SyntacticCategory

part_of_speech  FeatureName           FeatureValues
Noun            Number                Singular/S|Dual/D|Trial/T|Quadrial/Q|Paucal/p|Plural/P\
Noun            Participant Tracking  First Mention/I|Routine/D|Integration/i|Exiting/E|Restaging/R|Offstage/O|Generic/G|Interrogative/Q|Frame Inferable/F|Unmarked/U\
Noun            Polarity              Affirmative/A|Negative/N\
Noun            Proximity             Not Applicable/n|Near Speaker and Listener/N|Near Speaker/S|Near Listener/L|Remote within Sight/R|Remote out of Sight/r|Temporally Near/T|Temporally Remote/t|Contextually Near with Focus/C|Contextually Near/c\
Noun            Future Expansion      Unspecified/K\
Noun            Person                First/1|Second/2|Third/3|First Inclusive/A|First Exclusive/B|First as Third/F|Second as Third/S|First Inclusive as Third/I|First Exclusive as Third/E\
Noun            Surface Realization   Noun/N|Always a Noun/A|PRO/p|Personal Pronoun/P|Reflexive Pronoun/R|Reciprocal Pronoun/r|Possessive Pronoun/a|Locative Pronoun/L|Relative Pronoun/D|Big Pro Plus/B|Conjoined Personal Pronoun/C\
Noun            Participant Status    Not Applicable/N|Protagonist/P|Antagonist/A|Major Participant/M|Minor Participant/m|Major Prop/p|Minor Prop/r|Significant Location/L|Insignificant Location/l|Significant Time/T|Emphasized/E\
Verb            Time                  Past/Y|Future/Z|Present/P|Immediate Past/D|Earlier Today/A|Yesterday/a|2 Days Ago/b|3 Days Ago/c|A Week Ago/d|A Month Ago/e|A Year Ago/f|During Speaker's Lifetime/g|Historic Past/h|Eternity Past/i|Unknown Past/q|Discourse/r|Immediate Future/E|Later Today/F|Tomorrow/j|2 Days from Now/k|3 Days from Now/l|A Week from Now/m|A Month from Now/n|A Year from Now/o|During Speaker's Lifetime (future)/s|Unknown Future/p|Timeless/T\
Verb            Aspect                Inceptive/N|Completive/C|Cessative/c|Continuative/o|Imperfective/I|Routine/R|Habitual/H|Gnomic/G|Unmarked/U\
Verb            Mood                  Indicative/I|Definite Potential/a|Probable Potential/b|'might' Potential/c|'must' Obligation/f|'should' Obligation/g|'may' (permissive)/l|'could' enablement/C\
Verb            Reflexivity           Not Applicable/N|Reciprocal/R|Reflexive/r\
Verb            Polarity              Affirmative/A|Negative/N|Emphatic Affirmative/E|Emphatic Negative/e\
Verb            Adjective Degree      No Degree/N|Comparative/C|Superlative/S|Intensified/I|Extremely Intensified/E|'too'/T|'less'/L|'least'/l\
Verb            Target Tense & Form   Unspecified/.|Past/P|Present/p|Future/F|"to"/t|"-ing"/i|Stem/N|"-en"/e\
Adjective       Degree                No Degree/N|Comparative/C|Superlative/S|Intensified/I|Extremely Intensified/E|'too'/T|'less'/L|'least'/l|Equality/q|Intensified Comparative/i|Intensified 'less'/c|Superlative of 2 items/s\
Adverb          Degree                No Degree/N|Comparative/C|Superlative/S|Intensified/V|Extremely Intensified/E|'too'/T|'less'/L|'least'/l\
Conjunction     Implicit              No/.|Yes/Y\
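A minimal Python sketch of parsing one FeatureValues string from the output above into ordered (name, code) pairs. It assumes each entry has the form Name/Code, entries are separated by '|', and the trailing backslash shown on every row is a terminator rather than part of the last value; the function name parse_feature_values is hypothetical. Order is preserved because the position of values may turn out to matter, 'Unspecified/.' is kept as an ordinary value, and codes are compared case-sensitively (e.g. Paucal/p vs Plural/P).

```python
from typing import List, Tuple


def parse_feature_values(raw: str) -> List[Tuple[str, str]]:
    """Split e.g. 'Singular/S|Dual/D\\' into [('Singular', 'S'), ('Dual', 'D')], keeping order."""
    # Assumption: the trailing backslash seen in the query output is a terminator.
    raw = raw.rstrip("\\")
    pairs: List[Tuple[str, str]] = []
    for entry in raw.split("|"):
        if not entry:
            continue  # tolerate empty entries, e.g. from a trailing '|'
        # Split on the last '/' so the code is whatever follows it ('.' is a valid code).
        name, sep, code = entry.rpartition("/")
        if not sep:
            raise ValueError(f"malformed entry: {entry!r}")
        pairs.append((name, code))  # codes stay case-sensitive: 'p' != 'P'
    return pairs
```

For the Conjunction Implicit row, parse_feature_values("No/.|Yes/Y\\") returns [("No", "."), ("Yes", "Y")].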
longrunningprocess commented 4 months ago
craigp-atw commented 4 months ago
  • Is it ok to exclude the "Spare" rows?

I'm not sure at this point. We may find that excluding them affects the order/position of the other features, in which case we will need them.

  • Should we also exclude Noun's "Future Expansion" row?

Same as above

  • When parsing out the values, should the "Unspecified/." value of "Verb Target Tense & Form" be excluded as well?

'Unspecified' should never be excluded, it is a valid and meaningful value.

  • Is it OK to move this to the Bible source? I'm curious whether these can change per project. If so, they can't be moved; if they can be moved, why did they end up in the English project?

Yeah, upon further reflection, some of these need to be tied to the English project. The features themselves (e.g. Noun Proximity) are the same across all projects, but a target project can add values to each feature. The best example of this is 'Target Tense & Form': all of its values except 'Unspecified' are unique to the project.

Most of these values are common, though, and should be included in the Sources db, because those values are used in the semantic representation. Any project-specific values would only be used in that project's grammar rules.

In addition, a target project may make use of the 'Spare' rows by giving them a name and values. Again, these features will not appear in the semantic representation; they are only used in the grammar rules. See the following example from my Swahili project:

[image: example from the Swahili project]

Note all the 'Original...' columns. I think those are included so that the user can 'reset' a feature to its original state. I think we can achieve a similar effect by keeping the common features in a separate db (i.e. the Sources db) from the project-specific ones (i.e. the Targets db). We can hash that out more, though.
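The split described in this reply (shared features and their common values in the Sources db, project additions in the Targets db) can be sketched in Python as follows. Everything here is hypothetical: the Feature model, the function names, and the rules that common values keep their positions and come first while project values may not reuse an existing code. Under this model, "resetting" a feature means dropping the project's additions instead of keeping 'Original...' copies. Renamed 'Spare' rows would be project-only features in the Targets db and are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (value name, code), e.g. ("Unspecified", ".")
Value = Tuple[str, str]


@dataclass
class Feature:
    """One feature row, e.g. Verb / Target Tense & Form (hypothetical model)."""
    part_of_speech: str
    name: str
    values: List[Value] = field(default_factory=list)


def effective_feature(common: Feature, project_values: List[Value]) -> Feature:
    """Combine common values (Sources db) with one project's additions (Targets db).

    Assumed rules: common values come first, in their original order, and a
    project value may not reuse a code that is already taken.
    """
    taken = {code for _, code in common.values}
    merged = list(common.values)  # copy so the shared common feature is never mutated
    for name, code in project_values:
        if code in taken:
            raise ValueError(f"{common.name}: code {code!r} for {name!r} is already in use")
        merged.append((name, code))
        taken.add(code)
    return Feature(common.part_of_speech, common.name, merged)


def reset_feature(common: Feature) -> Feature:
    """'Reset to original': discard project additions, keep only the common values."""
    return Feature(common.part_of_speech, common.name, list(common.values))
```

For example, with Feature("Verb", "Target Tense & Form", [("Unspecified", ".")]) as the common row, the English project would contribute Past/P, Present/p and the rest of its values as additions, and reset_feature would bring the feature back to just Unspecified/.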

longrunningprocess commented 3 months ago

Do #8 first.

craigp-atw commented 3 months ago

Use Sample.mdb instead of English.mdb