Move data from Sample.Features_Source into the sources project

longrunningprocess commented 5 months ago

formed from the pairing session in https://github.com/presciencelabs/tabitha-targets/issues/4#issuecomment-2125655542

longrunningprocess commented 4 months ago

SELECT  SyntacticName AS part_of_speech,
        FeatureName,
        FeatureValues

FROM    Features_Source
        INNER JOIN  SyntacticCategories
        ON          SyntacticCategory = SyntacticCategories.ID

-- single quotes: a double-quoted "Spare%" is an identifier in standard SQL
WHERE   FeatureName NOT LIKE 'Spare%'

ORDER BY SyntacticCategory

part_of_speech	FeatureName	FeatureValues
Noun	Number	Singular/S\|Dual/D\|Trial/T\|Quadrial/Q\|Paucal/p\|Plural/P\
Noun	Participant Tracking	First Mention/I\|Routine/D\|Integration/i\|Exiting/E\|Restaging/R\|Offstage/O\|Generic/G\|Interrogative/Q\|Frame Inferable/F\|Unmarked/U\
Noun	Polarity	Affirmative/A\|Negative/N\
Noun	Proximity	Not Applicable/n\|Near Speaker and Listener/N\|Near Speaker/S\|Near Listener/L\|Remote within Sight/R\|Remote out of Sight/r\|Temporally Near/T\|Temporally Remote/t\|Contextually Near with Focus/C\|Contextually Near/c\
Noun	Future Expansion	Unspecified/K\
Noun	Person	First/1\|Second/2\|Third/3\|First Inclusive/A\|First Exclusive/B\|First as Third/F\|Second as Third/S\|First Inclusive as Third/I\|First Exclusive as Third/E\
Noun	Surface Realization	Noun/N\|Always a Noun/A\|PRO/p\|Personal Pronoun/P\|Reflexive Pronoun/R\|Reciprocal Pronoun/r\|Possessive Pronoun/a\|Locative Pronoun/L\|Relative Pronoun/D\|Big Pro Plus/B\|Conjoined Personal Pronoun/C\
Noun	Participant Status	Not Applicable/N\|Protagonist/P\|Antagonist/A\|Major Participant/M\|Minor Participant/m\|Major Prop/p\|Minor Prop/r\|Significant Location/L\|Insignificant Location/l\|Significant Time/T\|Emphasized/E\
Verb	Time	Past/Y\|Future/Z\|Present/P\|Immediate Past/D\|Earlier Today/A\|Yesterday/a\|2 Days Ago/b\|3 Days Ago/c\|A Week Ago/d\|A Month Ago/e\|A Year Ago/f\|During Speaker's Lifetime/g\|Historic Past/h\|Eternity Past/i\|Unknown Past/q\|Discourse/r\|Immediate Future/E\|Later Today/F\|Tomorrow/j\|2 Days from Now/k\|3 Days from Now/l\|A Week from Now/m\|A Month from Now/n\|A Year from Now/o\|During Speaker's Lifetime (future)/s\|Unknown Future/p\|Timeless/T\
Verb	Aspect	Inceptive/N\|Completive/C\|Cessative/c\|Continuative/o\|Imperfective/I\|Routine/R\|Habitual/H\|Gnomic/G\|Unmarked/U\
Verb	Mood	Indicative/I\|Definite Potential/a\|Probable Potential/b\|'might' Potential/c\|'must' Obligation/f\|'should' Obligation/g\|'may' (permissive)/l\|'could' enablement/C\
Verb	Reflexivity	Not Applicable/N\|Reciprocal/R\|Reflexive/r\
Verb	Polarity	Affirmative/A\|Negative/N\|Emphatic Affirmative/E\|Emphatic Negative/e\
Verb	Adjective Degree	No Degree/N\|Comparative/C\|Superlative/S\|Intensified/I\|Extremely Intensified/E\|'too'/T\|'less'/L\|'least'/l\
Verb	Target Tense & Form	Unspecified/.\|Past/P\|Present/p\|Future/F\|"to"/t\|"-ing"/i\|Stem/N\|"-en"/e\
Adjective	Degree	No Degree/N\|Comparative/C\|Superlative/S\|Intensified/I\|Extremely Intensified/E\|'too'/T\|'less'/L\|'least'/l\|Equality/q\|Intensified Comparative/i\|Intensified 'less'/c\|Superlative of 2 items/s\
Adverb	Degree	No Degree/N\|Comparative/C\|Superlative/S\|Intensified/V\|Extremely Intensified/E\|'too'/T\|'less'/L\|'least'/l\
Conjunction	Implicit	No/.\|Yes/Y\

longrunningprocess commented 4 months ago

Is it ok to exclude the "Spare" rows?
Should we also exclude Noun's "Future Expansion" row?
When parsing out the values, should "Verb Target Tense & Form"'s "Unspecified/." be excluded as well?
Is it ok to move this to the Bible source? I'm just curious if these can change per project? If so, these can't be moved... if they can't, why did they end up in the English project?

craigp-atw commented 4 months ago

Is it ok to exclude the "Spare" rows?

I'm not sure at this point. We may find that it impacts the order/position of the other features, in which case we will need it.

Should we also exclude Noun's "Future Expansion" row?

Same as above

When parsing out the values, should "Verb Target Tense & Form"'s "Unspecified/." be excluded as well?

'Unspecified' should never be excluded; it is a valid and meaningful value.
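For reference, a minimal sketch of how the FeatureValues strings from the query output could be parsed so that every value, including 'Unspecified', is kept. The names `FeatureValue` and `parse_feature_values` are hypothetical, and the sketch assumes the backslashes in the listing above are escaping artifacts rather than part of the stored data.

```python
from typing import NamedTuple


class FeatureValue(NamedTuple):
    name: str  # display name, e.g. "Singular"
    code: str  # code after the last '/', e.g. "S"


def parse_feature_values(raw: str) -> list[FeatureValue]:
    """Split a string such as 'Singular/S|Dual/D|' into (name, code) pairs."""
    # Assumption: backslashes are escaping artifacts, so drop them.
    cleaned = raw.replace("\\", "")
    values = []
    for entry in cleaned.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # empty piece left by a trailing separator
        # Split on the LAST '/', so the code is whatever follows it.
        name, _, code = entry.rpartition("/")
        values.append(FeatureValue(name, code))
    return values


# Nothing is filtered out: 'Unspecified' is returned like any other value.
tense = parse_feature_values("Unspecified/.\\|Past/P\\|Present/p\\")
for value in tense:
    print(value.name, value.code)
```

Splitting on the last '/' is a guess at the format based on the rows shown; it would need revisiting if a code could itself contain a slash.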

Is it ok to move this to the Bible source? I'm just curious if these can change per project? If so, these can't be moved... if they can't, why did they end up in the English project?

Yeah, upon further reflection, some of these need to be tied to the English project. The Features themselves (e.g. Noun Proximity) are the same across all projects, but a target project can add values to each feature. The best example of this is 'Target Tense & Form', as all values except 'Unspecified' are unique to the project.

Most of these values are common though, and should be included within the Sources db, as those values are used within the semantic representation. Any project-specific values would only be used within that project's grammar rules.

In addition, a target project may make use of the 'Spare' rows by giving them a new name and values. Again, these features will not appear in the semantic representation, but are only used within the grammar rules. See the following example from my Swahili project. Note all the 'Original...' columns. I think those are included so that the user can 'reset' a feature to its original state. And I think we can achieve a similar effect by having the common features in a separate db (i.e. the Sources db) from the project-specific ones (i.e. the Targets db). We can hash that out more though.
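A rough sketch of that split, assuming each feature's values are held as a code-to-name mapping. `merge_feature_values` and the two dictionaries are hypothetical, not an existing API, and letting the common value win on a code clash is an assumption that still needs to be hashed out.

```python
def merge_feature_values(common: dict[str, str], project: dict[str, str]) -> dict[str, str]:
    """Return the common values plus any values the target project adds."""
    merged = dict(common)  # copy, so the common layer is never modified
    for code, name in project.items():
        # Assumption: a project may add values but not redefine common ones.
        merged.setdefault(code, name)
    return merged


# 'Target Tense & Form': only 'Unspecified' is common (Sources db);
# the remaining values are project-specific (Targets db).
common_tense = {".": "Unspecified"}
english_tense = {"P": "Past", "p": "Present", "F": "Future"}

merged_tense = merge_feature_values(common_tense, english_tense)
print(merged_tense)

# 'Resetting' a feature then means dropping the project layer and
# falling back to the untouched common values.
print(common_tense)
```

Because the common layer is copied rather than edited, no 'Original...' columns would be needed to restore a feature.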

longrunningprocess commented 3 months ago

do #8 first

craigp-atw commented 3 months ago

Use Sample.mdb instead of English.mdb

presciencelabs / tabitha-sources

Move data from Sample.Features_Source into the sources project #9