rmlockwood / FLExTrans

Machine Translation using FLEx, Apertium, and STAMP
MIT License

Reference feature names in addition to values in transfer rules #243

Open · somelinguist opened 2 years ago

somelinguist commented 2 years ago

Currently, it's only possible to reference feature values in transfer rules, and this is done by defining attributes with corresponding values.

Because only values are output, feature values need to be unique to work correctly, at least within a rule.

This makes it hard to work with multiple complex features, such as subject and object agreement, whose sub-features (for example person and number) draw on the same list of possible values.

For example, an irregularly inflecting verb meaning "say" might have features like [sbj:[num:sg][pers:3]][obj:[num:sg][pers:2]] in FLEx. With the right rule, a form with these features could be output as something like hit1.1 sg sg 3 2 in FLExTrans. Because nothing records which "sg" or which person value belongs to the subject and which to the object, this seems to make it impossible to synthesize the form correctly.
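To make the collision concrete, here is a minimal Python sketch. It assumes a FLEx feature structure can be modelled as a nested dict and that a rule attribute is defined only by its set of possible values. Both are simplifications for illustration, not how FLExTrans actually stores features or rules. The helper names `values_only` and `matches` are hypothetical.

```python
from __future__ import annotations


def values_only(fs: dict) -> list[str]:
    """Flatten a nested feature structure, keeping only the leaf values."""
    tags: list[str] = []
    for value in fs.values():
        if isinstance(value, dict):
            tags.extend(values_only(value))  # descend into a complex feature
        else:
            tags.append(value)  # the feature name/path is discarded here
    return tags


def matches(tags: list[str], attr_values: set[str]) -> list[str]:
    """Return every tag belonging to an attribute defined only by its values."""
    return [t for t in tags if t in attr_values]


# Hypothetical encoding of [sbj:[num:sg][pers:3]][obj:[num:sg][pers:2]]
say = {"sbj": {"num": "sg", "pers": "3"}, "obj": {"num": "sg", "pers": "2"}}

tags = values_only(say)
print(tags)                            # ['sg', '3', 'sg', '2']
print(matches(tags, {"1", "2", "3"}))  # ['3', '2']: two candidates for "person"
print(matches(tags, {"sg", "pl"}))     # ['sg', 'sg']: which one is the subject's?
```

Once the names are gone, an attribute lookup can only report that several values matched; it has no way to tell subject agreement from object agreement.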

If there were a way to refer to the feature name/path in addition to the value, it seems like it would be possible to write a rule that would correctly synthesize/match.
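As a rough illustration of what path-aware output could look like, the sketch below keeps the full feature path in each tag (for example `sbj.pers.3`), so a rule could ask for `sbj.pers` specifically. This tag format is purely hypothetical and is not something FLExTrans or Apertium currently produces. The separator is a parameter because the right character is still an open question, and the function names are illustrative only.

```python
from __future__ import annotations


def qualified(fs: dict, sep: str = ".", prefix: str = "") -> list[str]:
    """Flatten a feature structure into hypothetical 'path<sep>value' tags."""
    tags: list[str] = []
    for name, value in fs.items():
        path = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            tags.extend(qualified(value, sep, path))  # extend the path
        else:
            tags.append(f"{path}{sep}{value}")  # keep the path with the value
    return tags


def lookup(tags: list[str], path: str, sep: str = ".") -> list[str]:
    """Return the values recorded directly under the given feature path."""
    head = path + sep
    return [t[len(head):] for t in tags if t.startswith(head)]


say = {"sbj": {"num": "sg", "pers": "3"}, "obj": {"num": "sg", "pers": "2"}}

tags = qualified(say)
print(tags)                      # ['sbj.num.sg', 'sbj.pers.3', 'obj.num.sg', 'obj.pers.2']
print(lookup(tags, "sbj.pers"))  # ['3']
print(lookup(tags, "obj.pers"))  # ['2']
```

With the path preserved, each lookup returns a single unambiguous value even though subject and object share the same value lists.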

Some ideas:

These are just some ideas, which might be hard to implement or not worth implementing, especially if they would cause incompatibilities with earlier versions.

bbryson commented 1 year ago

I like these ideas. It seems like we need "namespaces" for features, and this would be a way of providing it. This is nicer than requiring that the features for Sbj and Obj have different names.

mr-martian commented 5 months ago

The Apertium developers have discussed reserving ":" in tags for other uses, but "." should be fine.

mr-martian commented 5 months ago

Correction: There are two different senses of "should be fine": <sbj.num> will be processed in the input stream without issue, but if someone writes tags="sbj.num", that will be processed as referring to <sbj><num>. Writing tags="sbj\.num" would probably fix that, but it's probably better to find a different separator. Maybe sbj|num could work?
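The difference between the two senses can be modelled with a toy splitter for a tags="..." pattern, where "." separates tags. This is a deliberately simplified model, not Apertium's actual implementation: it ignores wildcards and other matching details. The backslash-escape behaviour it implements is an assumption taken from the comment above, not confirmed Apertium behaviour.

```python
import re


def split_tag_pattern(pattern: str) -> list[str]:
    """Split a tags="..." pattern into individual tags on unescaped dots.

    Toy model only: a backslash before a dot is assumed to keep the dot
    inside the tag. Wildcards and other Apertium details are ignored.
    """
    parts = re.split(r"(?<!\\)\.", pattern)  # split on dots not preceded by "\"
    return [p.replace("\\.", ".") for p in parts]  # unescape kept dots


for pattern in ["sbj.num", r"sbj\.num", "sbj|num"]:
    print(pattern, "->", split_tag_pattern(pattern))
# sbj.num -> ['sbj', 'num']
# sbj\.num -> ['sbj.num']
# sbj|num -> ['sbj|num']
```

A separator such as "|" avoids the problem entirely because the pattern syntax in this model never treats it as a tag boundary, so no escaping is needed.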