indexphonemica / pshrimp-server

IPHON app backend

New search query types #20

Open defseg opened 4 years ago

defseg commented 4 years ago

"More back vowels than front vowels" is currently impossible to translate into a Pshrimp query - there should be > and < operators.

(It probably wouldn't even add too much to the complexity of the parser to make these infix, so that should be done for usability even though infix notation was a mistake.)

defseg commented 4 years ago

If you state the claim in more general terms and not in terms of "exactly these phonetic values", you would probably be correct in saying that systems with more phonatory contrasts among the voiced stops than among the voiceless stops are rare (this implies at least three phonatory stop types).

(from here)

Not sure how this would be translated even with comparison operators. You'd have to specify a POA, and do something like (with extensive featural shorthand) +voiced;+velar > -voiced;+velar, but there could be gaps in the system or labiovelars or something.

Maybe some sort of indexing, so you could do +voiced;αplace > -voiced;αplace. (Where αplace unpacks to something like αlabial;αround;αcoronal;αanterior;αdorsal;αback - an UPSID-style featural model would be nice, even if it's only for shorthand and we have an underlying binary model.)
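A rough sketch of how that indexing might be evaluated: group the stops by the values of their place features, then compare the +voiced and -voiced counts within each group. Everything here is an assumption for illustration only: the three-feature place model, the toy inventory, and the function names are made up, and the inventory is assumed to be filtered down to stops already. Whether the comparison should have to hold at every place or at just one is left open, so the sketch only reports the places where it holds.

```python
from collections import Counter

# Hypothetical place features that "αplace" would unpack to.
PLACE_FEATURES = ("labial", "coronal", "dorsal")


def place_key(segment):
    """Return the segment's place-feature values as a hashable tuple."""
    return tuple(segment[f] for f in PLACE_FEATURES)


def places_with_more_voiced(stops):
    """List the place keys at which +voiced stops outnumber -voiced stops."""
    voiced, voiceless = Counter(), Counter()
    for seg in stops:
        target = voiced if seg["voiced"] == "+" else voiceless
        target[place_key(seg)] += 1
    # Counter returns 0 for a place that has no voiceless stops at all.
    return sorted(k for k in voiced if voiced[k] > voiceless[k])


# Toy stop system (made up): /p b ɓ/ are labial, /t d/ are coronal.
stops = [
    {"ipa": "p", "voiced": "-", "labial": "+", "coronal": "-", "dorsal": "-"},
    {"ipa": "b", "voiced": "+", "labial": "+", "coronal": "-", "dorsal": "-"},
    {"ipa": "ɓ", "voiced": "+", "labial": "+", "coronal": "-", "dorsal": "-"},
    {"ipa": "t", "voiced": "-", "labial": "-", "coronal": "+", "dorsal": "-"},
    {"ipa": "d", "voiced": "+", "labial": "-", "coronal": "+", "dorsal": "-"},
]

matching_places = places_with_more_voiced(stops)
```

Grouping by the whole place tuple means labiovelars or gaps in the system just become their own groups (or missing ones) instead of breaking the comparison.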

defseg commented 4 years ago

One problem with using the greater-than and less-than signs, of course, is that > is already used for allophonic rule search. Maybe the allophonic rule marker should be changed to -> or to the word "to".

defseg commented 4 years ago

Let's say you want to find languages with more back rounded vowels than front vowels... this is a little annoying to do because of diphthongs, so let's not say that.

Let's say you want to find languages with more tones than consonants. Let's also say we take 'consonants' to mean 'non-syllabic segments'. This is pretty easy to check by hand, because currently the only result should be iauu1242-1. Since the tone feature in the PHOIBLE model is AFAICT always either + or 0, the raw SQL is:

SELECT
  d.inventory_id
FROM
  doculects AS d
WHERE
  (SELECT
     COUNT(*)
   FROM
     segments
     JOIN doculect_segments ON doculect_segments.segment_id = segments.id
     JOIN doculects ON doculects.id = doculect_segments.doculect_id
   WHERE
     segments.tone = '+'
     AND doculects.id = d.id)
  > 
  (SELECT 
     COUNT(*)
   FROM 
     segments
     JOIN doculect_segments ON doculect_segments.segment_id = segments.id
     JOIN doculects ON doculects.id = doculect_segments.doculect_id
   WHERE
     segments.syllabic = '-'
     AND segments.tone IS NULL
     AND doculects.id = d.id)

(Do we want to store PHOIBLE-model 0 values as '0' rather than NULL?)
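For checking the query without the real database, here is a small sketch that runs it against an in-memory SQLite database. The table and column names are the ones used in the query above; the column types, the `phoneme` column, and the two toy doculects are assumptions made up for the test, not the actual schema or data. PHOIBLE-model 0 values are stored as NULL here, matching what the query expects.

```python
import sqlite3

QUERY = """
SELECT d.inventory_id
FROM doculects AS d
WHERE
  (SELECT COUNT(*)
   FROM segments
   JOIN doculect_segments ON doculect_segments.segment_id = segments.id
   JOIN doculects ON doculects.id = doculect_segments.doculect_id
   WHERE segments.tone = '+'
     AND doculects.id = d.id)
  >
  (SELECT COUNT(*)
   FROM segments
   JOIN doculect_segments ON doculect_segments.segment_id = segments.id
   JOIN doculects ON doculects.id = doculect_segments.doculect_id
   WHERE segments.syllabic = '-'
     AND segments.tone IS NULL
     AND doculects.id = d.id)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doculects (id INTEGER PRIMARY KEY, inventory_id TEXT);
CREATE TABLE segments (id INTEGER PRIMARY KEY, phoneme TEXT, syllabic TEXT, tone TEXT);
CREATE TABLE doculect_segments (doculect_id INTEGER, segment_id INTEGER);
""")

# Made-up data: three tones, two consonants, one vowel. None becomes NULL.
conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?)", [
    (1, "˥", None, "+"), (2, "˧", None, "+"), (3, "˩", None, "+"),
    (4, "p", "-", None), (5, "t", "-", None), (6, "a", "+", None),
])
conn.executemany("INSERT INTO doculects VALUES (?, ?)",
                 [(1, "toy-tonal-1"), (2, "toy-plain-1")])
# Doculect 1 has 3 tones and 2 consonants; doculect 2 has 1 tone and 2 consonants.
conn.executemany("INSERT INTO doculect_segments VALUES (?, ?)",
                 [(1, s) for s in (1, 2, 3, 4, 5, 6)] + [(2, s) for s in (1, 4, 5, 6)])

matches = [row[0] for row in conn.execute(QUERY)]
```

Only the toy doculect with more tones than non-syllabic segments should come back.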

defseg commented 4 years ago

Yes, we want to store PHOIBLE-model 0 values as '0'. That way we don't have to worry about null handling. (0feature queries are currently unsupported; that'll have to change.)
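To make the null-handling worry concrete, here is a minimal SQLite sketch with two made-up tables that hold the same three segments, one storing 0 as NULL and one storing it as '0'. The table names and data are hypothetical. The point is that a comparison against NULL is never true in SQL, so "not +tone" silently drops the NULL rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seg_null (phoneme TEXT, tone TEXT)")
conn.execute("CREATE TABLE seg_zero (phoneme TEXT, tone TEXT)")
conn.executemany("INSERT INTO seg_null VALUES (?, ?)",
                 [("˥", "+"), ("p", None), ("t", None)])
conn.executemany("INSERT INTO seg_zero VALUES (?, ?)",
                 [("˥", "+"), ("p", "0"), ("t", "0")])


def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


# With NULL storage, NULL <> '+' evaluates to NULL, so no row passes the filter.
null_not_plus = count("SELECT COUNT(*) FROM seg_null WHERE tone <> '+'")
# The toneless segments are only reachable through the special IS NULL form.
null_is_null = count("SELECT COUNT(*) FROM seg_null WHERE tone IS NULL")
# With '0' storage, ordinary comparisons behave the way a query author expects.
zero_not_plus = count("SELECT COUNT(*) FROM seg_zero WHERE tone <> '+'")
zero_eq_zero = count("SELECT COUNT(*) FROM seg_zero WHERE tone = '0'")
```

With '0' stored explicitly, the SQL generator can treat 0 like + and - and emit a plain equality test for every feature value.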

Infix notation wouldn't be too hard to handle in the parser -- if it hits a bare featural qualificand, currently it looks ahead for an allophone query marker >, so it could instead look ahead for either > or (let's say) >> / <<, more/fewer than. Overloading the pac-brackets doesn't seem great, but enough bikeshedding about the tokens; if it's annoying, we can just change it later. So the above query is:

+tone >> -syllabic;0tone

(0feature queries currently aren't supported.)
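A minimal sketch of the lookahead idea, not the actual Pshrimp parser: the grammar is cut down to a bare featural qualificand optionally followed by one operator and a second qualificand, and the function names and the shape of the returned dict are hypothetical. The real allophone query syntax may allow things on the right-hand side that this doesn't.

```python
import re

# Longer operators come first so ">>" is not read as two ">" tokens.
TOKEN_RE = re.compile(r"\s*(>>|<<|>|[^\s<>]+)")

OPERATORS = {">": "allophone", ">>": "more_than", "<<": "fewer_than"}


def tokenize(query):
    """Split a query into qualificand and operator tokens."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_qualificand(text):
    """Split '-syllabic;0tone' into [('-', 'syllabic'), ('0', 'tone')]."""
    terms = []
    for term in text.split(";"):
        if len(term) < 2 or term[0] not in "+-0":
            raise ValueError(f"bad feature term: {term!r}")
        terms.append((term[0], term[1:]))
    return terms


def parse(query):
    """Parse 'qualificand [operator qualificand]' into a small dict."""
    tokens = tokenize(query)
    if not tokens:
        raise ValueError("empty query")
    left = parse_qualificand(tokens[0])
    if len(tokens) == 1:
        return {"type": "bare", "left": left}
    # Look ahead: the token after a bare qualificand decides the query type.
    if len(tokens) != 3 or tokens[1] not in OPERATORS:
        raise ValueError(f"cannot parse: {query!r}")
    return {"type": OPERATORS[tokens[1]], "left": left,
            "right": parse_qualificand(tokens[2])}


parsed = parse("+tone >> -syllabic;0tone")
```

Here `parsed` has type `more_than`, with `[('+', 'tone')]` on the left and `[('-', 'syllabic'), ('0', 'tone')]` on the right, which is enough to fill in the two COUNT subqueries. Swapping the tokens later only means editing the regex and the `OPERATORS` table.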

But there's another problem: we need a way to say "or" inside a featural qualificand, to handle queries like "languages in which the majority of tones are in some way registral".