surfacesyntacticud / guidelines

Guidelines for Surface Syntactic Universal Dependencies
https://guidelines.surfacesyntacticud.org/

Reannotating *WH est-ce que* #42

Open Valentin-D-Richard opened 1 year ago

Valentin-D-Richard commented 1 year ago

Following issue surfacesyntacticud/guidelines#35 I would like to discuss the special case of WH est-ce que.

The questions are: A. Is WH est-ce que (resp. WH est-ce qui) an idiom? B. Where do we attach WH: on est-ce que or on the corresponding verb of the remaining clause?

Linguistic studies

In the Orféo guidelines, every WH est-ce que is considered a single word and is supposed to be a single token. In practice, however, when token segmentation goes wrong (e.g. combien + est-ce que), the two parts are not always linked with a morph relation as expected, but are sometimes annotated as separate dependents.

The Grande Grammaire du Français (GGF) distinguishes two cases:

  1. They consider qu'est-ce que as a complex (or "agglomerated") word, being a pronoun
  2. For any other WH, they consider WH as separate from the subordinating complex word est-ce que

In particular, some words can be inserted between WH(≠que) and est-ce que. This is not possible with qu', which is a verbal clitic. Moreover, a complex PP can be fronted before est-ce que.

(1.) Combien, finalement, est-ce que vous pouvez dépenser ?
(2.) [À qui] est-ce que tu parles ?

The analysis of qu'est-ce que as fixed traces back to Obenauer [1]. We now observe that it is frequently used in subordinate interrogatives in colloquial spoken French, whereas other WH est-ce que are more rarely embedded.

(3.) alors du coup, j'ai réfléchi sur euh, qu'est-ce que je pourrais te raconter. (ParisStories)

We also observe emergent uses of qu'est-ce que as an exclamative adverb [2] [GGF, §IX-10.4.3 p.1119].

(4.) Mon dieu, qu'est-ce que c'est long !

Regarding language acquisition, Zuckerman (2001, chap 5) observes that (reporting the results of Hulk [3])

Wh+ESK structures appear at the same time as other CP constructions, such as clefts [...] It appears, however, that questions with qu’est-ce que differ from the other Wh+ESK questions in this respect: children seem to begin producing qu’est-ce que questions and fronted Wh- questions at the same stage. [...] Hulk (1996) therefore proposes analysing qu’est-ce que not as the Wh-word + ESK, but as an unanalysed chunk that behaves like other simplex Wh-word

Farmer's sociopragmatic corpus study of French movies [4] sheds light on the gap between their uses. She claims that:

the interrogative word que so often occurs with est-ce que across speaker, class and sex—in every style and in every decade—that it appears to be lexicalized.

All these arguments are in favour of a differential analysis of WH est-ce que depending on WH:


Question B. remains to be answered for WH(≠que). As UD is a "deeper" annotation framework, I suppose that WH should be attached to the corresponding verb, like dépenser in (1.). In SUD, I assume that an analysis parallel to that of qu'est-ce que would be preferred, that is: attaching WH to est-ce que. If so, would the SUD2UD converter be able to manage such a transformation?


[1] H.-G. Obenauer, Etudes de syntaxe interrogative du français: Quoi, combien et le complémenteur. Max Niemeyer Verlag, 1976. doi: 10.1515/9783111340364.
[2] L. Dekhissi, “Qu’est-ce t’as été te mêler de ça ?! Une « nouvelle » structure pour les questions rhétoriques conflictuelles,” Journal of French Language Studies, vol. 26, no. 3, pp. 279–298, 2016, doi: 10.1017/S0959269515000253.
[3] A. Hulk, “The syntax of WH-questions in Child French,” 1996.
[4] K. L. Farmer, “‘De quoi tu parles?’: A diachronic study of sociopragmatic interrogative variation in French films,” University of Pennsylvania Working Papers in Linguistics, vol. 19, no. 2, p. 8, 2013, doi: https://repository.upenn.edu/pwpl/vol19/iss2/8.

perrier54 commented 1 year ago

I agree with @Valentin-D-Richard's conclusion. See surfacesyntacticud/guidelines#35

Valentin-D-Richard commented 1 year ago

Some months ago, we discussed this issue and issue #35. We agreed on most annotations but did not close the issue because of sentences like 4. I would first like to recall what we agree on, and then add new arguments to settle the tricky case of sentence 4. I will only mention SUD, but I bear in mind that it has to yield a reasonable conversion to UD. (ECQ stands for est-ce que.)

  1. Est-ce que Marie dort ? ECQ + S
  2. Où est-ce que Marie dort ? WH + ECQ + S, WH != QUE
  3. Qu'est-ce que Marie fait ? QUE + ECQ + S
  4. Qu'est-ce que Angiox ? QUE + ECQ + NP

Agreement

We agreed that (correct me if I misremember some details):

  1. est-ce que is an idiom, and thus should bear the feature (In)Idiom

    a. est-ce que should be syntactically decomposed as much as possible
        a.i. its head is est
        a.ii. -ce is the subject of est
        a.iii. que is the predicative complement of est
        a.iv. que has upos=SCONJ

    b. The head of ECQ + S is ECQ (negative distribution with deletion criterion)
        b.i. S depends on que (see issue #39)
        b.ii. ECQ acts like a complementizer, and should thus bear ExtPos=SCONJ
        b.iii. que has upos=SCONJ even if WH = qui

    c. est-ce que should be analyzed the same way in cases 1. and 2.
        c.i. as for relative clauses, the extracted WH should depend on an element in S (e.g. modifier of dort for 2.)

    The point of contention concerns the case where WH = que. Sentence 3. seems to behave regularly with respect to other WH + ECQ sentences. This would suggest the following choices:

    d. est-ce que should be analysed the same way in cases 2. and 3.
        d.i. the extracted que should depend on an element in S (e.g. comp:obj of fait in 3.)
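As an illustration only, the agreed points a. to c. can be written down for sentence 2. (Où est-ce que Marie dort ?) as a small Python sketch. Several details are assumptions not settled in this thread: the `Token` class and helper names are hypothetical, the upos values of est, -ce and Où are guesses, "(In)Idiom" is read as Idiom=Yes on the head and InIdiom=Yes on the other members, and the comp:obj label between que and dort is assumed (point b.i. only says that S depends on que).

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    idx: int
    form: str
    upos: str
    head: int            # 0 marks the root
    deprel: str
    feats: dict = field(default_factory=dict)

# "Où est-ce que Marie dort ?" following points a. to c. (labels partly assumed)
SENT = [
    Token(1, "Où", "ADV", 6, "mod"),                      # c.i: WH depends on dort
    Token(2, "est", "AUX", 0, "root", {"Idiom": "Yes", "ExtPos": "SCONJ"}),
    Token(3, "-ce", "PRON", 2, "subj", {"InIdiom": "Yes"}),       # a.ii
    Token(4, "que", "SCONJ", 2, "comp:pred", {"InIdiom": "Yes"}),  # a.iii, a.iv
    Token(5, "Marie", "PROPN", 6, "subj"),
    Token(6, "dort", "VERB", 4, "comp:obj"),              # b.i: S depends on que
    Token(7, "?", "PUNCT", 2, "punct"),
]

def is_tree(tokens):
    """True if there is exactly one root and every token reaches it without a cycle."""
    heads = {t.idx: t.head for t in tokens}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen, node = set(), start
        while node != 0:
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True

def idiom_forms(tokens):
    """Forms of the tokens marked as head or member of an idiom."""
    return [t.form for t in tokens if "Idiom" in t.feats or "InIdiom" in t.feats]
```

In this encoding the WH word stays outside the idiom and attaches inside S, while est-ce que remains a decomposed idiom headed by est.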

Limit case

The problem arises if we want to annotate 3. the same way as 4. I argue that we should not do so, because the occurrence of qu'est-ce que in 4. does not have the same distribution as occurrences like the one in 3. I advocate a "lexical" (or idiomatic) syntactic ambiguity. Observe that some occurrences of qu'est-ce que (or apocopated versions) can commute with pourquoi: compare 4. and 5., and 6. and 7. Sentences 6. and 7. also have the same meaning; they differ only in their pragmatics.

  4. Qu'est-ce que Angiox ? Que + ECQ + NP
  5. Pourquoi Angiox ?
  6. Qu’est-ce t’as été te mêler de ça ?! Que + ECQ + S' [see Dekhissi 2016]
  7. Pourquoi t’as été te mêler de ça ?!

My analysis is the following: in 4. and 6., the first que does not have any thematic role. In particular, it is neither a complement nor a modifier of the NP or of the verb in S' (e.g. we cannot replace it by another interrogative modifier: *Pourquoi est-ce que Angiox ?). Therefore, the first que does not syntactically depend on an element in the NP or S'. Moreover, there is much evidence in favor of analyzing qu'est-ce que as lexicalized, contrary to other WH + est-ce que (see the first message of this issue). Therefore, I advocate including the first que in the idiom (i.e. idiom qu'est-ce que) and letting it depend on its head est by default, with an unk relation. This analysis follows the treatment of other non-contiguous (from a historical perspective) fixed MWEs, like n'importe quel (unk(importe, quel)). Note that this would create a case of an idiom whose head is not its first word.

If we do so, one question remains: what is the POS of this idiom? In exclamatives, qu'est-ce que commutes with que and ce que: compare 8., 9. and 10. We could think of ExtPos=ADV, based on the GGF analysis. But this would mean that this expression depends on NP/S'. Following the SUD approach, maybe putting qu'est-ce que as head with ExtPos=SCONJ could make sense.

  8. Qu’est-ce que ça peut coûter cher ! QUE + ECQ + S' [see GGF IX-10.4.3]
  9. Que ça peut coûter cher !
  10. Ce que ça peut coûter cher !

To sum it up, I suggest these guidelines:

e. When the WH word in qu'est-ce que + NP/S' is not the complement or modifier of an element in NP or S', then qu'est-ce que is an idiom
    e.i. the first que is attached (by default) to est with relation unk

The upos of this qu'est-ce que has to be discussed.

References:

Laurie Dekhissi. 2016. Qu’est-ce t’as été te mêler de ça ?! Une « nouvelle » structure pour les questions rhétoriques conflictuelles. Journal of French Language Studies, 26(3):279–298.

sylvainkahane commented 8 months ago

@Valentin-D-Richard qu'est-ce que S is in a paradigm with qu'est-ce qui, qui est-ce que and qui est-ce qui. Do you want to treat only qu'est-ce que as an idiom? How do you justify treating it differently, or not?

sylvainkahane commented 8 months ago

About the analysis of WH est-ce que when we propose a compositional analysis, take the example: Quand est-ce que tu pars ?

SUD analysis: quand <-[comp:pred]- est(-ce) -[comp:cleft]-> que (tu pars), with est-ce que annotated as an idiom with ExtPos=SCONJ.

We need to add an enhanced dependency in cleft sentences: quand <-[E:mod]- pars.

With the enhanced dep, we can convert our SUD analysis in:

UD analysis: quand <-[mod]- pars, [est-ce que] <-[mark]- pars.

@bguil We need to replace the comp:pred relation by the E:mod relation in the SUD=>UD conversion.

I propose to add an enhanced dependency in all cleft sentences: interrogative WH est-ce que as well as other cleft sentences.
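To make the proposed conversion concrete, here is a minimal Python sketch of the rewrite for Quand est-ce que tu pars ?. It is a toy illustration, not the actual SUD2UD converter: the function name `sud_to_ud` and the data layout are hypothetical, the comp:obj label between que and pars is assumed, and only the edges discussed above are rewritten (the internal structure of the idiom and the subj relation are left untouched).

```python
# Toy SUD tree: idx -> [form, head, deprel]; head 0 marks the root.
sud = {
    1: ["Quand", 2, "comp:pred"],
    2: ["est", 0, "root"],          # head of the idiom est-ce que (ExtPos=SCONJ)
    3: ["-ce", 2, "subj"],
    4: ["que", 2, "comp:cleft"],
    5: ["tu", 6, "subj"],
    6: ["pars", 4, "comp:obj"],     # label assumed for this sketch
}
# Additional (enhanced) dependencies: (dependent, governor, label)
extra = [(1, 6, "E:mod")]

def sud_to_ud(sud, extra, idiom_head):
    """Rewrite the cleft-like structure using the additional dependencies."""
    ud = {i: list(tok) for i, tok in sud.items()}
    # The comp:cleft dependent of the idiom head introduces the clause.
    marker = next(i for i, (_, h, rel) in sud.items()
                  if h == idiom_head and rel == "comp:cleft")
    clause = next(i for i, (_, h, _) in sud.items() if h == marker)
    # The clause head takes over the attachment of the idiom head ...
    ud[clause][1], ud[clause][2] = sud[idiom_head][1], sud[idiom_head][2]
    # ... and the idiom becomes a marker of the clause head.
    ud[idiom_head][1], ud[idiom_head][2] = clause, "mark"
    # Each additional dependency replaces the tree attachment of its dependent.
    for dep, gov, label in extra:
        ud[dep][1] = gov
        ud[dep][2] = label[2:] if label.startswith("E:") else label
    return ud

ud = sud_to_ud(sud, extra, idiom_head=2)
```

With these inputs, Quand ends up attached to pars with mod, est (standing for the idiom est-ce que) is attached to pars with mark, and pars becomes the root, which matches the UD analysis given above.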

sylvainkahane commented 8 months ago

Analysis of qu'est-ce que

1) qu'est-ce que tu veux ? : SUD analysis: same analysis as for compositional WH est-ce que, but the idiom includes the interrogative pronoun que, that is: qu' <-[comp:pred]- est(-ce) -[comp:cleft]-> que (tu veux), with qu'est-ce que annotated as an idiom with ExtPos=PRON and qu' <-[E:comp:obj]- veux.

UD analysis: [qu'est-ce que] <-[obj]- veux

2) qu'est-ce que Angiox ? :
UD analysis: Do we want the same analysis as pourquoi Angiox ?, pourquoi lui ?, or even (on se retrouve à Paris) où à Paris ? If yes, qu'est-ce que is still PRON. But the relation remains unclear to me, even the direction of the dependency. Alternative analysis: qu'est-ce que is PART and depends on Angiox by a relation mark.

SUD analysis: qu' <-[comp:pred]- est(-ce) -[comp:cleft]-> que (Angiox), with qu'est-ce que annotated as an idiom with ExtPos=PRON or PART and qu' <-[E:???]- Angiox.

Valentin-D-Richard commented 8 months ago

@sylvainkahane I find your suggestions really good. Using an enhanced dependency to mediate the SUD -> UD conversion for clefts solves most problems. Even if I believe that WH est-ce que are not traditional clefts, it suits me to annotate them like clefts because of their surface resemblance.

I agree with your proposals regarding the UD and SUD annotations of Quand est-ce que tu pars ? and Qu'est-ce que tu veux ?. I agree with your SUD annotation for Qu'est-ce que Angiox ?. I believe that in Pourquoi Angiox ? and Où à Paris ?, pourquoi and où should depend on Angiox and à Paris respectively, with the SUD mod relation. Similarly, in the SUD for Qu'est-ce que Angiox ?, I am in favor of keeping ExtPos=PRON for qu'est-ce que and having qu' <-[E:mod]- Angiox.

I think that both qu'est-ce que and qu'est-ce qui should be treated as idioms, and that both the second que and qui should have pos SCONJ. Regarding qui est-ce que/i, the GGF claims that it is also an idiom. To be honest, I don't have strong arguments in favor of distinguishing qu'est-ce que/i from qui est-ce que/i, except the higher frequency of the first. I let you choose whether to also treat qui est-ce que/i as an idiom or not.

Thank you

bguil commented 8 months ago

There is currently no use of "enhanced" dependencies in SUD. We can imagine extending the SUD annotation to "enhanced SUD", but I don't think it makes sense to introduce these enhanced dependencies just for the examples above.

In the meantime, we can add lexicalized French specific rules to deal with these exceptions.

sylvainkahane commented 8 months ago

To have a complete analysis of cleft sentences, we really need to add a dependency to the tree structure. Example: c'est à lui que je parle. The comp:obl relation between parle and à cannot be in the tree. I consider that this relation is part of the syntactic structure (and that the syntactic structure cannot be satisfactorily encoded by a tree). This dependency has a particular status and must be differentiated from pure syntactic dependencies. It seems that the "enhanced" level is the best way to encode that, no?

These additional dependencies cannot be deduced automatically:

@bguil Do you think of other enhanced dependency that would be useful?

About subject and object clefts, they are currently annotated with qui/que as a relative pronoun, while other clefts are annotated with que as a SCONJ:

This analysis is very questionable (see Kayne 1975, Kahane 2002), but it was a way to distinguish subject and object clefts. If we add the additional (enhanced?) dependency, we can easily convert them and have SCONJ in all clefts (and therefore a homogeneous analysis of clefts).

bguil commented 8 months ago

I have been convinced for at least ten years that a tree structure may not be sufficient to encode syntax, and that it may be useful to consider a graph structure!

However, it is not clear where to draw the line: in Enhanced Universal Dependencies, many types of "additional edges" are considered (shared arguments of coordination, subjects of controlled verbs, links of relative pronouns with their antecedents…). Do we want to take into account (some of) these relations?

For the sake of clarity, I also think that it is not a good choice, in SUD, to call these additional relations enhanced if they do not correspond to what is called enhanced in the UD framework. We can call them extended, or deep (but this last term is already overused …).

sylvainkahane commented 8 months ago

I am absolutely OK with giving a more precise status (than enhanced) to every additional "dependency" we plan to consider. Nevertheless, I suppose that the simplest solution to encode such dependencies is to use the same mechanism as for enhanced dependencies, no?
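One way to picture "the same mechanism as enhanced dependencies" is the DEPS column of CoNLL-U, which stores head:relation pairs separated by "|". The sketch below is only an illustration for the token à in c'est à lui que je parle: the helper names are hypothetical, storing only the additional edge in DEPS (rather than a full graph) is a simplification chosen here, and the tree attachment of à to est with comp:pred is an assumption made by analogy with the Quand est-ce que analysis above.

```python
def format_deps(edges):
    """Serialize [(head, label), ...] as a DEPS value: 'head:label|head:label'."""
    if not edges:
        return "_"
    return "|".join(f"{head}:{label}" for head, label in sorted(edges))

def parse_deps(value):
    """Parse a DEPS value back into a list of (head, label) pairs."""
    if value == "_":
        return []
    pairs = []
    for item in value.split("|"):
        head, label = item.split(":", 1)   # labels such as comp:obl contain ':'
        pairs.append((int(head), label))
    return pairs

# Tokens: 1 c'  2 est  3 à  4 lui  5 que  6 je  7 parle
# Tree edge (assumed):  à <-[comp:pred]- est   (columns HEAD and DEPREL)
# Additional edge:      à <-[comp:obl]- parle  (column DEPS)
extra_edges = [(7, "comp:obl")]
columns = ["3", "à", "_", "ADP", "_", "_", "2", "comp:pred",
           format_deps(extra_edges), "_"]
line = "\t".join(columns)
```

Keeping the additional edge in a separate column leaves the HEAD/DEPREL columns a proper tree, so tools that only need the tree can ignore it.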

About the status of such dependencies, I consider them to be part of the surface-syntactic structure, especially because we need them to impose the régime. But they are hidden from the linearization module, which only needs the tree structure to compute the text or spoken chain. Conversely, they are visible to the syntax-semantics interface, which gives them a sort of deep syntactic status. I don't have a clear name to propose. In my publications of the 2000s, I called them pseudo-dependencies, but that was related to their status in the formalization (they had a different polarity from true dependencies). (Post)generative models speak of filler-gap dependencies. In our case we don't consider the gap and directly link the governor of the supposed gap to the filler. We can still keep the term filler and call them filler dependencies.