Open blegaut opened 2 months ago
So, I'm not surprised there are FR changes over time. We created a "combined" FR model to be the default out of four mostly compatible treebanks.
There is exactly one line with Assurez-vous in it (zero with assurez-vous), and the dependency is actually neither obj nor nsubj:
# text = Assurez-vous de boire suffisamment (au moins un à deux verres) avant et après le traitement par Aclasta, selon les instructions de votre médecin : ceci afin d'éviter une déshydratation.
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ SpaceAfter=No
2 -vous vous PRON _ Number=Plur|Person=2|PronType=Prs|Reflex=Yes 1 expl:pv _ _
Does this dependency look reasonable to you? At any rate, I can rebuild the FR models with the latest versions of the datasets, and perhaps it will improve performance somewhat.
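A lookup like the one described above can be scripted. The following is a minimal sketch, assuming the treebank is available as CoNLL-U text; the helper names `parse_conllu` and `find_clitic_deprels` are hypothetical and are not part of Stanza. The sample reuses the two token rows quoted above.

```python
def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of 10-column rows."""
    sentences, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if rows:
                sentences.append(rows)
                rows = []
            continue
        if line.startswith("#"):
            continue  # skip comment lines such as "# text = ..."
        # Real CoNLL-U is tab-separated; fall back to whitespace for
        # copies where the tabs were flattened to spaces (an assumption).
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) == 10:
            rows.append(cols)
    if rows:
        sentences.append(rows)
    return sentences


def find_clitic_deprels(text, verb_form="assurez", clitic="-vous"):
    """Return the DEPREL of every `clitic` token directly after `verb_form`."""
    found = []
    for rows in parse_conllu(text):
        for prev, cur in zip(rows, rows[1:]):
            if prev[1].lower() == verb_form and cur[1].lower() == clitic:
                found.append(cur[7])  # column 8 (index 7) is DEPREL
    return found


sample = """\
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ SpaceAfter=No
2 -vous vous PRON _ Number=Plur|Person=2|PronType=Prs|Reflex=Yes 1 expl:pv _ _
"""
print(find_clitic_deprels(sample))
```

The form comparison is case-insensitive, so the same call covers both Assurez-vous and assurez-vous.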
Thanks for your quick reply.
Yes, expl:pv is definitely the best option here. I hope that it works when you rebuild the FR models. Please let me know how and when I can test it.
Thanks,
Bernard
Mmm, unfortunately, the models continue to call it nsubj after rebuilding with the latest versions of the git data. That's also true for the version using a transformer. One option here is to put together a couple of sentences that cover the dependency and add them to the training data. I don't know any French, so I don't think I should be the one to do it, but if you have suggested dependencies for a couple of sentences, that would likely be enough.
(We could also start with parses for a couple sentences with that pair of words and correct the errors that show up.)
Hello, I am happy to contribute by providing a couple of corrected sentences. What would be the expected format and the proper repository?
I also noticed some other regressions after the rebuild with the latest versions of the git data. Is there any way to access the previous versions?
Thanks
Is there any way to access the previous versions ?
Well... yes, that's technically possible. They should be in the HuggingFace history for the FR models. The idea behind making the newer models, though, is that other things will work better with the updated data.
https://huggingface.co/stanfordnlp/stanza-fr
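One way to reach an older file in that history is a revision-pinned download URL. The sketch below only builds such a URL using the Hugging Face `resolve/<revision>/<path>` pattern; the function name `model_url` is hypothetical, and the revision and file path in the usage line are placeholders, not confirmed names from the stanza-fr repository. Check the repository's commit history for the actual revision and file you need.

```python
def model_url(lang, filename, revision="main"):
    """Build a Hugging Face 'resolve' URL pinned to a given revision.

    `revision` may be a branch name, a tag, or a commit hash taken from
    the repository history.
    """
    repo = f"stanfordnlp/stanza-{lang}"
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"


# Placeholder revision and path, for illustration only (assumptions).
print(model_url("fr", "models/default.zip", revision="SOME_COMMIT_OR_TAG"))
```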
If you can come up with some example regression sentences, perhaps the best format would just be plain text sentences (cut down so they demonstrate the error but aren't 50 words long). I'll run them through our best models, and you can let me know where you spot the errors.
Here are some example regression sentences:
Nous vous recommandons vivement d'investir dans un système aux normes.
the root should be the verb recommandons rather than the subject Nous
Élaborez un plan de gestion de crise.
the root should be the verb Élaborez rather than plan
Il semble que vous ne soyez pas informé.
almost all of the dependency relations are wrong
Mettez en place des politiques de recouvrement plus strictes!
the root should be the verb Mettez rather than place
Nos experts peuvent vous conseiller.
experts should be the subject
Thanks,
Bernard
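The root expectations listed above can be written down as a small regression check. This is a sketch under stated assumptions: the names `EXPECTED_ROOTS`, `root_form`, and `check_root` are hypothetical, the input is one sentence of CoNLL-U text, and the sample parse is the "accurate" model output for the second sentence as quoted later in this thread.

```python
# Expected root word for three of the regression sentences listed above.
EXPECTED_ROOTS = {
    "Nous vous recommandons vivement d'investir dans un système aux normes.": "recommandons",
    "Élaborez un plan de gestion de crise.": "Élaborez",
    "Mettez en place des politiques de recouvrement plus strictes!": "Mettez",
}


def root_form(conllu_sentence):
    """Return the FORM of the token whose HEAD is 0, or None if absent."""
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Tab-separated in real CoNLL-U; whitespace fallback is an assumption
        # for copies where tabs were flattened to spaces.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) == 10 and cols[6] == "0":
            return cols[1]
    return None


def check_root(text, conllu_sentence):
    """True if the parse has the expected root for `text`."""
    return root_form(conllu_sentence) == EXPECTED_ROOTS[text]


parse = """\
# text = Élaborez un plan de gestion de crise.
1 Élaborez élaborer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=8|ner=O
2 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 3 det _ start_char=9|end_char=11|ner=O
3 plan plan NOUN _ Gender=Masc|Number=Sing 1 obj _ start_char=12|end_char=16|ner=O
4 de de ADP _ _ 5 case _ start_char=17|end_char=19|ner=O
5 gestion gestion NOUN _ Gender=Fem|Number=Sing 3 nmod _ start_char=20|end_char=27|ner=O
6 de de ADP _ _ 7 case _ start_char=28|end_char=30|ner=O
7 crise crise NOUN _ Gender=Fem|Number=Sing 5 nmod _ start_char=31|end_char=36|ner=O|SpaceAfter=No
8 . . PUNCT _ _ 1 punct _ start_char=36|end_char=37|ner=O|SpaceAfter=No
"""
print(check_root("Élaborez un plan de gestion de crise.", parse))
```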
If I put some of these into the "accurate" models with a Transformer, they already produce some of the analyses you recommend. I can post some here:
# recommandons is the verb
# text = Nous vous recommandons vivement d'investir dans un système aux normes.
# sent_id = 0
1 Nous nous PRON _ Emph=No|Number=Plur|Person=1|PronType=Prs 3 nsubj _ start_char=0|end_char=4|ner=O
2 vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 3 iobj _ start_char=5|end_char=9|ner=O
3 recommandons recommander VERB _ Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin 0 root _ start_char=10|end_char=22|ner=O
4 vivement vivement ADV _ _ 3 advmod _ start_char=23|end_char=31|ner=O
5 d' de ADP _ _ 6 mark _ start_char=32|end_char=34|ner=O|SpaceAfter=No
6 investir investir VERB _ VerbForm=Inf 3 xcomp _ start_char=34|end_char=42|ner=O
7 dans dans ADP _ _ 9 case _ start_char=43|end_char=47|ner=O
8 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 9 det _ start_char=48|end_char=50|ner=O
9 système système NOUN _ Gender=Masc|Number=Sing 6 obl:arg _ start_char=51|end_char=58|ner=O
10-11 aux _ _ _ _ _ _ _ start_char=59|end_char=62|ner=O
10 à à ADP _ _ 12 case _ _
11 les le DET _ Definite=Def|Number=Plur|PronType=Art 12 det _ _
12 normes norme NOUN _ Gender=Fem|Number=Plur 9 nmod _ start_char=63|end_char=69|ner=O|SpaceAfter=No
13 . . PUNCT _ _ 3 punct _ start_char=69|end_char=70|ner=O|SpaceAfter=No
# Élaborez is the verb
# text = Élaborez un plan de gestion de crise.
# sent_id = 0
1 Élaborez élaborer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=8|ner=O
2 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 3 det _ start_char=9|end_char=11|ner=O
3 plan plan NOUN _ Gender=Masc|Number=Sing 1 obj _ start_char=12|end_char=16|ner=O
4 de de ADP _ _ 5 case _ start_char=17|end_char=19|ner=O
5 gestion gestion NOUN _ Gender=Fem|Number=Sing 3 nmod _ start_char=20|end_char=27|ner=O
6 de de ADP _ _ 7 case _ start_char=28|end_char=30|ner=O
7 crise crise NOUN _ Gender=Fem|Number=Sing 5 nmod _ start_char=31|end_char=36|ner=O|SpaceAfter=No
8 . . PUNCT _ _ 1 punct _ start_char=36|end_char=37|ner=O|SpaceAfter=No
# would you check this?
# text = Il semble que vous ne soyez pas informé.
# sent_id = 0
1 Il lui PRON _ Emph=No|Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 expl:subj _ start_char=0|end_char=2|ner=O
2 semble sembler VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ start_char=3|end_char=9|ner=O
3 que que SCONJ _ _ 8 mark _ start_char=10|end_char=13|ner=O
4 vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 8 nsubj:pass _ start_char=14|end_char=18|ner=O
5 ne ne ADV _ Polarity=Neg 8 advmod _ start_char=19|end_char=21|ner=O
6 soyez être AUX _ Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 8 aux:pass _ start_char=22|end_char=27|ner=O
7 pas pas ADV _ Polarity=Neg 8 advmod _ start_char=28|end_char=31|ner=O
8 informé informer VERB _ Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass 2 csubj _ start_char=32|end_char=39|ner=O|SpaceAfter=No
9 . . PUNCT _ _ 2 punct _ start_char=39|end_char=40|ner=O|SpaceAfter=No
# Mettez is the verb
# text = Mettez en place des politiques de recouvrement plus strictes!
# sent_id = 0
1 Mettez mettre VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=6|ner=S-LOC
2 en en ADP _ _ 3 case _ start_char=7|end_char=9|ner=O
3 place place NOUN _ Gender=Fem|Number=Sing 1 obl:mod _ start_char=10|end_char=15|ner=O
4-5 des _ _ _ _ _ _ _ start_char=16|end_char=19|ner=O
4 de de ADP _ _ 6 case _ _
5 les le DET _ Definite=Def|Number=Plur|PronType=Art 6 det _ _
6 politiques politique NOUN _ Gender=Fem|Number=Plur 1 obl:arg _ start_char=20|end_char=30|ner=O
7 de de ADP _ _ 8 case _ start_char=31|end_char=33|ner=O
8 recouvrement recouvrement NOUN _ Gender=Masc|Number=Sing 6 nmod _ start_char=34|end_char=46|ner=O
9 plus plus ADV _ _ 10 advmod _ start_char=47|end_char=51|ner=O
10 strictes strict ADJ _ Gender=Fem|Number=Plur 6 amod _ start_char=52|end_char=60|ner=O|SpaceAfter=No
11 ! ! PUNCT _ _ 1 punct _ start_char=60|end_char=61|ner=O|SpaceAfter=No
# experts is the subject
# text = Nos experts peuvent vous conseiller.
# sent_id = 0
1 Nos son DET _ Number=Plur|Number[psor]=Plur|Person[psor]=1|Poss=Yes|PronType=Prs 2 det _ start_char=0|end_char=3|ner=S-LOC
2 experts expert NOUN _ Gender=Masc|Number=Plur 3 nsubj _ start_char=4|end_char=11|ner=O
3 peuvent pouvoir VERB _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 0 root _ start_char=12|end_char=19|ner=O
4 vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 5 obj _ start_char=20|end_char=24|ner=O
5 conseiller conseiller VERB _ VerbForm=Inf 3 xcomp _ start_char=25|end_char=35|ner=O|SpaceAfter=No
6 . . PUNCT _ _ 3 punct _ start_char=35|end_char=36|ner=O|SpaceAfter=No
Everything looks good! Thank you.
This is what it came up with for ...
# text = Assurez-vous d'être à l'heure !
# sent_id = 0
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=7|ner=O|SpaceAfter=No
2 -vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 1 nsubj _ start_char=7|end_char=12|ner=O
3 d' de ADP _ _ 4 mark _ start_char=13|end_char=15|ner=O|SpaceAfter=No
4 être être AUX _ VerbForm=Inf 1 ccomp _ start_char=15|end_char=19|ner=O
5 à à ADP _ _ 7 case _ start_char=20|end_char=21|ner=O
6 l' le DET _ Definite=Def|Number=Sing|PronType=Art 7 det _ start_char=22|end_char=24|ner=O|SpaceAfter=No
7 heure heure NOUN _ Gender=Fem|Number=Sing 4 obl:arg _ start_char=24|end_char=29|ner=O
8 ! ! PUNCT _ _ 1 punct _ start_char=30|end_char=31|ner=O|SpaceAfter=No
but you were saying the expl:pv dep is better?
Can you suggest one or two other sentences with Assurez-vous or assurez-vous in them?
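Yes, sure. Here are a few sentences:
# text = Puisque vous êtes équipé d'un logiciel de facturation, assurez-vous d'utiliser le système de relance afin de résorber les retards de paiement que vous déplorez.
# sent_id = 0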
1 Puisque puisque SCONJ _ _ 4 mark _ start_char=0|end_char=7|ner=O
2 vous vous PRON _ Number=Plur|Person=2|PronType=Prs 4 nsubj:pass _ start_char=8|end_char=12|ner=O
3 êtes être AUX _ Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 4 aux:pass _ start_char=13|end_char=17|ner=O
4 équipé équiper VERB _ Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass 11 advcl _ start_char=18|end_char=24|ner=O
5 d' de ADP _ _ 7 case _ start_char=25|end_char=27|ner=O|SpaceAfter=No
6 un un DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 7 det _ start_char=27|end_char=29|ner=O
7 logiciel logiciel NOUN _ Gender=Masc|Number=Sing 4 obl:arg _ start_char=30|end_char=38|ner=O
8 de de ADP _ _ 9 case _ start_char=39|end_char=41|ner=O
9 facturation facturation NOUN _ Gender=Fem|Number=Sing 7 nmod _ start_char=42|end_char=53|ner=O|SpaceAfter=No
10 , , PUNCT _ _ 4 punct _ start_char=53|end_char=54|ner=O
11 assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=55|end_char=62|ner=O|SpaceAfter=No
12 -vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 11 nsubj _ start_char=62|end_char=67|ner=O
13 d' de ADP _ _ 14 mark _ start_char=68|end_char=70|ner=O|SpaceAfter=No
14 utiliser utiliser VERB _ VerbForm=Inf 11 ccomp _ start_char=70|end_char=78|ner=O
15 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 16 det _ start_char=79|end_char=81|ner=O
16 système système NOUN _ Gender=Masc|Number=Sing 14 obj _ start_char=82|end_char=89|ner=O
17 de de ADP _ _ 18 case _ start_char=90|end_char=92|ner=O
18 relance relance NOUN _ Gender=Fem|Number=Sing 16 nmod _ start_char=93|end_char=100|ner=O
19 afin afin ADV _ _ 14 advmod _ start_char=101|end_char=105|ner=O
20 de de ADP _ _ 21 mark _ start_char=106|end_char=108|ner=O
21 résorber résorber VERB _ VerbForm=Inf 19 ccomp _ start_char=109|end_char=117|ner=O
22 les le DET _ Definite=Def|Number=Plur|PronType=Art 23 det _ start_char=118|end_char=121|ner=O
23 retards retard NOUN _ Gender=Masc|Number=Plur 21 obj _ start_char=122|end_char=129|ner=O
24 de de ADP _ _ 25 case _ start_char=130|end_char=132|ner=O
25 paiement paiement NOUN _ Gender=Masc|Number=Sing 23 nmod _ start_char=133|end_char=141|ner=O
26 que que PRON _ PronType=Rel 28 obj _ start_char=142|end_char=145|ner=O
27 vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 28 nsubj _ start_char=146|end_char=150|ner=O
28 déplorez déplorer VERB _ Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 23 acl:relcl _ start_char=151|end_char=159|ner=O|SpaceAfter=No
29 . . PUNCT _ _ 11 punct _ start_char=159|end_char=160|ner=O|SpaceAfter=No
# text = Assurez-vous de bien suivre la réglementation qui encadre votre secteur d'activité
# sent_id = 0
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=7|ner=O|SpaceAfter=No
2 -vous vous PRON _ Number=Plur|Person=2|PronType=Prs 1 nsubj _ start_char=7|end_char=12|ner=O
3 de de ADP _ _ 5 mark _ start_char=13|end_char=15|ner=O
4 bien bien ADV _ _ 5 advmod _ start_char=16|end_char=20|ner=O
5 suivre suivre VERB _ VerbForm=Inf 1 xcomp _ start_char=21|end_char=27|ner=O
6 la le DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 7 det _ start_char=28|end_char=30|ner=O
7 réglementation réglementation NOUN _ Gender=Fem|Number=Sing 5 obj _ start_char=31|end_char=45|ner=O
8 qui qui PRON _ PronType=Rel 9 nsubj _ start_char=46|end_char=49|ner=O
9 encadre encadrer VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 7 acl:relcl _ start_char=50|end_char=57|ner=O
10 votre son DET _ Number=Sing|Poss=Yes 11 det _ start_char=58|end_char=63|ner=O
11 secteur secteur NOUN _ Gender=Masc|Number=Sing 9 obj _ start_char=64|end_char=71|ner=O
12 d' de ADP _ _ 13 case _ start_char=72|end_char=74|ner=O|SpaceAfter=No
13 activité activité NOUN _ Gender=Fem|Number=Sing 11 nmod _ start_char=74|end_char=82|ner=O|SpaceAfter=No
# text = Assurez-vous de couvrir les risques potentiels, y compris les incendies, les catastrophes naturelles et le vol.
# sent_id = 0
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=7|ner=O|SpaceAfter=No
2 -vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 1 nsubj _ start_char=7|end_char=12|ner=O
3 de de ADP _ _ 4 mark _ start_char=13|end_char=15|ner=O
4 couvrir couvrir VERB _ VerbForm=Inf 1 ccomp _ start_char=16|end_char=23|ner=O
5 les le DET _ Definite=Def|Number=Plur|PronType=Art 6 det _ start_char=24|end_char=27|ner=O
6 risques risque NOUN _ Gender=Masc|Number=Plur 4 obj _ start_char=28|end_char=35|ner=O
7 potentiels potentiel ADJ _ Gender=Masc|Number=Plur 6 amod _ start_char=36|end_char=46|ner=O|SpaceAfter=No
8 , , PUNCT _ _ 12 punct _ start_char=46|end_char=47|ner=O
9 y y PRON _ Emph=No|ExtPos=ADP|Person=3|PronType=Prs 12 case _ start_char=48|end_char=49|ner=O
10 compris comprendre VERB _ Gender=Masc|Tense=Past|VerbForm=Part|Voice=Pass 9 fixed _ start_char=50|end_char=57|ner=O
11 les le DET _ Definite=Def|Number=Plur|PronType=Art 12 det _ start_char=58|end_char=61|ner=O
12 incendies incendie NOUN _ Gender=Masc|Number=Plur 6 nmod _ start_char=62|end_char=71|ner=O|SpaceAfter=No
13 , , PUNCT _ _ 15 punct _ start_char=71|end_char=72|ner=O
14 les le DET _ Definite=Def|Number=Plur|PronType=Art 15 det _ start_char=73|end_char=76|ner=O
15 catastrophes catastrophe NOUN _ Gender=Fem|Number=Plur 12 conj _ start_char=77|end_char=89|ner=O
16 naturelles naturel ADJ _ Gender=Fem|Number=Plur 15 amod _ start_char=90|end_char=100|ner=O
17 et et CCONJ _ _ 19 cc _ start_char=101|end_char=103|ner=O
18 le le DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 19 det _ start_char=104|end_char=106|ner=O
19 vol vol NOUN _ Gender=Masc|Number=Sing 12 conj _ start_char=107|end_char=110|ner=O|SpaceAfter=No
20 . . PUNCT _ _ 1 punct _ start_char=110|end_char=111|ner=O|SpaceAfter=No
Each of the -vous is an nsubj instead of expl:pv. Also, any thoughts on the previous one aside from the nsubj -> expl:pv change?
I would say that the change from nsubj to expl:pv is required for all occurrences of -vous. I can't see any other changes needed in these sentences. Thanks
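The correction described here (nsubj to expl:pv on -vous) could be applied mechanically to parser output before hand-checking it as training data. The following is only a sketch: `relabel_reflexive_clitic` is a hypothetical helper, not a Stanza function, and it assumes the governing verb appears before the clitic, as it does in the imperative examples above. It leaves features such as Reflex=Yes untouched.

```python
def relabel_reflexive_clitic(conllu, verb_lemma="assurer", clitic="-vous",
                             old="nsubj", new="expl:pv"):
    """Change DEPREL `old` to `new` on `clitic` tokens headed by `verb_lemma`."""
    out = []
    lemma_by_id = {}  # token ID -> lemma, for the current sentence only
    for line in conllu.splitlines():
        if not line.strip():
            lemma_by_id = {}  # a blank line starts a new sentence
        # Keep whichever separator the line already uses (tabs in real
        # CoNLL-U; single spaces in flattened copies, an assumption).
        sep = "\t" if "\t" in line else " "
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) == 10 and not line.startswith("#"):
            lemma_by_id[cols[0]] = cols[2]
            if (cols[1].lower() == clitic and cols[7] == old
                    and lemma_by_id.get(cols[6]) == verb_lemma):
                cols[7] = new
                line = sep.join(cols)
        out.append(line)
    return "\n".join(out)


model_output = """\
1 Assurez assurer VERB _ Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin 0 root _ start_char=0|end_char=7|ner=O|SpaceAfter=No
2 -vous vous PRON _ Emph=No|Number=Plur|Person=2|PronType=Prs 1 nsubj _ start_char=7|end_char=12|ner=O"""
print(relabel_reflexive_clitic(model_output))
```

Only tokens whose form is exactly -vous and whose head has the lemma assurer are changed, so a plain subject vous elsewhere in the sentence keeps its nsubj label.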
Alright, I put a candidate fake training file here:
https://github.com/stanfordnlp/handparsed-treebank/commit/0fac6a83754baf52f93eff66a5447340d06f1d3d
Any thoughts on these?
Also sent them to a former colleague who's worked on French datasets before.
If you find any other regressions, please don't hesitate to send them our way. I can rerun the depparse training with these sentences and see if it helps.
Well... just training on those sentences isn't helping either model get the expl:pv relation in Assurez-vous. Maybe a couple more sentences would help, maybe not (there is a cutoff of 7 where it starts finetuning words, so it may indeed help to add a couple more). At any rate, I suggest using the default_accurate package, since you seemed pretty satisfied with the other parses above.
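For reference, selecting that package from Python might look like the sketch below. It assumes a Stanza version that ships the default_accurate package and that the French models can be downloaded or are already present; `parse_deprels` and `deprel_of` are hypothetical helper names. Stanza is imported inside the function so the small lookup helper can be used on its own.

```python
def parse_deprels(text, package="default_accurate"):
    """Parse French `text` and return (form, head_form, deprel) triples."""
    import stanza  # lazy import: only needed when actually parsing

    nlp = stanza.Pipeline(lang="fr", package=package)
    doc = nlp(text)
    triples = []
    for sent in doc.sentences:
        for word in sent.words:
            # word.head is a 1-based index into sent.words; 0 marks the root.
            head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
            triples.append((word.text, head, word.deprel))
    return triples


def deprel_of(triples, form):
    """Return the dependency relations assigned to every token equal to `form`."""
    return [rel for text, _, rel in triples if text == form]


# Usage (requires Stanza and the French models):
#   triples = parse_deprels("Assurez-vous d'être à l'heure !")
#   print(deprel_of(triples, "-vous"))
```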
Alright, I realized I had mistrained the models with the new dependencies. The new models seem to get expl:pv for a couple of the examples I tried for assurez-vous. I posted those as the new defaults. I'll send those sentences to a former colleague to see if she has any suggestions on the dependencies, just to make sure.
Describe the bug
Take the following sentence: Assurez-vous d'être à l'heure !
The word vous has a wrong dependency relation with Stanza 1.8.2, but a correct one with Stanza 1.8.1.
Stanza 1.8.1 :
Stanza 1.8.2 :
To Reproduce
Steps to reproduce the behavior: see above
Expected behavior
I would expect the same analysis independent of the version.