scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Fix of Portuguese Verb process #68

Closed andrewtavis closed 3 months ago

andrewtavis commented 7 months ago

Terms

Description

Hey there @wkyoshida 👋 I'm realizing that the Portuguese verbs process is a bit broken: we're not getting any forms for an entire tense. If you check out verbs_queried.json in the Portuguese verbs directory, we're not getting anything for what's supposed to be a perfect tense 🤔 This seems to be a mix of me not understanding Portuguese and some odd modeling on Wikidata. Could you give ser and maybe a few other verbs a look and let me know which forms we should include? Generally we're trying to do present and any preterite/imperfect past forms, and for Portuguese it looks like future simple had good coverage, so that was included as well. For right now let's stick to four tenses total, so only the one that's returning nothing should be edited, such that we get one more useful verb tense out of the process :)

Contribution

Let's discuss this in the sync tomorrow! Once this is done, we'll also need an issue for iOS to change the verb form used in the app 😊

shashank-iitbhu commented 6 months ago

Hey @andrewtavis! We're getting perfect tense conjugations, but only for 36 verbs. I checked this by searching for perfFPS in the verbs_queried.json file. Here's an entry where we get all six perfect tense conjugations:

{
  "fSimpTPP": "cantaram",
  "fSimpSPP": "cantáreis",
  "fSimpFPP": "cantáramos",
  "fSimpTPS": "cantara",
  "fSimpSPS": "cantaras",
  "fSimpFPS": "cantara",
  "impTPP": "cantavam",
  "impSPP": "cantáveis",
  "impFPP": "cantávamos",
  "impTPS": "cantava",
  "impSPS": "cantavas",
  "impFPS": "cantava",
  "presTPP": "cantam",
  "presSPP": "cantais",
  "presFPP": "cantamos",
  "presTPS": "canta",
  "presSPS": "cantas",
  "presFPS": "canto",
  "infinitive": "cantarmos",
  "perfTPP": "cantaram",
  "perfSPP": "cantastes",
  "perfFPP": "cantamos",
  "perfTPS": "cantou",
  "perfSPS": "cantaste",
  "perfFPS": "cantei"
},
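
The same check can be done programmatically instead of by text search. This is only a rough sketch: it assumes verbs_queried.json is a JSON list of per-verb objects shaped like the excerpt above, and `count_verbs_with_form` is a hypothetical helper, not something that exists in Scribe-Data.

```python
import json


def count_verbs_with_form(verbs, form_key):
    """Count verb entries that have a non-empty value for form_key."""
    return sum(1 for verb in verbs if verb.get(form_key))


# Assumption: the file is a JSON list of objects keyed by form name.
# A small inline sample stands in for the real file here.
sample = json.loads(
    '[{"infinitive": "cantar", "perfFPS": "cantei"}, {"infinitive": "ir"}]'
)

print(count_verbs_with_form(sample, "perfFPS"))  # 1
```

To run it on the real data, load the file with `json.load(open(path, encoding="utf-8"))` and pass the result in place of `sample`.
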
shashank-iitbhu commented 6 months ago

I also noticed this towards the end of the verbs_queried.json file: for some verbs we're not getting any conjugations at all, only the infinitive.

{
  "infinitive": "ir"
},
{
  "infinitive": "ter"
},
{
  "infinitive": "segurar"
},
{
  "infinitive": "vir"
},
{
  "infinitive": "abolir"
},
{
  "infinitive": "portar"
},
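
A quick way to list every verb in this state, again only as a sketch: it assumes the file is a JSON list of per-verb objects, and `verbs_missing_all_forms` is a hypothetical name used for illustration.

```python
def verbs_missing_all_forms(verbs):
    """Return infinitives of entries whose only key is "infinitive"."""
    return [verb["infinitive"] for verb in verbs if set(verb) == {"infinitive"}]


# Assumption: entries look like the excerpts in this thread.
sample = [
    {"infinitive": "cantar", "presFPS": "canto"},
    {"infinitive": "ir"},
    {"infinitive": "ter"},
]

print(verbs_missing_all_forms(sample))  # ['ir', 'ter']
```
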
andrewtavis commented 6 months ago

I guess we're potentially good on this for now, but it would be good to check on Wikidata as well to see if maybe perfect tenses for Portuguese are normally being labeled in a different way than we're querying them 🤔

andrewtavis commented 6 months ago

Thanks for looking into this, @shashank-iitbhu!

andrewtavis commented 5 months ago

CC @wkyoshida

andrewtavis commented 3 months ago

54a5e36 seems to close this (for now). I say "for now" because the past perfect PID that's being applied in Portuguese verbs isn't the standard one being applied in other languages, so it's very likely that we'll need to change it at some point. I'll make an issue in Scribe-Server where we can discuss checking the total coverage of prior files against the new ones so that we can raise an alert if there's a major drop. It'd be a quick fix if the PID does change, but it's best to get an explicit warning 😊
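
To make the coverage idea concrete, here is a minimal sketch of comparing a prior file with a new one. Everything in it is an assumption for discussion: the data is taken to be a JSON list of per-verb objects, `form_coverage` and `coverage_drops` are hypothetical helpers, and the 50% threshold is arbitrary.

```python
from collections import Counter


def form_coverage(verbs):
    """Map each form key to the number of verbs with a non-empty value for it."""
    counts = Counter()
    for verb in verbs:
        for key, value in verb.items():
            if key != "infinitive" and value:
                counts[key] += 1
    return dict(counts)


def coverage_drops(old_verbs, new_verbs, threshold=0.5):
    """Return {form: (before, after)} for forms whose count fell by more than threshold."""
    old_counts = form_coverage(old_verbs)
    new_counts = form_coverage(new_verbs)
    drops = {}
    for key, before in old_counts.items():
        after = new_counts.get(key, 0)
        if (before - after) / before > threshold:
            drops[key] = (before, after)
    return drops


# Hypothetical before/after data: perfFPS disappears, presFPS is unchanged.
old = [
    {"infinitive": "cantar", "presFPS": "canto", "perfFPS": "cantei"},
    {"infinitive": "ir", "presFPS": "vou", "perfFPS": "fui"},
]
new = [
    {"infinitive": "cantar", "presFPS": "canto"},
    {"infinitive": "ir", "presFPS": "vou"},
]

for form, (before, after) in coverage_drops(old, new).items():
    print(f"WARNING: {form} coverage dropped from {before} to {after}")
```

A per-form comparison like this would flag a changed PID directly, since every key for the affected tense would fall to zero at once.
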