scribe-org / Scribe-iOS

iOS app with keyboards for language learners
https://apps.apple.com/app/scribe-language-keyboards/id1596613886
GNU General Public License v3.0

Discuss language exercises based on Scribe usage #85

andrewtavis closed this issue 2 years ago

andrewtavis commented 2 years ago

A long-term goal for Scribe is the inclusion of language exercises to practice words that a user has used Scribe for help with. This would need to be done in a way that ensures the user's typing data is not collected, but it could be done via the commandBar: information that is displayed to a user in the command bar could be saved in the app and then used in various memory-style games or spaced-repetition flashcards. This feature should be set up so that the user can set a time limit, so that it doesn't become a distraction. The user should further be given the option to have words added automatically, with the default being that they are not.

Examples:

Related words could also be tested, with the potential for natural language processing over the collected words to check for topics that a user is interested in.
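The local, opt-in word store described above could be sketched roughly as follows. This is a hypothetical illustration assuming a simple Leitner-box spaced-repetition scheme; `PracticeWord` and its members are illustrative names, not part of Scribe's actual codebase.

```swift
// Hypothetical sketch: a word shown in the command bar is saved locally
// (never sent anywhere) and scheduled for review with a Leitner-box scheme.
struct PracticeWord {
    let word: String
    var box: Int = 1  // Leitner box 1...5; higher boxes are reviewed less often

    // Days until the next review; the interval doubles with each box.
    var reviewIntervalDays: Int { 1 << (box - 1) }

    // Move the word up a box on a correct answer, back to box 1 on a miss.
    mutating func record(correct: Bool) {
        box = correct ? min(box + 1, 5) : 1
    }
}

var word = PracticeWord(word: "Haus")
word.record(correct: true)   // box 2, next review in 2 days
word.record(correct: true)   // box 3, next review in 4 days
word.record(correct: false)  // missed, back to box 1
print(word.box, word.reviewIntervalDays)  // prints "1 1"
```

Because everything lives in a local struct, the privacy requirement above is met by construction: nothing about the user's typing ever leaves the device.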

This issue is to discuss the possibility of this feature being added to Scribe, which will in turn lead to issues being created for the Language Practice project.

andrewtavis commented 2 years ago

Something to consider is that these exercises could be used for potentially beneficial purposes, in a similar way to how Duolingo translations are used to translate actual documents. If Scribe adopted similar translation game structures, then these could be used to update Wikipedia or Wikidata's translations. Other methods could be devised where a user would check other kinds of open data. Users shown to be at a certain level would have their responses aggregated, and this could then serve as a means of quality assurance: editors would be able to see, based on user responses, where a potentially problematic data point lies.

andrewtavis commented 2 years ago

Spoken exercises are another thing that could be leveraged in open source to improve open spoken language processing. Datasets for this are currently not very robust, so Scribe could dramatically broaden them by allowing people to read things that interest them out loud (based on user feedback and NLP over their writing), having their pronunciations graded using automatic speech recognition, and then saving these in an open format. Anonymity would be key, with the sheer size of the data also being something to consider. At scale this could be very beneficial and allow diversification in speech recognition via such a large open corpus.

Specifically, the user could also be prompted to read things in their native language from time to time, with this then serving as a basis for correcting second-language speakers. With this in mind for speaking, a user could similarly be prompted with properties for their native language as a way to add or verify data (being shown a word and asked what its type/gender/conjugation is).

thadguidry commented 2 years ago

Common Voice integration? https://commonvoice.mozilla.org/en/datasets

andrewtavis commented 2 years ago

I was actually just about to write you about the above via email :)

Thanks for the link. Do you think there's any value to the above ideas as far as integrating some of this into a pipeline where properties could be created and/or validated by asking advanced learners as well as native speakers to do some checks as a "cost"? It'd obviously be a ton of work to get this set up, and I'm still very new to some of the Wikidata processes, so maybe this isn't a good direction, but it seems like it could help.

thadguidry commented 2 years ago

To me, some of that functionality might be better suited outside of Scribe as a keyboard enhancement? I could envision some human language curation working as a dedicated app, either mobile or desktop or both. Many in Wikidata already have plans for some of those kinds of apps down the road, after Abstract Wikipedia and Wikifunctions come more into focus later this year and next. Your above comments are not completely sinking in for me, sorry! A meetup might be better to help explain your thoughts to me or others.

andrewtavis commented 2 years ago

Was kind of stream-of-consciousness-ing for a moment, so sorry from me too, as some of it maybe isn't saying what I want it to. Would be happy to discuss it in more detail in a different forum. It definitely seems to be a bit out of scope at this point especially 😄

Generally the gist of what I'm getting at is: say that we know a user's native language as well as what they're learning. In that case they'd be shown exercises for what they're learning, and as they do those they'd also at times be presented with prompts for their native language where their responses would actually be checking or adding Wikidata information (verb added -> no conjugations yet -> check with someone to add them). Responses from strong learners could also function in a similar way, though with less certainty.
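The routing described in the flow above (verb added -> no conjugations yet -> ask a native speaker) could be sketched like this. All names here are hypothetical illustrations of the idea, not an actual Scribe or Wikidata API.

```swift
// Hypothetical sketch: learners get practice exercises, while native
// speakers are occasionally shown verification prompts for lexemes
// that are missing data on Wikidata.
enum Exercise {
    case practice(word: String)
    case verifyWikidata(lexeme: String, missingProperty: String)
}

func nextExercise(for word: String,
                  userIsNativeSpeaker: Bool,
                  lexemeHasConjugations: Bool) -> Exercise {
    if userIsNativeSpeaker && !lexemeHasConjugations {
        // A verb exists on Wikidata but has no conjugations yet:
        // ask a native speaker to supply or confirm them.
        return .verifyWikidata(lexeme: word, missingProperty: "conjugation")
    }
    // Everyone else just practices the word as usual.
    return .practice(word: word)
}
```

Responses from strong learners could feed the same `verifyWikidata` path, only weighted with lower confidence before anything is surfaced to editors.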

Will read about Abstract Wikipedia and Wikifunctions now :)

thadguidry commented 2 years ago

OK, now I understand more of what you mean. Basically using some human curation from a language learner to add knowledge back into Wikidata. Yeah, that's the sort of thing that Wikifunctions could help with a bit later on, with some apps that can use the ever-expanding Lexeme knowledge in Wikidata contributed by native speakers.

andrewtavis commented 2 years ago

Exactly :)

I'm not sure about the exact planned structures of Abstract Wikipedia, as I've only just started looking into it, but it could potentially also integrate with that as a health check of some of the functions. So an edit is made on Wikipedia, and that edit could then serve as a prompt where strong learners/native speakers in each target language would be asked to translate it to check what's pushed to other Wikipedia versions.

thadguidry commented 2 years ago

Abstract Wikipedia is the idea of "generating" wiki pages for languages that do not have one yet. For instance, generating a Wikipedia page in Igbo language for Mariah Carey using the English Wikipedia page for the artist. The generated pages of Abstract Wikipedia will be generated by Wikifunctions that are implemented, curated, and improved by human developers.

Where human curation comes into play will mainly be in improving the Wikifunctions. Input, let's say in English, is fed through a Wikifunction to derive a close approximation of, or exactly the same meaning in, Igbo. How close we get to the desired Igbo will depend on the quality of the Wikifunctions, Renderers, and Constructors, all of which could have a small human curation aspect, indeed. The idea of Abstract Wikipedia is machine-generated pages: generated mainly through Wikifunctions and Constructors, and then rendered. For instance, one plan we have is to have a Simple English Renderer, but we could have a Klingon one just as easily from the same Constructors. It's early days, but various devs are making some prototypes, with the Wikimedia Foundation and Denny Vrandečić leading the effort. You can keep up with development by reading the updates from this year, 2022, by scrolling down this: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates

andrewtavis commented 2 years ago

Ya, I guess given the above and what I've further read, the idea of it working as feedback to Wikidata doesn't make sense. Specifically, if you assume the amount of data that'd be present by the time this feature is released, there would be sufficient lexeme forms that new ones would likely be difficult for even native speakers to understand (or just kind of random, such that a person wouldn't see value in them). Also, the moderation that's currently present makes the health checks redundant, in that with more contributors there will be more eyes to make sure everything's alright.

With Wikipedia, I guess I didn't realize when reading about Abstract that it was fully function-based, such that there would be no modeling taking place. The above would make more sense as a validation step for machine-learning-based translations. Makes sense, and thanks for the further explanation :)

andrewtavis commented 2 years ago

So staying with the original idea, where the long-term goal of the language exercises would be:

Will mark this as blocked just so we know that this is as of now not a priority, but again feedback is more than welcome 😊

andrewtavis commented 2 years ago

@thadguidry, would it make sense in your opinion to put the above in the discussion and close this, and then from there make new issues when work on this is going to be done?

thadguidry commented 2 years ago

Can you convert this whole issue into a discussion? That might be better, and it just stays around that way. The "Idea" category might be fitting.