thiswillbeyourgithub / AnnA_Anki_neuronal_Appendix

Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
GNU General Public License v3.0

Usability in language learning? #1

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Hey! This project is just awesome :1st_place_medal: Wondering if this can be used for language learning, or will be in the future? It would be cool if you considered it. I'm currently using the MorphMan addon, but it's barely maintained and isn't really good at some things. Anyway, congrats on this awesome idea!

thiswillbeyourgithub commented 3 years ago

Hi!

Thanks a lot for the kind words, it's really nice to see my work being appreciated :)

This can absolutely be used for language learning. Can you explain your use case more precisely, and how MorphMan works? I can imagine some tweaks that would make it particularly efficient for that, and I would totally implement them.

Thanks!

ghost commented 3 years ago

Hey! From the official addon info: MorphMan is an Anki addon that tracks what words you know, and utilizes that information to optimally reorder language cards. This greatly optimizes your learning queue, as you will only see sentences with exactly one unknown word (see i+1 principle for a more detailed explanation).

https://github.com/kaegi/MorphMan

It's a pretty complex piece of software which is now largely unmaintained (and barely working). Depending on the language one is learning and the writing system it uses, there have been several additions to the code by third parties. For example, for Japanese and Chinese there has been a preliminary attempt at making spaCy work.

thiswillbeyourgithub commented 3 years ago

Interesting. But I don't understand something: is MorphMan meant to be used on your review queue, or only to reorder a fresh deck once, or something else?

Right now AnnA is meant to be used to create a filtered deck, daily if need be. It doesn't touch the new queue, only reviews (not even learning cards).

ghost commented 3 years ago

Oh! Interesting. Well, MorphMan uses tags to sort what you should learn based on what you know. You press K to tell MorphMan that you already know the contents of a note, and when you press Ctrl+M after having reviewed your daily notes, it reorganizes the new cards based on what you already marked with K.

Say, for example, you want to learn Russian: you grab a list of 200 words, and after having learned them you press K to tell MorphMan what to look for in any other note you add to your deck, for example 2000 sentences. MorphMan then organizes these new cards based on what you know and tags them accordingly.

thiswillbeyourgithub commented 3 years ago

Interesting, thanks for the explanation.
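
Just to make sure I follow, the i+1 idea boils down to something like this, right? (A rough illustrative sketch in Python, not MorphMan's actual code; the word lists are made up.)

```python
# Rough sketch of the i+1 idea, not MorphMan's actual code:
# keep sentences containing exactly one unknown word, postpone the rest.

known_words = {"я", "вижу", "ты", "дом"}  # words already marked as known (made up)

sentences = [
    "я вижу большой дом",           # only "большой" is unknown -> good i+1 candidate
    "ты читаешь интересную книгу",  # several unknown words -> postpone
]

def unknown_words(sentence):
    """Return the words of a sentence that are not in the known set."""
    return [w for w in sentence.lower().split() if w not in known_words]

# i+1 candidates: exactly one unknown word per sentence
i_plus_one = [s for s in sentences if len(unknown_words(s)) == 1]

# the rest, ordered by how many unknown words they contain (easiest first)
later = sorted(
    (s for s in sentences if len(unknown_words(s)) != 1),
    key=lambda s: len(unknown_words(s)),
)

print(i_plus_one + later)
```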

Currently AnnA only deals with the review queue, and I think it's going to stay that way for a while, without trying to optimize the new cards list.

That being said, using sBERT models to encode sentences is probably not the best way to handle language reviews. I think TF-IDF might be better suited for that, and I have implemented it in the past.
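
For reference, the kind of TF-IDF approach I mean looks roughly like this (a minimal sketch using scikit-learn; it is not the exact pipeline AnnA uses, and real cards would need proper field extraction and cleaning):

```python
# Minimal sketch of TF-IDF based card similarity (scikit-learn assumed);
# not AnnA's exact pipeline, just the general idea behind it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# pretend these are the text fields of a few cards
cards = [
    "the cat is on the table",
    "a cat sits on a chair",
    "mitochondria are the powerhouse of the cell",
]

# word unigrams keep the example simple; n-grams would also work
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(cards)

# pairwise cosine similarity between cards: semantically close cards score high,
# which is the signal used to spread similar cards apart in the filtered deck
similarity = cosine_similarity(vectors)
print(similarity.round(2))
```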

I added reimplementing that to the TODO list, but I have no idea when I'll be able to do it.

ghost commented 3 years ago

Cool! Well, I'll be looking forward to it some day. I'm no programmer myself, just interested in languages and in making them as easy as possible to learn, and MorphMan has helped me a lot with that. Thanks for taking the time to explain!

thiswillbeyourgithub commented 3 years ago

My pleasure :) don't hesitate to spread the word.

I'll keep this issue open and hopefully let you know when I've implemented it. It should only take a couple of hours, but I have to limit the time I devote to this project :)

thiswillbeyourgithub commented 3 years ago

I reimplemented TF_IDF. I didn't test it thoroughly, so there might be some breakage, and I changed quite a lot of things recently.

I will close this issue but don't hesitate to come back if you have issues or questions.

After more testing using the compare.py file, it seems that sBERT might not be THAT powerful on my medical cards :/ But TF_IDF should work well, provided you supply the right list of acronyms.
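
To illustrate what I mean by supplying acronyms, here is a rough sketch (the mapping and the expansion step below are hypothetical; the real AnnA option has its own file format):

```python
import re

# Hypothetical acronym mapping: the point is just to show why expanding
# acronyms before TF-IDF vectorization helps related cards match each other.
acronyms = {
    r"\bCOPD\b": "chronic obstructive pulmonary disease",
    r"\bMI\b": "myocardial infarction",
}

def expand_acronyms(text):
    """Append the expansion after each known acronym before vectorizing."""
    for pattern, expansion in acronyms.items():
        # keep the acronym too, so cards using either spelling still match
        text = re.sub(pattern, lambda m: f"{m.group(0)} {expansion}", text)
    return text

print(expand_acronyms("Patient with COPD and prior MI"))
# -> "Patient with COPD chronic obstructive pulmonary disease and prior MI myocardial infarction"
```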