sajadhsm / new-word-tab

A browser extension to learn a new word per new tab

[Suggestion] Indicate which list a word is from #20

Open silverwings15 opened 2 years ago

silverwings15 commented 2 years ago

Per the title, I think this could be useful in many situations. For example, in my case, I have already disabled the two Oxford lists but am still getting some pretty basic words, so I would like to see which list such words come from. It will be even more useful in the future if you add the ability to use custom word lists, which will be a gamechanger for sure 😂

silverwings15 commented 2 years ago

Also, if you like, I could provide some word lists that you may consider adding in the future.

sajadhsm commented 2 years ago

Yeah, showing which list a word comes from could be useful.

Some words are in more than one list. To make random word selection easier, all the words are currently put into a set to remove duplicates. Also, for performance reasons, no word-to-list relation is kept at runtime.
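To illustrate the behaviour described above, here is a minimal sketch of merging all enabled lists into one duplicate-free pool and picking from it. The names `buildPool` and `pickWord` and the list shape are assumptions for illustration, not the extension's actual code.

```typescript
// Hypothetical input shape: list name -> words in that list.
// Merge every list into one duplicate-free pool. List membership is
// discarded here, so the origin of a word cannot be recovered later.
function buildPool(lists: Record<string, string[]>): string[] {
  const unique = new Set<string>();
  Object.keys(lists).forEach((name) => {
    lists[name].forEach((word) => unique.add(word));
  });
  const pool: string[] = [];
  unique.forEach((word) => pool.push(word));
  return pool;
}

// Pick one word uniformly at random. `rng` is injectable so the
// function can be tested deterministically.
function pickWord(pool: string[], rng: () => number = Math.random): string {
  if (pool.length === 0) throw new Error("word pool is empty");
  return pool[Math.floor(rng() * pool.length)];
}
```

Because the pool is a set, every unique word has the same chance of being picked, but nothing in the result says which list it came from.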

To have this feature, the current logic needs to change.

One way is to first pick a random word list and then select a word from that list. This approach is great in terms of performance. But it doesn't tell us if the word exists in another list or not.
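A rough sketch of this first approach, assuming the lists are available as a plain object keyed by list name (the function name `pickFromRandomList` and the `ListPick` type are hypothetical):

```typescript
// Result of a pick: the word plus the single list it was drawn from.
interface ListPick {
  word: string;
  list: string;
}

// Step 1: choose a non-empty list at random. Step 2: choose a word from it.
// No merging or indexing is needed, so the cost per pick is tiny.
function pickFromRandomList(
  lists: Record<string, string[]>,
  rng: () => number = Math.random
): ListPick {
  const names = Object.keys(lists).filter((name) => lists[name].length > 0);
  if (names.length === 0) throw new Error("no non-empty word lists");
  const list = names[Math.floor(rng() * names.length)];
  const words = lists[list];
  const word = words[Math.floor(rng() * words.length)];
  return { word, list };
}
```

One consequence worth noting: with this scheme a word that appears in several lists, or in a short list, is more likely to be picked than a word that appears once in a long list.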

Another way is to run over all the words and compute, for each word, a property containing the lists it appears in. I think this computation would freeze the app for a noticeable amount of time and should not happen at runtime.
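The per-word property could look something like the sketch below: a single pass that builds a word-to-lists index. `buildWordIndex` and the input shape are assumptions; how long this takes on the real data would need to be measured.

```typescript
// Build an index from each word to the names of all lists containing it.
// This is one pass over every entry, so the cost grows with the total
// number of words across all lists.
function buildWordIndex(lists: Record<string, string[]>): Map<string, string[]> {
  const index = new Map<string, string[]>();
  Object.keys(lists).forEach((name) => {
    lists[name].forEach((word) => {
      const owners = index.get(word);
      if (owners === undefined) {
        index.set(word, [name]);
      } else if (owners.indexOf(name) === -1) {
        // Guard against the same word repeated inside one list.
        owners.push(name);
      }
    });
  });
  return index;
}
```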

We can avoid the runtime computation by storing the relation in the source code and loading it at runtime. In general, this method consumes a lot more memory than the first approach but gives us all the lists. (But if we add custom word lists, they won't be included.)

I need to run some experiments and think more about it to find a good solution.

Using custom word lists is already on the roadmap, but unfortunately I haven't found much time yet to read about other storage solutions to implement it. I'll consider it in the future.
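As a sketch of what "storing the relation" might mean in practice, the word-to-lists relation could be serialized once in a build step and simply parsed at runtime. The functions `serializeIndex` and `loadIndex` and the JSON pair format are assumptions made for this example.

```typescript
// Build step (hypothetical): turn a word -> lists Map into a JSON string
// that can be committed or bundled with the extension.
function serializeIndex(index: Map<string, string[]>): string {
  const pairs: [string, string[]][] = [];
  index.forEach((lists, word) => pairs.push([word, lists]));
  return JSON.stringify(pairs);
}

// Runtime: rebuild the Map from the stored pairs without scanning the
// original word lists again.
function loadIndex(json: string): Map<string, string[]> {
  const pairs = JSON.parse(json) as [string, string[]][];
  return new Map<string, string[]>(pairs);
}
```

The memory trade-off mentioned above still applies: the stored relation holds every word plus its list names, which is larger than a bare set of words.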


Yes, having more word lists will definitely help, and I would really appreciate it if you shared them.

silverwings15 commented 2 years ago

> One way is to first pick a random word list and then select a word from that list. This approach is great in terms of performance. But it doesn't tell us if the word exists in another list or not.

I think this is the way to go, with the caveat that you just have to maintain the default lists and ensure they don't contain duplicate words (I can definitely help with this).

If users can later add custom word lists that contain duplicates, they will have to find and delete those words themselves. Alternatively, you could offer an option: when a new list is added, a pop-up asks "Do you want to scan all lists for duplicates?" and returns the duplicated words, so the user can decide which list each redundant word should be removed from.
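The scan suggested here could be sketched roughly as follows. `findDuplicates`, `DuplicateReport`, and the input shape are hypothetical, and words are compared as exact strings (no case or whitespace normalization).

```typescript
// One entry per word that appears in more than one list.
interface DuplicateReport {
  word: string;
  lists: string[];
}

// Scan all lists and report every word shared by two or more of them,
// together with the names of the lists that contain it.
function findDuplicates(lists: Record<string, string[]>): DuplicateReport[] {
  const owners = new Map<string, string[]>();
  Object.keys(lists).forEach((name) => {
    lists[name].forEach((word) => {
      const seen = owners.get(word) ?? [];
      if (seen.indexOf(name) === -1) seen.push(name);
      owners.set(word, seen);
    });
  });
  const report: DuplicateReport[] = [];
  owners.forEach((names, word) => {
    if (names.length > 1) report.push({ word, lists: names });
  });
  return report;
}
```

The report could then drive the pop-up, letting the user choose which list keeps each word.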

As for the word lists, I have some decent ideas in mind. I will try to generate a few lists from well-known sources, targeted at different groups for broad coverage (educational, fun, easy, hard, etc.). If possible, may I have the current list of words you're using? I can use that as a base to check for duplication.


I'm also throwing out some quick ideas that popped into my head just now so i don't forget them, can elaborate later if you like:

sajadhsm commented 2 years ago

I don't think we should modify the word lists since they are somehow standard. If a word is in several lists, how should we decide which list must keep it and which ones to remove it?

I prefer to try finding a minimal solution so that users can easily understand the flow. I think duplication is fine but duplicated words should not have a higher chance of being randomly picked.
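One possible way to get both properties (duplicates allowed, but no extra weight for them) is to pick uniformly over the unique words of a word-to-lists index and report every list the chosen word belongs to. This is only a sketch; `pickUniform` and `UniformPick` are hypothetical names and the index is assumed to exist already.

```typescript
// Result of a pick: the word and every list that contains it.
interface UniformPick {
  word: string;
  lists: string[];
}

// Each unique word is one key in the index, so a word that appears in
// several lists is still only one candidate and gets no extra weight.
function pickUniform(
  index: Map<string, string[]>,
  rng: () => number = Math.random
): UniformPick {
  const words: string[] = [];
  index.forEach((_lists, word) => words.push(word));
  if (words.length === 0) throw new Error("word index is empty");
  const word = words[Math.floor(rng() * words.length)];
  return { word, lists: index.get(word) ?? [] };
}
```

In a real implementation the `words` array would likely be built once and reused rather than rebuilt on every pick.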

Awesome! You can find all the word lists here: https://github.com/sajadhsm/new-word-tab/tree/main/src/data/words

Until now I haven't thought much about word management, but as you mentioned, there is a lot of potential to make it much more useful.


Thanks for the ideas. I've just set up GitHub Discussions for the project. We can share our ideas there so they don't get lost in different issues 😄 💡

silverwings15 commented 2 years ago

> I don't think we should modify the word lists since they are somehow standard. If a word is in several lists, how should we decide which list must keep it and which ones to remove it?

You raise a fair point. My current logic is that for tests such as IELTS or the SAT there are no 'official' vocab lists per se, while lists compiled from a series of textbooks would be more official and thus have a stronger claim to keep the entries they share with other lists. I will flesh out my rationale on this and work on creating some word lists so you can check them out.

sajadhsm commented 2 years ago

Yeah, you're probably right. I don't have much information on such tests and their vocab lists. I found these lists around the web and didn't carefully check how they were compiled. I need to do more research on that too.

Thanks for the time you are putting into improving the whole project. Really appreciated 🤝