psliwka / vim-dirtytalk

spellcheck dictionary for programmers 📖
MIT License
136 stars, 5 forks

Is a custom word list okay to add to the repository? #43

Open: luc-x41 opened this issue 5 months ago

luc-x41 commented 5 months ago

Hi! For the last few years I've been checking words and adding them to an internal dictionary that we use for our penetration testing reports, blog posts, etc. Then a colleague pointed out that I should maybe just be using a better dictionary than the default one, and pointed me here :)

Many of these words, such as cryptographically, canonicalized, satisfiable, and transpiling, are not yet in this repository, so I'd like to contribute/consolidate them. I have no automatically updating source for them (and we certainly couldn't publish customers' reports for the project to scrape words from :sweat_smile:), so my question is whether you'd be interested in including a custom word list that isn't updated automatically. The words are split out into:

If yes, follow-up questions are:

psliwka commented 5 months ago

Hi, and thanks for your interest in expanding this project! New wordlists are certainly a welcome addition, and having them auto-generated is not a must. So far they're all generated by scripts only because I was simply too lazy to ever craft one manually :sweat_smile:

I've addressed some of your follow-up questions in fed05e1ff1464e79d351acc9518af1a1771e07d2. TL;DR: static lists in ./wordlists/ are okay; try to add multiple smaller lists rather than one big one. It's also fine to extend existing scripts, e.g. the ones for brands.words and acronyms.words, to pull words from somewhere else (IDK, maybe a script-embedded list? a separate static "include" file?) to enrich their scraped output with extra words ;)
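To make the "separate static include file" idea concrete, here is a minimal sketch of how a generator script could enrich its scraped output. It is not code from the repository: the function name `merge_wordlists`, the `.include` file name, and the convention that `#` lines are comments are all assumptions. It unions the scraped words with a hand-maintained file and sorts the result so regenerated lists produce small, stable diffs.

```python
from pathlib import Path


def merge_wordlists(scraped: list[str], include_path: Path) -> list[str]:
    """Combine scraped words with a hand-maintained include file (hypothetical helper).

    Blank lines and lines starting with '#' in the include file are skipped;
    the '#' comment convention is an assumption, not a documented project rule.
    A missing include file simply contributes no extra words.
    """
    extra = []
    if include_path.exists():
        for line in include_path.read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word and not word.startswith("#"):
                extra.append(word)
    # Deduplicate, then sort so the generated .words file has a stable order.
    return sorted(set(scraped) | set(extra))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        include = Path(tmp) / "brands.include"  # hypothetical file name
        include.write_text("# manual additions\nBtrfs\nDocker\n\n", encoding="utf-8")
        print(merge_wordlists(["GitHub", "Docker"], include))
```

Running it prints `['Btrfs', 'Docker', 'GitHub']`: "Docker" appears in both sources but is kept once. Note that Python's default sort is case-sensitive (uppercase before lowercase), which may or may not match how the existing lists are ordered.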

Let me know if you have any more questions :)

mamekoro commented 4 months ago

To which word list should I add words that are used in various fields and difficult to classify? (e.g. "resize" and "despawn")

In my opinion, the existing word lists are already chaotic. The word "iterator" is in python.words even though it is not Python-specific. "Btrfs" is in docker.words even though it is independent of Docker.

So, how about creating a new word list like misc.words? The idea is that words that are easy to classify will be added to existing or newly created word lists, while other words will be added to misc.words.
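The proposed workflow (reuse an existing list if the word is already there, use a category list when one fits, otherwise fall back to misc.words) could be sketched as a small helper. This is a hypothetical illustration, not part of the project: `find_existing` and `add_word` are invented names, and it assumes one word per line in `*.words` files and exact, case-sensitive matching.

```python
from pathlib import Path
from typing import Optional


def find_existing(word: str, directory: Path) -> Optional[str]:
    """Return the name of the first *.words file that already contains `word`."""
    for path in sorted(directory.glob("*.words")):
        if word in path.read_text(encoding="utf-8").split():
            return path.name
    return None


def add_word(word: str, directory: Path, category: Optional[str] = None) -> str:
    """Add `word` to `<category>.words`, or to misc.words when no category fits.

    Returns the name of the list that holds the word afterwards. If the word
    is already present in any list, nothing is written, so words such as
    "iterator" are not duplicated across lists.
    """
    existing = find_existing(word, directory)
    if existing is not None:
        return existing
    target = directory / (f"{category}.words" if category else "misc.words")
    words = set(target.read_text(encoding="utf-8").split()) if target.exists() else set()
    words.add(word)
    # Rewrite the whole file sorted, one word per line (assumed format).
    target.write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
    return target.name
```

With a `python.words` file containing "iterator", `add_word("iterator", root)` returns `"python.words"` without writing anything, while `add_word("despawn", root)` creates misc.words. A check like `find_existing` would also flag the cross-listing issue raised above before a word ends up in two places.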