add the document text English wordlists

yzhang-gh commented 10 months ago

Many thanks! I will try to sort it out this weekend.

yzhang-gh commented 10 months ago

Thanks again! Just for your information, I have made a few changes in 88766f50, including:

added a configuration option collectWordsFromCurrentFile (defaults to false)
used a Set instead of checking for duplicate words one by one
changed ^.*[a-zA-Z]{1,}.*$ to ^[\w\-]+$ (I don't quite understand the former; maybe you had some other considerations?)
used a global variable currDocWords instead of passing the word list in repeatedly (just my personal preference)

Also, maxLineCount is manually set to 1000. I haven't tested a larger value, but I think it should be enough, and it doesn't seem to cause any performance issues for me.
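For readers following along, the changes described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: only the names currDocWords and maxLineCount and the pattern ^[\w\-]+$ come from this thread, while the helper collectWords and the whitespace tokenization are assumptions made for the example.

```typescript
// Hypothetical sketch of collecting words from the current document.
// Names taken from the thread: maxLineCount, currDocWords, ^[\w\-]+$.
// Everything else (collectWords, splitting on whitespace) is assumed.
const maxLineCount = 1000;
const wordPattern = /^[\w\-]+$/;

function collectWords(text: string, lineLimit: number = maxLineCount): Set<string> {
  // A Set drops duplicates automatically, so no one-by-one membership check is needed.
  const words = new Set<string>();
  // Only the first lineLimit lines are scanned, which bounds the work on large files.
  const lines = text.split(/\r?\n/).slice(0, lineLimit);
  for (const line of lines) {
    for (const token of line.split(/\s+/)) {
      if (wordPattern.test(token)) {
        words.add(token);
      }
    }
  }
  return words;
}

// Stands in for the global word list mentioned above.
const currDocWords = collectWords("alpha beta alpha\nfoo-bar 123 hello, world");
console.log(Array.from(currDocWords));
```

In this sketch "alpha" is stored once, "hello," is rejected because of the trailing comma, and "123" is kept because \w also matches digits.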

ArithmeticError commented 10 months ago

Thank you very much for the improvements you made to this code! I used ^.*[a-zA-Z]{1,}.*$ to filter out matches that contain only digits or only underscores. That situation is rare, though, so I think the merged regular expression ^[\w\-]+$ behaves the same as the one I used before in most cases.
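To make the difference between the two patterns concrete, here is a small comparison. It is only a sketch: the helper compare and the sample strings are made up for illustration, and whether candidates with punctuation can ever reach this check in the extension is not something this thread establishes.

```typescript
// The two patterns discussed in this thread.
const originalPattern = /^.*[a-zA-Z]{1,}.*$/; // requires at least one ASCII letter somewhere
const mergedPattern = /^[\w\-]+$/;            // allows only letters, digits, underscores, hyphens

// Hypothetical helper: reports whether each pattern accepts the candidate.
function compare(candidate: string): { original: boolean; merged: boolean } {
  return {
    original: originalPattern.test(candidate),
    merged: mergedPattern.test(candidate),
  };
}

for (const sample of ["hello", "foo-bar", "abc123", "123", "___", "hello,"]) {
  console.log(sample, compare(sample));
}
```

Both patterns accept "hello", "foo-bar", and "abc123". They disagree on "123" and "___", which only the merged pattern accepts, and on "hello,", which only the original pattern accepts because .* matches any character.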

yzhang-gh commented 10 months ago

I use ^.*[a-zA-Z]{1,}.*$ to filter out all matches containing only numbers or only underscores.

That makes sense to me. And I agree it is rare, so let's just leave it as is for now.

yzhang-gh / vscode-dic-completion

add the document text English wordlists #44