Open wujekbogdan opened 3 years ago
@wujekbogdan,
Cool idea to create a service. There are a LOT of questions in this request.
A few tips:
If your custom word list is static, then store it in a text file with one word per line. You can reference it in the settings:
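A minimal sketch of such a reference, assuming a hypothetical word list at ./custom-words.txt and a hypothetical dictionary name custom-words (both are placeholders, not names from this thread):

```typescript
// Hypothetical names: "custom-words" and "./custom-words.txt" are placeholders.
const settings = {
  // Define the dictionary: give it a name and point it at the word list file.
  dictionaryDefinitions: [
    { name: 'custom-words', path: './custom-words.txt' },
  ],
  // A defined dictionary is only used once it is listed here.
  dictionaries: ['custom-words'],
};
```

The same two keys can live in a cspell.json file instead of being built in code.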
Store all your custom dictionary / settings in a cspell.json or cspell.config.js file and use the readSettings function to load them.
If you only want to use your own custom word list, then the following will work:
const settings = {
// Needed to load existing dictionaries. Not needed if you only plan to use your own.
...getDefaultSettings(),
// Not needed
// enabledLanguageIds: [],
// Optionally your custom words can go here.
words: customWords // these words will be part of the dictionary returned by getDictionary
};
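If the custom words are kept in a text file with one word per line, one way to turn the file contents into the customWords array is sketched below. The helper name parseWordList is hypothetical and not part of cspell-lib; skipping lines that start with # is an assumption about how the file is written.

```typescript
// Hypothetical helper: turns "one word per line" text into an array of words.
function parseWordList(contents: string): string[] {
  return contents
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .map((line) => line.trim())
    // Drop blank lines and (by assumption) comment lines starting with '#'.
    .filter((line) => line.length > 0 && !line.startsWith('#'));
}

const customWords = parseWordList('wordz\n# team jargon\ncuztom\nclockz\n');
```

The file contents themselves could be read with fs.readFileSync(path, 'utf8') before being passed in.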
I suggest using mergeSettings to build up the settings if you read settings from a file.
const settings = mergeSettings(getDefaultSettings(), readSettings('path to your cspell.config.js'));
// empty '' is fine. The method looks for embedded `cspell` settings in the document. Since you do not
// expect them, no need to send any text.
const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.
const suggestions = dictionary.suggest(word, 1);
It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.
I agree.
Thanks a lot for the quick response!
If you only want to use your own custom word list, then the following will work:
Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?
Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.
Thanks for the tip. I'll have it in mind.
Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?
Adding it via words: will 1. Create a SpellingDictionary with your words. 2. Enable your words to be used by checkText.
Experiment with the command line app to get a feel for things. Everything is configuration driven.
You should be using validateText instead of checkText. checkText doesn't do what you want. It generates the command line result for cspell check.
On Fri, Oct 1, 2021 at 9:53 PM wujekbogdan @.***> wrote:
Thanks for the tip. It's definitely more elegant to pass words as settings, but sadly it doesn't seem to work. That's my current code:
import {
  checkText,
  combineTextAndLanguageSettings,
  finalizeSettings,
  getDefaultSettings,
  getDictionary,
} from 'cspell-lib';
import customWordsArray from './customWords.json';

export const SpellcheckerFactory = async (customWords = []) => {
  const settings = {
    ...getDefaultSettings(),
    enabledLanguageIds: [],
    words: customWords,
  };

  const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
  const finalSettings = finalizeSettings(fileSettings);
  const dictionary = await getDictionary(finalSettings);

  const getSuggestion = word => {
    const suggestions = dictionary.suggest(word, 1);
    return suggestions.length ? suggestions[0].word : null;
  };

  return async phrase => {
    const checkedText = await checkText(phrase, fileSettings);
    const itemsWithSuggestions = await Promise.all(
      checkedText.items.map(({ isError, text, startPos, endPos }) => ({
        text,
        isError,
        startPos,
        endPos,
        suggestion: isError ? getSuggestion(text) : null,
      }))
    );
    const suggestedPhrase = itemsWithSuggestions
      .map(({ isError, suggestion, text }) => {
        return isError && suggestion ? suggestion : text;
      })
      .join('');

    return {
      items: itemsWithSuggestions,
      phrase,
      suggestion: suggestedPhrase,
    };
  };
};

/**
 * @param phrase
 * @return {Promise<(string | null)[]>}
 */
export const checkSpelling = async phrase => {
  const spellChecker = await SpellcheckerFactory(customWordsArray);
  return spellChecker(phrase);
};
customWords are ignored when I call checkText. I'm not getting suggestions for these custom words. The previous technique (with SpellingDictionaryCollection) worked.
Use words works just fine.
import { validateText, combineTextAndLanguageSettings, finalizeSettings, getDefaultSettings } from 'cspell-lib';
const customWords = ['wordz', 'cuztom', 'clockz'];
export const SpellcheckerFactory = async (customWords: string[] = []) => {
const settings = {
...getDefaultSettings(),
enabledLanguageIds: [],
words: customWords,
};
const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
const finalSettings = finalizeSettings(fileSettings);
return async (phrase: string) => {
return await validateText(phrase, finalSettings, { generateSuggestions: true });
};
};
export const checkSpelling = async (phrase: string) => {
const spellChecker = await SpellcheckerFactory(customWords);
return spellChecker(phrase);
};
async function run() {
const r = await checkSpelling('These are my coztom wordz.');
console.log('%o', r);
}
run();
Result:
[
{
text: 'coztom',
offset: 13,
line: { text: 'These are my coztom wordz.', offset: 0 },
isFlagged: false,
isFound: false,
suggestions: [
'cuztom', 'contos',
'cotton', 'conto',
'Cotton', 'condom',
'custom', 'coom',
'bottom', 'coyote',
[length]: 10
]
},
[length]: 1
]
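To get from a result like this back to a corrected phrase (what the earlier proof-of-concept called suggestedPhrase), the issues can be applied to the original text by offset. This is only a sketch: the Issue interface mirrors the fields visible in the output above, and applyFirstSuggestions is a hypothetical helper, not a cspell-lib function.

```typescript
// Only the fields used here; mirrors the shape of the result printed above.
interface Issue {
  text: string;
  offset: number;
  suggestions?: string[];
}

// Hypothetical helper: replaces each flagged word with its first suggestion.
function applyFirstSuggestions(phrase: string, issues: Issue[]): string {
  // Work from the end of the phrase so earlier offsets stay valid
  // even when a replacement has a different length than the original word.
  const ordered = [...issues].sort((a, b) => b.offset - a.offset);
  let result = phrase;
  for (const { text, offset, suggestions } of ordered) {
    const replacement = suggestions?.[0];
    if (replacement === undefined) continue; // no suggestion: keep the word
    result = result.slice(0, offset) + replacement + result.slice(offset + text.length);
  }
  return result;
}

const corrected = applyFirstSuggestions('These are my coztom wordz.', [
  { text: 'coztom', offset: 13, suggestions: ['cuztom', 'contos'] },
]);
// corrected is 'These are my cuztom wordz.'
```

Taking the first suggestion blindly is a design shortcut; a real service would probably want to expose the full list and let the caller choose.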
Use words works just fine.
Yes, I can confirm that. Sorry for the confusion. After posting my last comment I realized that it indeed works, but not as I would expect. Then I deleted my comment, but you were faster and already responded to it :)
The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).
This is out of scope of this ticket - I'll create a new one where I explain the problem in more detail.
You should be using validateText instead of checkText.
Thank you. I'll experiment with it a bit more.
The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).
This is out of scope of this ticket - I'll create a new one where I explain the problem in more detail.
Are you trying to ignore diacritic marks or flag them?
By default the spell checker is case / accent insensitive. Try:
const settings = {
...getDefaultSettings(),
caseSensitive: true,
words: customWords,
};
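To make "case / accent insensitive" concrete, here is a rough illustration of what folding case and accents does to a word before comparison. It is a conceptual sketch only, not how cspell implements matching, and foldCaseAndAccents is a hypothetical name.

```typescript
// Hypothetical helper illustrating case/accent-insensitive comparison.
function foldCaseAndAccents(word: string): string {
  return word
    .normalize('NFD') // split letters from their combining marks (ę -> e + ogonek)
    .replace(/[\u0300-\u036f]/g, '') // remove the combining marks
    .toLowerCase();
}

// Letters without a canonical decomposition, such as ł, are left unchanged.
const folded = ['Gęślą', 'jaźń'].map(foldCaseAndAccents);
// folded is ['gesla', 'jazn']
```

Under this kind of folding, gęślą and gesla compare as equal, which is why a misspelling that differs only in diacritics may not be flagged unless caseSensitive is turned on.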
@Jason3S Sorry to dig up this thread, but there's a real need on our side for this, as we are similarly hoping to integrate this into a tool on our team. This is by far the most comprehensive tool I've found.
I have gotten the above code snippets to work, but cannot seem to get any of the default dictionaries to load -- only whatever custom words I supply. I may be missing some context from this original thread -- maybe that was never the intent of these snippets, but could you provide some insight on what might be missing from this snippet to get one of the bundled dictionaries loaded?
@reilnuud,
Two things:
1. getDefaultSettings now returns a Promise.
const settings = {
- ...getDefaultSettings(),
+ ...(await getDefaultSettings()),
enabledLanguageIds: [],
words: customWords,
};
2. spellCheckDocument. The following is a copy of the test file test-packages/cspell-lib/test-cspell-esbuild-cjs/source/src/index.ts. It is used to make sure bundling the library works, and also serves as an example of how to spell check a file via code.
import assert from 'assert';
import { spellCheckDocument } from 'cspell-lib';
import { resolve } from 'path';
import { pathToFileURL } from 'url';
// cspell:ignore wordz coztom clockz cuztom
const customWords = ['wordz', 'cuztom', 'clockz'];
async function checkSpelling(phrase: string) {
const result = await spellCheckDocument(
{ uri: 'text.txt', text: phrase, languageId: 'plaintext', locale: 'en' },
{ generateSuggestions: true, noConfigSearch: true },
{ words: customWords, suggestionsTimeout: 2000 },
);
return result.issues;
}
async function checkFile(filename: string) {
const uri = pathToFileURL(resolve(filename)).toString();
const result = await spellCheckDocument(
{ uri },
{ generateSuggestions: true, noConfigSearch: true },
{ words: customWords, suggestionsTimeout: 2000 },
);
return result.issues;
}
export async function run() {
console.log(`Start: ${new Date().toISOString()}`);
const r = await checkSpelling('These are my coztom wordz.');
console.log(`End: ${new Date().toISOString()}`);
// console.log(r);
assert(r.length === 1, 'Make sure we got 1 spelling issue back.');
assert(r[0].text === 'coztom');
assert(r[0].suggestions?.includes('cuztom'));
// console.log('%o', r);
const argv = process.argv;
if (argv[2]) {
console.log('Spell check file: %s', argv[2]);
const issues = await checkFile(argv[2]);
assert(!issues.length, 'no issues');
}
}
I'm working on a node.js spell-checking service.
I was looking for a good and actively maintained JS spell-checking library. It turned out that the most popular lib, typo.js, despite having lots of daily downloads, is not actively developed. It isn't very powerful either. Other libs I found suffer from the same issue.
I found that cspell is used internally by VSCode, so it seemed to be a perfect candidate, until I found it's not intended to be used as a library. The main purpose, from what I see, is a command line tool.
I started digging into the source code and I found that cspell-lib is a pretty decent tool and has everything I need. Thanks to unit tests I was able to figure out how to put all the pieces together and developed a little proof-of-concept:
The problem is that I have no idea if what I'm doing is right. It works, but most likely it could be done better/cleaner/more efficiently. I have several doubts - see the comments in the code.
It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.