streetsidesoftware / cspell

A Spell Checker for Code!
https://cspell.org
MIT License

[Feature request] Add docs for cspell-lib so that it's easier to use it as a library #1813

Open wujekbogdan opened 3 years ago

wujekbogdan commented 3 years ago

I'm working on a node.js spell-checking service.

I was looking for a good, actively maintained JS spell-checking library. It turned out that the most popular lib, typo.js, is not actively developed despite having lots of daily downloads. It isn't very powerful either. Other libs I found suffer from the same issue.

I found that cspell is used internally by VSCode, so it seemed to be a perfect candidate, until I found it's not intended to be used as a library. The main purpose, from what I see, is a command line tool.

I started digging into the source code and I found that cspell-lib is a pretty decent tool and has everything I need. Thanks to the unit tests I was able to figure out how to put all the pieces together and developed a little proof-of-concept:

import {
  checkText,
  combineTextAndLanguageSettings,
  CompoundWordsMethod,
  createSpellingDictionary,
  finalizeSettings,
  getDefaultSettings,
  getDictionary,
} from 'cspell-lib';
import { SpellingDictionaryCollection } from 'cspell-lib/dist/SpellingDictionary';

/**
 * @param customWords
 * @return {Promise<function(*=): Promise<(null|string)[]>>}
 * @constructor
 */
export const SpellcheckerFactory = async (customWords = []) => {
  const settings = {
    // I'm not sure if I need the entire default settings object
    ...getDefaultSettings(),
    // I want to use the lib just for plain text. I'm not sure if this is the best way to disable programming languages spell-checking
    enabledLanguageIds: [],
  };

  // I'm not sure if passing '' as a second argument is correct
  const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
  const finalSettings = finalizeSettings(fileSettings);
  const [dictionary, customDictionary] = await Promise.all([
    // Is it OK to get dictionary before I initialize the custom dictionary?
    getDictionary(finalSettings),
    // I'm not sure if `name` and `source` attributes make any difference.
    createSpellingDictionary(
      customWords,
      'customDictionary',
      'customWords',
      undefined
    ),
  ]);
  const dictionariesCollection = new SpellingDictionaryCollection(
    [customDictionary, dictionary],
    'dictionaries'
  );

  const getSuggestion = word => {
    const suggestions = dictionariesCollection.suggest(word, 1, CompoundWordsMethod.SEPARATE_WORDS);
    return suggestions.length ? suggestions[0].word : null;
  };

  return async phrase => {
    const checkedText = await checkText(phrase, fileSettings);
    const errors = checkedText.items.filter(({ isError }) => isError);

    return Promise.all(errors.map(({ text }) => getSuggestion(text)));
  };
};

import { checkSpelling } from './spelchecker';

describe('checkSpelling', () => {
  it('should check spelling', async () => {
    expect(await checkSpelling('zażułć gęśla jaśń')).toEqual(['zażółć', 'gęślą', 'jaźń']);
  });
});

The problem is that I have no idea if what I'm doing is right. It works, but most likely it could be done better/cleaner/more efficiently. I have several doubts - see the comments in the code.

It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.

Jason3S commented 3 years ago

@wujekbogdan,

Cool idea to create a service. There are a LOT of questions in this request.

A few tips:

Configuration / Settings are your friend

If your custom word list is static, then store it in a text file with one word per line. You can reference it in the settings:

Store all your custom dictionaries / settings in a cspell.json or cspell.config.js file and use the readSettings function to load them.
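As a rough illustration of referencing a static word list from the settings, the sketch below uses a local stand-in type rather than cspell's own types. The dictionaryDefinitions and dictionaries fields are assumed to follow the cspell configuration schema; the dictionary name custom-words and the path ./custom-words.txt are hypothetical.

```typescript
// Local stand-in for the small subset of cspell settings used here.
// Assumption: field names mirror the cspell configuration schema.
interface DictionaryDefinition {
  name: string; // identifier used to enable the dictionary
  path: string; // text file with one word per line
  addWords?: boolean;
}

interface SpellSettings {
  language?: string;
  dictionaryDefinitions?: DictionaryDefinition[];
  dictionaries?: string[];
}

// Hypothetical names: 'custom-words' and './custom-words.txt'.
const customSettings: SpellSettings = {
  language: 'en',
  dictionaryDefinitions: [
    { name: 'custom-words', path: './custom-words.txt', addWords: true },
  ],
  // Defining a dictionary does not enable it; it is enabled by name here.
  dictionaries: ['custom-words'],
};

console.log(JSON.stringify(customSettings, null, 2));
```

The same object shape could be written out as cspell.json or exported from cspell.config.js.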

If you only want to use your own custom word list, then the following will work:

  const settings = {
    // Needed to load existing dictionaries. Not needed if you only plan to use your own.
    ...getDefaultSettings(),
    // Not needed
    // enabledLanguageIds: [],
    // Optionally your custom words can go here.
    words: customWords // these words will be part of the dictionary returned by getDictionary
  };

I suggest using mergeSettings to build up the settings if you read settings from a file.

const settings = mergeSettings(getDefaultSettings(), readSettings('path to your cspell.config.js'));
// empty '' is fine. The method looks for embedded `cspell` settings in the document. Since you do not
// expect them, no need to send any text.
const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);

Avoid using compound word suggestions.

Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.

const suggestions = dictionary.suggest(word, 1);

Jason3S commented 3 years ago

It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.

I agree.

wujekbogdan commented 3 years ago

Thanks a lot for the quick response!

If you only want to use your own custom word list, then the following will work:

Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?

Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.

Thanks for the tip. I'll have it in mind.

Jason3S commented 3 years ago

Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?

Adding it via words: will (1) create a SpellingDictionary with your words and (2) enable your words to be used by checkText.

Experiment with the command line app to get a feel for things. Everything is configuration driven.

Jason3S commented 3 years ago

You should be using validateText instead of checkText. checkText doesn't do what you want. It generates the command line result for cspell check.

On Fri, Oct 1, 2021 at 9:53 PM wujekbogdan @.***> wrote:

Thanks for the tip. It's definitely more elegant to pass words as settings, but sadly it doesn't seem to work. That's my current code:

import {
  checkText,
  combineTextAndLanguageSettings,
  finalizeSettings,
  getDefaultSettings,
  getDictionary,
} from 'cspell-lib';
import customWordsArray from './customWords.json';

export const SpellcheckerFactory = async (customWords = []) => {
  const settings = {
    ...getDefaultSettings(),
    enabledLanguageIds: [],
    words: customWords,
  };

  const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
  const finalSettings = finalizeSettings(fileSettings);
  const dictionary = await getDictionary(finalSettings);

  const getSuggestion = word => {
    const suggestions = dictionary.suggest(word, 1);
    return suggestions.length ? suggestions[0].word : null;
  };

  return async phrase => {
    const checkedText = await checkText(phrase, fileSettings);
    const itemsWithSuggestions = await Promise.all(
      checkedText.items.map(({ isError, text, startPos, endPos }) => ({
        text,
        isError,
        startPos,
        endPos,
        suggestion: isError ? getSuggestion(text) : null,
      }))
    );
    const suggestedPhrase = itemsWithSuggestions
      .map(({ isError, suggestion, text }) => {
        return isError && suggestion ? suggestion : text;
      })
      .join('');

    return {
      items: itemsWithSuggestions,
      phrase,
      suggestion: suggestedPhrase,
    };
  };
};

/**
 * @param phrase
 * @return {Promise<(string | null)[]>}
 */
export const checkSpelling = async phrase => {
  const spellChecker = await SpellcheckerFactory(customWordsArray);

  return spellChecker(phrase);
};

customWords are ignored when I call checkText. I'm not getting suggestions for these custom words. The previous technique (with SpellingDictionaryCollection) worked.

Jason3S commented 3 years ago

Use words works just fine.

import { validateText, combineTextAndLanguageSettings, finalizeSettings, getDefaultSettings } from 'cspell-lib';

const customWords = ['wordz', 'cuztom', 'clockz'];

export const SpellcheckerFactory = async (customWords: string[] = []) => {
    const settings = {
        ...getDefaultSettings(),
        enabledLanguageIds: [],
        words: customWords,
    };

    const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
    const finalSettings = finalizeSettings(fileSettings);

    return async (phrase: string) => {
        return await validateText(phrase, finalSettings, { generateSuggestions: true });
    };
};

export const checkSpelling = async (phrase: string) => {
    const spellChecker = await SpellcheckerFactory(customWords);

    return spellChecker(phrase);
};

async function run() {
    const r = await checkSpelling('These are my coztom wordz.');
    console.log('%o', r);
}

run();

Result:

[
  {
    text: 'coztom',
    offset: 13,
    line: { text: 'These are my coztom wordz.', offset: 0 },
    isFlagged: false,
    isFound: false,
    suggestions: [
      'cuztom',     'contos',
      'cotton',     'conto',
      'Cotton',     'condom',
      'custom',     'coom',
      'bottom',     'coyote',
      [length]: 10
    ]
  },
  [length]: 1
]
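
A result of this shape can be turned back into a corrected phrase, which is what the earlier checkText-based code was trying to do. The following is a minimal sketch under stated assumptions: Issue is a local stand-in modeled on the fields printed above (text, offset, suggestions), and applyFirstSuggestions is a hypothetical helper, not part of cspell-lib.

```typescript
// Local stand-in for the parts of a validateText issue used below,
// modeled on the printed result (text, offset, suggestions).
interface Issue {
  text: string;
  offset: number;
  suggestions?: string[];
}

// Hypothetical helper: replace each reported word with its first suggestion,
// if it has one. Issues are applied from the end of the phrase backwards so
// that earlier offsets stay valid when a replacement changes the length.
function applyFirstSuggestions(phrase: string, issues: Issue[]): string {
  const ordered = [...issues].sort((a, b) => b.offset - a.offset);
  let result = phrase;
  for (const issue of ordered) {
    const replacement = issue.suggestions?.[0];
    if (replacement === undefined) continue;
    result =
      result.slice(0, issue.offset) +
      replacement +
      result.slice(issue.offset + issue.text.length);
  }
  return result;
}

const sampleIssues: Issue[] = [
  { text: 'coztom', offset: 13, suggestions: ['cuztom', 'contos', 'cotton'] },
];
console.log(applyFirstSuggestions('These are my coztom wordz.', sampleIssues));
// prints "These are my cuztom wordz."
```

Taking the first suggestion blindly is a design choice for the sketch; a real service would probably return the candidates and let the caller decide.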
wujekbogdan commented 3 years ago

Use words works just fine.

Yes, I can confirm that. Sorry for the confusion. After posting my last comment I realized that it does indeed work, just not as I expected. I then deleted my comment, but you were faster and had already responded to it :)

The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).

This is out of scope for this ticket - I'll create a new one where I explain the problem in more detail.

You should be using validateText instead of checkText. checkText

Thank you. I'll experiment with it a bit more.

Jason3S commented 3 years ago

The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).

This is out of scope for this ticket - I'll create a new one where I explain the problem in more detail.

Are you trying to ignore diacritic marks or flag them? By default the spell checker is case / accent insensitive. Try:

    const settings = {
        ...getDefaultSettings(),
        caseSensitive: true,
        words: customWords,
    };
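
To illustrate what case / accent insensitive matching means in general, the sketch below folds words using Unicode NFD normalization. It illustrates the concept only and is not cspell's actual implementation; foldWord is a hypothetical helper.

```typescript
// Hypothetical helper, not taken from cspell-lib: lower-case a word and strip
// combining diacritical marks after NFD decomposition.
function foldWord(word: string): string {
  return word
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase();
}

// Words that differ only in accents or case fold to the same string.
console.log(foldWord('jaźń') === foldWord('jazn')); // prints true

// Letters without a canonical decomposition, such as ł, are left unchanged.
console.log(foldWord('zażółć')); // prints "zazołc"
```

Under this kind of folding, a misspelling that differs from a dictionary word only in its diacritics may not be flagged, which is the behavior the caseSensitive setting above is meant to change.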
reilnuud commented 7 months ago

@Jason3S Sorry to dig up this thread, but there's a real need on our side for this, as we are similarly hoping to integrate this into a tool on our team. This is by far the most comprehensive tool I've found.

I have gotten the above code snippets to work, but cannot seem to get any of the default dictionaries to load -- only whatever custom words I supply. I may be missing some context from this original thread -- maybe that was never the intent of these snippets, but could you provide some insight on what might be missing from this snippet to get one of the bundled dictionaries loaded?

Use words works just fine.

import { validateText, combineTextAndLanguageSettings, finalizeSettings, getDefaultSettings } from 'cspell-lib';

const customWords = ['wordz', 'cuztom', 'clockz'];

export const SpellcheckerFactory = async (customWords: string[] = []) => {
    const settings = {
        ...getDefaultSettings(),
        enabledLanguageIds: [],
        words: customWords,
    };

    const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
    const finalSettings = finalizeSettings(fileSettings);

    return async (phrase: string) => {
        return await validateText(phrase, finalSettings, { generateSuggestions: true });
    };
};

export const checkSpelling = async (phrase: string) => {
    const spellChecker = await SpellcheckerFactory(customWords);

    return spellChecker(phrase);
};

async function run() {
    const r = await checkSpelling('These are my coztom wordz.');
    console.log('%o', r);
}

run();

Result:

[
  {
    text: 'coztom',
    offset: 13,
    line: { text: 'These are my coztom wordz.', offset: 0 },
    isFlagged: false,
    isFound: false,
    suggestions: [
      'cuztom',     'contos',
      'cotton',     'conto',
      'Cotton',     'condom',
      'custom',     'coom',
      'bottom',     'coyote',
      [length]: 10
    ]
  },
  [length]: 1
]

Jason3S commented 7 months ago

@reilnuud,

Two things:

  1. The API has changed slightly since the example was written. getDefaultSettings now returns a Promise.
        const settings = {
    -       ...getDefaultSettings(),
    +       ...(await getDefaultSettings()),
            enabledLanguageIds: [],
            words: customWords,
        };
  2. There is another endpoint that might be easier to use: spellCheckDocument.

This is a copy of the test file test-packages/cspell-lib/test-cspell-esbuild-cjs/source/src/index.ts. It is used to make sure bundling the library works, and also serves as an example of how to spell check a file via code.

import assert from 'assert';
import { spellCheckDocument } from 'cspell-lib';
import { resolve } from 'path';
import { pathToFileURL } from 'url';

// cspell:ignore wordz coztom clockz cuztom
const customWords = ['wordz', 'cuztom', 'clockz'];

async function checkSpelling(phrase: string) {
    const result = await spellCheckDocument(
        { uri: 'text.txt', text: phrase, languageId: 'plaintext', locale: 'en' },
        { generateSuggestions: true, noConfigSearch: true },
        { words: customWords, suggestionsTimeout: 2000 },
    );
    return result.issues;
}

async function checkFile(filename: string) {
    const uri = pathToFileURL(resolve(filename)).toString();
    const result = await spellCheckDocument(
        { uri },
        { generateSuggestions: true, noConfigSearch: true },
        { words: customWords, suggestionsTimeout: 2000 },
    );
    return result.issues;
}

export async function run() {
    console.log(`Start: ${new Date().toISOString()}`);
    const r = await checkSpelling('These are my coztom wordz.');
    console.log(`End: ${new Date().toISOString()}`);
    // console.log(r);
    assert(r.length === 1, 'Make sure we got 1 spelling issue back.');
    assert(r[0].text === 'coztom');
    assert(r[0].suggestions?.includes('cuztom'));
    // console.log('%o', r);

    const argv = process.argv;
    if (argv[2]) {
        console.log('Spell check file: %s', argv[2]);
        const issues = await checkFile(argv[2]);
        assert(!issues.length, 'no issues');
    }
}
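
On the first point above: spreading a Promise that has not been awaited copies none of the resolved settings, because a Promise has no own enumerable properties. That would be consistent with only the custom words being recognized. The sketch below uses a hypothetical stand-in, getDefaultSettingsStub, instead of the real getDefaultSettings, and the dictionary name in it is a placeholder.

```typescript
// Hypothetical stand-in for an async settings loader; not the real cspell-lib function.
async function getDefaultSettingsStub(): Promise<{ dictionaries: string[] }> {
  return { dictionaries: ['placeholder-dictionary'] };
}

async function compareSpreads(): Promise<{ withoutAwait: string[]; withAwait: string[] }> {
  const customWords = ['wordz'];

  // Spreading the Promise itself: nothing from the resolved settings is copied.
  const broken = { ...getDefaultSettingsStub(), words: customWords };

  // Awaiting first: the resolved settings are copied as intended.
  const fixed = { ...(await getDefaultSettingsStub()), words: customWords };

  return { withoutAwait: Object.keys(broken), withAwait: Object.keys(fixed) };
}

// withoutAwait contains only 'words'; withAwait contains 'dictionaries' and 'words'.
compareSpreads().then((keys) => console.log(keys));
```

Neither form throws at runtime, so the missing await is easy to overlook; checking the keys of the merged settings object is a quick way to spot it.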