Getting the "word" under the cursor is really, really complicated.

ianstormtaylor / slate

A completely customizable framework for building rich text editors. (Currently in beta.)

http://slatejs.org

MIT License

Getting the "word" under the cursor is really, really complicated. #4162

Open aliak00 opened 3 years ago

aliak00 commented 3 years ago

Problem

I want to get the word under a collapsed cursor, along with the range of that word within a block element. Slate's Editor.blah functions don't seem sufficient to do this without some crazy logic.

For my use-case a "word" includes the dash and dot (-,.) characters.

I'll use '|' as the cursor location. If you have 'hello| world' and call Editor.after with the word unit, you'll get the point after 'world'. If you have 'hello world|' and call Editor.after with the word unit, you'll get the first point in the next block. The same applies to Editor.before in the other direction.

So to actually get the word under the cursor, this is the logic I have:

// Get start and end, modify it as we move along.
let [start, end] = Range.edges(selection);

// Move forward along until I hit a different tree depth
while (true) {
  const after = Editor.after(editor, end, {
    unit: 'word',
  });
  const wordAfter =
    after && Editor.string(editor, { anchor: end, focus: after });
  if (after && wordAfter && wordAfter.length && wordAfter[0] !== ' ') {
    end = after;
    if (end.offset === 0) { // Means we've wrapped to beginning of another block
      break;
    }
  } else {
    break;
  }
}

// Move backwards
while (true) {
  const before = Editor.before(editor, start, {
    unit: 'word',
  });
  const wordBefore =
    before && Editor.string(editor, { anchor: before, focus: start });
  if (
    before &&
    wordBefore &&
    wordBefore.length &&
    wordBefore[wordBefore.length - 1] !== ' '
  ) {
    start = before;
    if (start.offset === 0) { // Means we've wrapped to beginning of another block
      break;
    }
  } else {
    break;
  }
}

And then I have my word and range:

const wordRange = { anchor: start, focus: end };
const word = Editor.string(editor, wordRange);

Solution

A solution would be to not include spaces as part of word boundaries, or to have some way to tell the Editor.before/after APIs to use the word unit but include specific characters and terminate on others, e.g.

Editor.after(editor, selection.anchor, { unit: 'word', include: '-._', terminateOn: ' ' });

Or to allow { edge: 'end' } in the options so that it doesn't pass the end of the block?

Context

Here's a screenshot of a Slack thread that has more details:

williamstein commented 3 years ago

For integration of Slate into my product, I also had to write a ridiculously complicated function to get the current word, and I expect many other people have done so as well. I'll share mine too, in case anybody finds it helpful when they tackle this issue. The isEqual below is from lodash.

import { isEqual } from "lodash";
import { Editor, Node, Range, Text } from "slate";

// Expand collapsed selection to range containing exactly the
// current word, even if selection potentially spans multiple
// text nodes.  If cursor is not *inside* a word (being on edge
// is not inside) then returns undefined.  Otherwise, returns
// the Range containing the current word.
function currentWord(editor: Editor): Range | undefined {
  const {selection} = editor;
  if (selection == null || !Range.isCollapsed(selection)) {
    return; // nothing to do -- no current word.
  }
  const { focus } = selection;
  const [node, path] = Editor.node(editor, focus);
  if (!Text.isText(node)) {
    // focus must be in a text node.
    return;
  }
  const { offset } = focus;
  const siblings: any[] = Node.parent(editor, path).children as any;

  // We move to the left from the cursor until leaving the current
  // word and to the right as well in order to find the
  // start and end of the current word.
  let start = { i: path[path.length - 1], offset };
  let end = { i: path[path.length - 1], offset };
  if (offset == siblings[start.i]?.text?.length) {
    // special case when starting at the right hand edge of text node.
    moveRight(start);
    moveRight(end);
  }
  const start0 = { ...start };
  const end0 = { ...end };

  function len(node): number {
    // being careful that there could be some non-text nodes in there, which
    // we just treat as length 0.
    return node?.text?.length ?? 0;
  }

  function charAt(pos: { i: number; offset: number }): string {
    const c = siblings[pos.i]?.text?.[pos.offset] ?? "";
    return c;
  }

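  function moveLeft(pos: { i: number; offset: number }): boolean {
    if (pos.offset == 0) {
      // step into the previous sibling (i may become -1; charAt then
      // returns "" and the caller's scan stops).
      pos.i -= 1;
      pos.offset = Math.max(0, len(siblings[pos.i]) - 1);
    } else {
      pos.offset -= 1;
    }
    return true; // always moves; the charAt check in the loop ends the scan
  }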

  function moveRight(pos: { i: number; offset: number }): boolean {
    if (pos.offset + 1 < len(siblings[pos.i])) {
      pos.offset += 1;
      return true;
    } else {
      if (pos.i + 1 < siblings.length) {
        pos.offset = 0;
        pos.i += 1;
        return true;
      } else {
        if (pos.offset < len(siblings[pos.i])) {
          pos.offset += 1; // end of the last block.
          return true;
        }
      }
    }
    return false;
  }

  while (charAt(start).match(/\w/) && moveLeft(start)) {}
  // move right 1.
  moveRight(start);
  while (charAt(end).match(/\w/) && moveRight(end)) {}
  if (isEqual(start, start0) || isEqual(end, end0)) {
    // if at least one endpoint doesn't change, cursor was not inside a word,
    // so we do not select.
    return;
  }

  const path0 = path.slice(0, path.length - 1);
  return {
    anchor: { path: path0.concat([start.i]), offset: start.offset },
    focus: { path: path0.concat([end.i]), offset: end.offset },
  };
}
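
For anyone who wants to experiment with the multi-text-node case without an editor instance, here is a minimal sketch of a different approach (not the function above): concatenate the sibling texts, find the word in the flat string, then map the flat offsets back to { i, offset } pairs. The names flatten, toFlat, toLocal and wordAcrossSiblings are made up for illustration, siblings are modeled as plain { text } objects rather than Slate nodes, and a word character is assumed to be \w.

```typescript
// Hypothetical stand-in for the children of a Slate block: non-text
// nodes are modeled as objects without a `text` field (length 0).
type Sibling = { text?: string };
type LocalPos = { i: number; offset: number };

// Concatenate all sibling texts into one flat string.
function flatten(siblings: Sibling[]): string {
  return siblings.map((s) => s.text ?? "").join("");
}

// Convert (sibling index, local offset) into an offset in the flat string.
function toFlat(siblings: Sibling[], pos: LocalPos): number {
  let flat = pos.offset;
  for (let i = 0; i < pos.i; i++) flat += (siblings[i].text ?? "").length;
  return flat;
}

// Convert a flat offset back to (sibling index, local offset). An offset on
// the boundary between two siblings resolves to the end of the earlier one.
function toLocal(siblings: Sibling[], flatOffset: number): LocalPos {
  let remaining = flatOffset;
  for (let i = 0; i < siblings.length; i++) {
    const n = (siblings[i].text ?? "").length;
    if (remaining <= n) return { i, offset: remaining };
    remaining -= n;
  }
  const last = Math.max(0, siblings.length - 1);
  return { i: last, offset: (siblings[last]?.text ?? "").length };
}

// Returns the word around `cursor`, or undefined when the cursor is not
// strictly inside a word (same rule as the function above: both ends must move).
function wordAcrossSiblings(
  siblings: Sibling[],
  cursor: LocalPos
): { start: LocalPos; end: LocalPos } | undefined {
  const text = flatten(siblings);
  const at = toFlat(siblings, cursor);
  let start = at;
  let end = at;
  while (start > 0 && /\w/.test(text[start - 1])) start--;
  while (end < text.length && /\w/.test(text[end])) end++;
  if (start === at || end === at) return undefined;
  return { start: toLocal(siblings, start), end: toLocal(siblings, end) };
}
```

With siblings "A word or t" and "wo." and the cursor at the end of the first node, this yields start { i: 0, offset: 10 } and end { i: 1, offset: 2 }, i.e. the word "two" split across two text nodes. Turning those pairs into Slate points would still need the parent path, as in the function above.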

AlexanderArvidsson commented 2 years ago

Any update on this?

I really need to be able to choose which characters count as part of a "word". In my case, I need to include underscores in order to match emoji colon codes (e.g. :raised_hands:). Can we add options to include specific characters, like OP suggested?

{ unit: 'word', include: '-._', terminateOn: ' ' }
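
To make the emoji use case concrete, here is a rough sketch of one way to detect a partially typed colon code, using a regular expression on the text before the cursor rather than any Slate API. The helper name shortcodeBeforeCursor is hypothetical, and both the allowed character set (letters, digits, underscore, plus, dash) and the rule that the colon must follow whitespace or the start of the text are assumptions, not anything Slate defines.

```typescript
// Hypothetical helper: given the text of the current text node and the
// cursor offset, return the shortcode being typed (without the colon) and
// the offset of the colon, or undefined if there is none.
function shortcodeBeforeCursor(
  text: string,
  offset: number
): { query: string; start: number } | undefined {
  const before = text.slice(0, offset);
  // A colon at the start of the text or after whitespace, followed by one or
  // more shortcode characters running up to the cursor.
  const match = /(?:^|\s):([a-z0-9_+-]+)$/i.exec(before);
  if (!match) return undefined;
  const query = match[1];
  // `start` points at the colon itself.
  return { query, start: offset - query.length - 1 };
}
```

Requiring whitespace before the colon keeps times such as "12:30" from being treated as the start of an emoji code.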

j0nas commented 2 years ago

First, I want to thank the maintainers of this library for providing the community with such a great piece of software. I've been working with Slate for some time now, and it is really good, covering 99% of my use-cases. Thank you for all your time and efforts! :heart:

Having become used to such a good experience, I'm surprised when I discover the remaining 1%. It seems strange to me that Transforms.select doesn't have an alternative signature that takes a unit, like @AlexanderArvidsson suggests above. The suggestions above, while solving the problem, are surprisingly complex for such a common use-case.

@williamstein Thank you for posting your solution here. I replaced the lodash isEqual line with the following:

  if ((start.i === start0.i && start.offset === start0.offset) ||
    (end.i === end0.i && end.offset === end0.offset)) {

And also wrote some simple tests for this, using slate-test-utils:

/** @jsx jsx */
import { assertOutput, buildTestHarness, testRunner } from "slate-test-utils";
import { Transforms } from "slate";
// noinspection ES6UnusedImports
import { jsx } from "./utils/testUtils";
import { currentWordRange } from "./utils";
import { Editor } from "./components/Editor";

const testCases = () => {
  describe(currentWordRange.name, () => {
    it("Returns range of word at cursor", async () => {
      const input = (
        <editor>
          <hp>A word or t<cursor />wo.</hp>
        </editor>
      );
      const [editor] = await buildTestHarness(Editor)({ editor: input });
      Transforms.select(editor, currentWordRange(editor));

      assertOutput(
        editor,
        <editor>
          <hp>A word or <anchor />two<focus />.</hp>
        </editor>
      );
    });

    it("Returns undefined if cursor not at a word", async () => {
      const input = (
        <editor>
          <hp>Lorem ipsum <cursor /> dolar sit amet</hp>
        </editor>
      );
      const [editor] = await buildTestHarness(Editor)({ editor: input });
      const range = currentWordRange(editor);
      expect(range).toBeUndefined();

      Transforms.select(editor, range);
      assertOutput(editor, input);
    });
  });
};

testRunner(testCases);

dylans commented 2 years ago

"I'm surprised when I discover the remaining 1%"

We're happy to consider PRs to fix the 1%.

AlexanderArvidsson commented 2 years ago

I ended up writing my own stepper which goes character by character and includes options as to which characters to include.

If anyone is interested, here it is. You may have to adjust the typings. Credit to @williamstein for parts of it, but it works a little differently to suit my needs (character steps instead of word steps). It also lets you pass in a location. To match the Transforms API, maybe it should use an "at" property instead. I would be happy to create a PR with this after modifying it to match the rest of the Transforms API.

export function word(
  editor: CustomEditor,
  location: Range,
  options: {
    terminator?: string[]
    include?: boolean
    directions?: 'both' | 'left' | 'right'
  } = {},
): Range | undefined {
  const { terminator = [' '], include = false, directions = 'both' } = options

  const { selection } = editor
  if (!selection) return

  // Get start and end, modify it as we move along.
  let [start, end] = Range.edges(location)

  let point: Point = start

  function move(direction: 'right' | 'left'): boolean {
    const next =
      direction === 'right'
        ? Editor.after(editor, point, {
            unit: 'character',
          })
        : Editor.before(editor, point, { unit: 'character' })

    const wordNext =
      next &&
      Editor.string(
        editor,
        direction === 'right' ? { anchor: point, focus: next } : { anchor: next, focus: point },
      )

    const last = wordNext && wordNext[direction === 'right' ? 0 : wordNext.length - 1]
    if (next && last && !terminator.includes(last)) {
      point = next

      if (point.offset === 0) {
        // Means we've wrapped to beginning of another block
        return false
      }
    } else {
      return false
    }

    return true
  }

  // Move point and update start & end ranges

  // Move forwards
  if (directions !== 'left') {
    point = end
    while (move('right'));
    end = point
  }

  // Move backwards
  if (directions !== 'right') {
    point = start
    while (move('left'));
    start = point
  }

  if (include) {
    return {
      anchor: Editor.before(editor, start, { unit: 'offset' }) ?? start,
      focus: Editor.after(editor, end, { unit: 'offset' }) ?? end,
    }
  }

  return { anchor: start, focus: end }
}

The include option decides whether to include the terminator in the returned range. The directions option lets you specify which directions to step in.

I have two use cases for this: Emojis and Mentions. You can see how to use it here:

Mentions:

        const range =
          beforeRange &&
          word(editor, beforeRange, {
            terminator: [' ', '@'],
            directions: 'left',
            include: true,
          })

Emojis:

        const beforeWordRange =
          beforeRange &&
          word(editor, beforeRange, { terminator: [' ', ':'], include: true, directions: 'left' })
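
To see what these options do without an editor, here is a plain-string sketch of the same stepping idea. It is a simplification, not the function above: scanWord is a hypothetical name, offsets into a string stand in for Slate points, and there are no block boundaries to wrap across.

```typescript
type Directions = "both" | "left" | "right";

interface ScanOptions {
  terminator?: string[];
  include?: boolean;
  directions?: Directions;
}

// Plain-string analogue of the stepper: `start` and `end` are offsets into
// `text` instead of Slate points.
function scanWord(
  text: string,
  start: number,
  end: number,
  options: ScanOptions = {}
): { start: number; end: number } {
  const { terminator = [" "], include = false, directions = "both" } = options;

  // Step right one character at a time until the next character is a terminator.
  if (directions !== "left") {
    while (end < text.length && !terminator.includes(text[end])) end++;
  }
  // Step left one character at a time until the previous character is a terminator.
  if (directions !== "right") {
    while (start > 0 && !terminator.includes(text[start - 1])) start--;
  }
  if (include) {
    // Widen by one offset on each side (clamped to the string) so that the
    // terminator is part of the result. As in the function above, this
    // widens both sides regardless of `directions`.
    start = Math.max(0, start - 1);
    end = Math.min(text.length, end + 1);
  }
  return { start, end };
}

// Mention: cursor at the end of "hi @ali".
const mention = scanWord("hi @ali", 7, 7, {
  terminator: [" ", "@"],
  directions: "left",
  include: true,
});
// "hi @ali".slice(mention.start, mention.end) is "@ali"
```

The emoji case works the same way with ':' in place of '@' as the second terminator.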

tomliangg commented 2 years ago

I used Slate for a small project last week and enjoyed it quite a bit at the beginning. But it bugged me that the "word" selection considers only English letters. I wrote a util function to get around this shortcoming. In my case, a word includes EN letters, numbers, and dashes (e.g. "hello-world-123"). Sharing my util function in case it helps others. I also have a sandbox to demonstrate the usage: https://codesandbox.io/s/slate-customize-word-f6vkbh

The idea is to first define a regular expression ("regexp") for the characters that make up a word. Then use Slate's Range.end(editor.selection) to get the current cursor position. Starting from the cursor, keep going left until the character doesn't match the regexp; this gives the left portion of the word. Then, starting from the cursor again, keep going right until the character doesn't match the regexp; this gives the right portion of the word.

For example, take "sunny da|y" (I use a pipe sign | to denote the cursor; in this case, the cursor is between a and y). The left portion of the word is "da" and the right portion of the word is "y", so the whole word is "day".
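
The left/right scan can be tried on a plain string first. The following is only a sketch under that simplification: wordAt is a hypothetical name, the cursor is a string offset rather than a Slate point, and unlike the util further down it always scans to the right, even when nothing matched on the left.

```typescript
// Word characters: EN letters, digits, and dash. Must not carry the `g`
// flag, since RegExp.test is stateful on global regexps.
const WORD_CHAR = /[0-9a-zA-Z-]/;

function wordAt(
  text: string,
  cursor: number,
  wordChar: RegExp = WORD_CHAR
): { word: string; start: number; end: number } {
  // Go left from the cursor while the character before it is a word character.
  let start = cursor;
  while (start > 0 && wordChar.test(text[start - 1])) start--;

  // Go right from the cursor while the character at it is a word character.
  let end = cursor;
  while (end < text.length && wordChar.test(text[end])) end++;

  return { word: text.slice(start, end), start, end };
}

// "sunny da|y": the cursor sits at offset 8, between "a" and "y".
const sunny = wordAt("sunny day", 8);
// sunny.word is "day", covering offsets 6 to 9
```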

https://user-images.githubusercontent.com/23287044/176298580-168d1f8e-ae7a-45b5-9c4e-1c5094ab6ee0.mov

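import { cloneDeep } from "lodash";
import { BasePoint, Editor, Range } from "slate";
import { ReactEditor } from "slate-react";

// define word characters as all EN letters, numbers, and dash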
// change this regexp if you want other characters to be considered a part of a word
const wordRegexp = /[0-9a-zA-Z-]/;

const getLeftChar = (editor: ReactEditor, point: BasePoint) => {
  const end = Range.end(editor.selection as Range);
  return Editor.string(editor, {
    anchor: {
      path: end.path,
      offset: point.offset - 1
    },
    focus: {
      path: end.path,
      offset: point.offset
    }
  });
};

const getRightChar = (editor: ReactEditor, point: BasePoint) => {
  const end = Range.end(editor.selection as Range);
  return Editor.string(editor, {
    anchor: {
      path: end.path,
      offset: point.offset
    },
    focus: {
      path: end.path,
      offset: point.offset + 1
    }
  });
};

export const getCurrentWord = (editor: ReactEditor) => {
  const { selection } = editor; // selection is Range type

  if (selection) {
    const end = Range.end(selection); // end is a Point
    let currentWord = "";
    const currentPosition = cloneDeep(end);
    let startOffset = end.offset;
    let endOffset = end.offset;

    // go left from cursor until it finds the non-word character
    while (
      currentPosition.offset > 0 &&
      getLeftChar(editor, currentPosition).match(wordRegexp)
    ) {
      currentWord = getLeftChar(editor, currentPosition) + currentWord;
      startOffset = currentPosition.offset - 1;
      currentPosition.offset--;
    }

    // go right from cursor until it finds the non-word character
    currentPosition.offset = end.offset;
    while (
      currentWord.length &&
      getRightChar(editor, currentPosition).match(wordRegexp)
    ) {
      currentWord += getRightChar(editor, currentPosition);
      endOffset = currentPosition.offset + 1;
      currentPosition.offset++;
    }

    const currentRange: Range = {
      anchor: {
        path: end.path,
        offset: startOffset
      },
      focus: {
        path: end.path,
        offset: endOffset
      }
    };

    return {
      currentWord,
      currentRange
    };
  }

  return {};
};

david-laurentino commented 2 years ago

@tomliangg thank you very much, it helped me a lot.

ldevai commented 1 year ago

@aliak00 I just wanted to thank you for this great solution which is not overly complicated. I stitched it together with another solution I found, and got the desired result, now I can properly detect words starting with $ or @.

const before = Editor.before(editor, start, { unit: 'character' })
const before2 = before && Editor.before(editor, start, { unit: 'word' })
const wordBefore = before2 && Editor.string(editor, { anchor: before2, focus: start })

jefrydco commented 1 month ago

Thanks @tomliangg for providing the CodeSandbox link. I modified your version to:

✅ Make getting the nth previous word work (for some reason that function doesn't work properly in your CodeSandbox)
✅ Add getting the next word
✅ Add getting the nth next word

Codesandbox: Slate Get Word, Previous, After, Nth Previous and Nth After Under Cursor
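
For readers who can't open the sandbox, here is a small plain-string sketch of the "nth previous / nth next word" idea. It is not the sandbox code: listWords and nthWordFromCursor are hypothetical names, the cursor is a string offset rather than a Slate point, and the word pattern (EN letters, digits, dash) is an assumption carried over from earlier in the thread.

```typescript
type WordSpan = { word: string; start: number; end: number };

// List every word in `text` together with its start/end offsets.
function listWords(text: string): WordSpan[] {
  // A fresh regexp per call, so the stateful `lastIndex` never leaks between calls.
  const pattern = /[0-9a-zA-Z-]+/g;
  const spans: WordSpan[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    spans.push({ word: m[0], start: m.index, end: m.index + m[0].length });
  }
  return spans;
}

// n = 0 is the word touching the cursor, n = -1 the previous word,
// n = 1 the next word, and so on. Returns undefined when the cursor does
// not touch a word or when there is no such neighbour.
function nthWordFromCursor(
  text: string,
  cursor: number,
  n: number
): WordSpan | undefined {
  const spans = listWords(text);
  let current = -1;
  for (let i = 0; i < spans.length; i++) {
    if (spans[i].start <= cursor && cursor <= spans[i].end) {
      current = i;
      break;
    }
  }
  if (current === -1) return undefined;
  return spans[current + n];
}
```

For "one two three four" with the cursor inside "two", n = -1 gives "one", n = 1 gives "three", and n = 3 gives undefined because there is no third word after "two".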