Open aliak00 opened 3 years ago
For integration of Slate into my product, I also had to write a ridiculously complicated function to get the current word, and I expect many other people have done so as well. I'll share mine too, in case anybody finds it helpful when they tackle this issue. The isEqual used below is from lodash.
import { Editor, Node, Range, Text } from "slate";
import { isEqual } from "lodash";

// Expand collapsed selection to range containing exactly the
// current word, even if selection potentially spans multiple
// text nodes. If cursor is not *inside* a word (being on edge
// is not inside) then returns undefined. Otherwise, returns
// the Range containing the current word.
function currentWord(editor: Editor): Range | undefined {
  const { selection } = editor;
  if (selection == null || !Range.isCollapsed(selection)) {
    return; // nothing to do -- no current word.
  }
  const { focus } = selection;
  const [node, path] = Editor.node(editor, focus);
  if (!Text.isText(node)) {
    // focus must be in a text node.
    return;
  }
  const { offset } = focus;
  const siblings: any[] = Node.parent(editor, path).children as any;

  // We move to the left from the cursor until leaving the current
  // word and to the right as well in order to find the
  // start and end of the current word.
  let start = { i: path[path.length - 1], offset };
  let end = { i: path[path.length - 1], offset };
  if (offset == siblings[start.i]?.text?.length) {
    // special case when starting at the right hand edge of text node.
    moveRight(start);
    moveRight(end);
  }
  const start0 = { ...start };
  const end0 = { ...end };

  function len(node: any): number {
    // being careful that there could be some non-text nodes in there, which
    // we just treat as length 0.
    return node?.text?.length ?? 0;
  }

  function charAt(pos: { i: number; offset: number }): string {
    const c = siblings[pos.i]?.text?.[pos.offset] ?? "";
    return c;
  }

  // Always succeeds. If it steps before the first sibling (i becomes -1),
  // charAt returns "" for that position, which ends the scan loop below.
  function moveLeft(pos: { i: number; offset: number }): boolean {
    if (pos.offset == 0) {
      pos.i -= 1;
      pos.offset = Math.max(0, len(siblings[pos.i]) - 1);
      return true;
    } else {
      pos.offset -= 1;
      return true;
    }
  }

  function moveRight(pos: { i: number; offset: number }): boolean {
    if (pos.offset + 1 < len(siblings[pos.i])) {
      pos.offset += 1;
      return true;
    } else {
      if (pos.i + 1 < siblings.length) {
        pos.offset = 0;
        pos.i += 1;
        return true;
      } else {
        if (pos.offset < len(siblings[pos.i])) {
          pos.offset += 1; // end of the last block.
          return true;
        }
      }
    }
    return false;
  }

  while (charAt(start).match(/\w/) && moveLeft(start)) {}
  // move right 1.
  moveRight(start);
  while (charAt(end).match(/\w/) && moveRight(end)) {}
  if (isEqual(start, start0) || isEqual(end, end0)) {
    // if at least one endpoint doesn't change, cursor was not inside a word,
    // so we do not select.
    return;
  }
  const path0 = path.slice(0, path.length - 1);
  return {
    anchor: { path: path0.concat([start.i]), offset: start.offset },
    focus: { path: path0.concat([end.i]), offset: end.offset },
  };
}
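For anyone who wants to see the core idea without the Slate plumbing, here is a minimal sketch on plain data. It is not the function above: it assumes the sibling text nodes are given as a plain array of strings, and it swaps the step-by-step movers for a simpler approach (concatenate the leaves, scan in the combined string, then map the offsets back). The names `LeafPos`, `wordAcrossLeaves` and `toLeafPos` are made up for this illustration.

```typescript
// Hypothetical position type: index of the text leaf plus offset inside it.
interface LeafPos {
  i: number;
  offset: number;
}

// Map an offset in the concatenated text back to a (leaf, offset) pair.
// A boundary offset is ambiguous, so `preferEnd` picks the end of the
// earlier leaf instead of the start of the later one.
function toLeafPos(leaves: string[], flat: number, preferEnd: boolean): LeafPos {
  let remaining = flat;
  for (let i = 0; i < leaves.length; i++) {
    const len = leaves[i].length;
    const isLast = i === leaves.length - 1;
    if (remaining < len || (remaining === len && (preferEnd || isLast))) {
      return { i, offset: remaining };
    }
    remaining -= len;
  }
  return { i: Math.max(0, leaves.length - 1), offset: 0 };
}

// Returns the word around the cursor, or undefined when the cursor is
// not strictly inside a word (being on the edge does not count).
function wordAcrossLeaves(
  leaves: string[],
  cursor: LeafPos,
): { start: LeafPos; end: LeafPos } | undefined {
  // Convert the cursor to an offset in the concatenated text.
  let flat = cursor.offset;
  for (let k = 0; k < cursor.i; k++) {
    flat += leaves[k].length;
  }
  const text = leaves.join("");

  let s = flat;
  while (s > 0 && /\w/.test(text.charAt(s - 1))) s--;
  let e = flat;
  while (e < text.length && /\w/.test(text.charAt(e))) e++;

  if (s === flat || e === flat) return undefined;
  return { start: toLeafPos(leaves, s, false), end: toLeafPos(leaves, e, true) };
}
```

With leaves `["A wo", "rd or two."]` and the cursor at `{ i: 1, offset: 1 }` (between "r" and "d"), this yields a start in the first leaf and an end in the second, which is the multi-node case the function above is built to handle.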
Any update on this?
I would really need to be able to choose which characters to include in a "word".
In my case, I need to include underscores in the "word" in order to match emoji colon codes (e.g. :raised_hands:).
Can we add options to include specific characters, like OP suggested?
{ unit: 'word', include: '-._', terminateOn: ' ' }
First, I want to thank the maintainers of this library for providing the community with such a great piece of software. I've been working with Slate for some time now, and it is really good, covering 99% of my use-cases. Thank you for all your time and efforts! :heart:
Having become used to such a good experience, I'm surprised when I discover the remaining 1%. It seems strange to me that Transforms.select doesn't have an alternative signature that takes a unit, like @AlexanderArvidsson suggests above. The suggestions above, while solving the problem, are surprisingly complex for such a common use-case.
@williamstein Thank you for posting your solution here.
I replaced the lodash isEqual line with the following:
if ((start.i === start0.i && start.offset === start0.offset) ||
(end.i === end0.i && end.offset === end0.offset)) {
And also wrote some simple tests for this, using slate-test-utils:
/** @jsx jsx */
import { assertOutput, buildTestHarness, testRunner } from "slate-test-utils";
import { Transforms } from "slate";
// noinspection ES6UnusedImports
import { jsx } from "./utils/testUtils";
import { currentWordRange } from "./utils";
import { Editor } from "./components/Editor";

const testCases = () => {
  describe(currentWordRange.name, () => {
    it("Returns range of word at cursor", async () => {
      const input = (
        <editor>
          <hp>A word or t<cursor />wo.</hp>
        </editor>
      );
      const [editor] = await buildTestHarness(Editor)({ editor: input });
      Transforms.select(editor, currentWordRange(editor));
      assertOutput(
        editor,
        <editor>
          <hp>A word or <anchor />two<focus />.</hp>
        </editor>
      );
    });

    it("Returns undefined if cursor not at a word", async () => {
      const input = (
        <editor>
          <hp>Lorem ipsum <cursor /> dolar sit amet</hp>
        </editor>
      );
      const [editor] = await buildTestHarness(Editor)({ editor: input });
      const range = currentWordRange(editor);
      expect(range).toBeUndefined();
      Transforms.select(editor, range);
      assertOutput(editor, input);
    });
  });
};

testRunner(testCases);
> I'm surprised when I discover the remaining 1%
We're happy to consider PRs to fix the 1%.
I ended up writing my own stepper which goes character by character and includes options as to which characters to include.
If anyone is interested, here it is. You may have to adjust typings. Credits to @williamstein for parts of it, but it works a little differently to suit my needs (character steps instead of word steps). It also takes a location as an argument rather than always using the current selection. To match the Transforms API, this could perhaps become an "at" property in the options instead. I would be happy to create a PR with this after modifying it to match the rest of the Transforms API.
// CustomEditor is your app's own editor type; adjust typings as needed.
import { Editor, Point, Range } from 'slate'

export function word(
  editor: CustomEditor,
  location: Range,
  options: {
    terminator?: string[]
    include?: boolean
    directions?: 'both' | 'left' | 'right'
  } = {},
): Range | undefined {
  const { terminator = [' '], include = false, directions = 'both' } = options

  const { selection } = editor
  if (!selection) return

  // Get start and end, modify it as we move along.
  let [start, end] = Range.edges(location)

  let point: Point = start

  function move(direction: 'right' | 'left'): boolean {
    const next =
      direction === 'right'
        ? Editor.after(editor, point, {
            unit: 'character',
          })
        : Editor.before(editor, point, { unit: 'character' })

    const wordNext =
      next &&
      Editor.string(
        editor,
        direction === 'right' ? { anchor: point, focus: next } : { anchor: next, focus: point },
      )

    const last = wordNext && wordNext[direction === 'right' ? 0 : wordNext.length - 1]
    if (next && last && !terminator.includes(last)) {
      point = next

      if (point.offset === 0) {
        // Means we've wrapped to beginning of another block
        return false
      }
    } else {
      return false
    }

    return true
  }

  // Move point and update start & end ranges

  // Move forwards
  if (directions !== 'left') {
    point = end
    while (move('right'));
    end = point
  }

  // Move backwards
  if (directions !== 'right') {
    point = start
    while (move('left'));
    start = point
  }

  if (include) {
    return {
      anchor: Editor.before(editor, start, { unit: 'offset' }) ?? start,
      focus: Editor.after(editor, end, { unit: 'offset' }) ?? end,
    }
  }

  return { anchor: start, focus: end }
}
The include option decides whether the terminator is included in the returned range. The directions option lets you specify which direction(s) to step in.
I have two use cases for this: Emojis and Mentions. You can see how to use it here:
Mentions:
const range =
  beforeRange &&
  word(editor, beforeRange, {
    terminator: [' ', '@'],
    directions: 'left',
    include: true,
  })
Emojis:
const beforeWordRange =
  beforeRange &&
  word(editor, beforeRange, { terminator: [' ', ':'], include: true, directions: 'left' })
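To make the terminator/include behaviour easier to picture, here is a rough plain-string analogue of the leftward scan. It is only a sketch: `scanLeft` is a hypothetical name, it works on a string and a numeric cursor instead of a Slate editor and Range, and for simplicity it only widens the range on the left side when `include` is set.

```typescript
// Hypothetical plain-string analogue of a leftward, terminator-based scan.
function scanLeft(
  text: string,
  cursor: number,
  terminator: string[] = [" "],
  include: boolean = false,
): { start: number; end: number; text: string } {
  let start = cursor;
  // Step left one character at a time until the previous character
  // is a terminator or we reach the beginning of the string.
  while (start > 0 && terminator.indexOf(text.charAt(start - 1)) === -1) {
    start--;
  }
  // Optionally widen the range by one character so that it covers
  // the terminator itself (e.g. the "@" of a mention).
  if (include && start > 0) {
    start--;
  }
  return { start, end: cursor, text: text.slice(start, cursor) };
}

// Mention-style lookup: cursor at the end of "hi @bob".
const mention = scanLeft("hi @bob", 7, [" ", "@"], true);
// Emoji-style lookup: cursor at the end of "hello :smi".
const emoji = scanLeft("hello :smi", 10, [" ", ":"], true);
```

Here `mention.text` is `"@bob"` and `emoji.text` is `":smi"`. Note that with `include` set, a scan that stops on a space also picks up that space, so the caller probably still wants to check the first character of the result.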
I used slate for a small project last week and enjoyed it quite a bit at the beginning. But it bugged me that the "word" unit only considers English letters. I wrote a util function to get around this shortcoming. In my case, a word includes English letters, numbers, and dashes (e.g. "hello-world-123"). Sharing my util function in case it can help others. I also have a sandbox to demonstrate the usage: https://codesandbox.io/s/slate-customize-word-f6vkbh
The idea is to first define a regular expression (a.k.a. "regexp") for the characters of a word, then use slate's Range.end(editor.selection) to get the current cursor position. Starting from the cursor, keep going left until the character doesn't match the regexp; this gives the left portion of the word. Then, starting from the cursor again, keep going right until the character doesn't match the regexp; this gives the right portion of the word.
For example, take "sunny da|y" (I use a pipe sign | to denote the cursor; in this case, the cursor is between a and y). The left portion of the word is "da" and the right portion of the word is "y", so the whole word is "day".
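The left/right scan can be sketched on a plain string, independent of Slate. This is a simplified illustration, not the util function itself: `wordAround` and `WordSpan` are hypothetical names, and the cursor is just a numeric offset into the string.

```typescript
// Hypothetical result type for the sketch.
interface WordSpan {
  word: string;
  start: number; // offset of the first character of the word
  end: number; // offset just past the last character of the word
}

// Scan left and right from `cursor` while characters match `wordChar`.
// `wordChar` must not use the `g` flag, since `test` would then keep state
// between calls.
function wordAround(
  text: string,
  cursor: number,
  wordChar: RegExp = /[0-9a-zA-Z-]/,
): WordSpan {
  let start = cursor;
  // Left portion: walk left while the character before `start` matches.
  while (start > 0 && wordChar.test(text.charAt(start - 1))) {
    start--;
  }
  let end = cursor;
  // Right portion: walk right while the character at `end` matches.
  while (end < text.length && wordChar.test(text.charAt(end))) {
    end++;
  }
  return { word: text.slice(start, end), start, end };
}

// "sunny da|y": the cursor sits at offset 8, between "a" and "y".
const span = wordAround("sunny day", 8);
```

For the example string, `span.word` is `"day"` with `start` 6 and `end` 9. When the cursor is not touching any word character, the result is an empty word with `start === end`.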
import { BasePoint, Editor, Range } from "slate";
import { ReactEditor } from "slate-react";
import cloneDeep from "lodash/cloneDeep";

// define word character as all EN letters, numbers, and dash
// change this regexp if you want other characters to be considered a part of a word
const wordRegexp = /[0-9a-zA-Z-]/;

// Note: both helpers read from the text node at the end of the selection,
// so the word is only searched for within that single text node.
const getLeftChar = (editor: ReactEditor, point: BasePoint) => {
  const end = Range.end(editor.selection as Range);
  return Editor.string(editor, {
    anchor: {
      path: end.path,
      offset: point.offset - 1
    },
    focus: {
      path: end.path,
      offset: point.offset
    }
  });
};

const getRightChar = (editor: ReactEditor, point: BasePoint) => {
  const end = Range.end(editor.selection as Range);
  return Editor.string(editor, {
    anchor: {
      path: end.path,
      offset: point.offset
    },
    focus: {
      path: end.path,
      offset: point.offset + 1
    }
  });
};

export const getCurrentWord = (editor: ReactEditor) => {
  const { selection } = editor; // selection is Range type

  if (selection) {
    const end = Range.end(selection); // end is a Point
    let currentWord = "";
    const currentPosition = cloneDeep(end);
    let startOffset = end.offset;
    let endOffset = end.offset;

    // go left from cursor until it finds the non-word character
    // (stop at offset 0 so we never ask for a character at offset -1)
    while (
      currentPosition.offset > 0 &&
      getLeftChar(editor, currentPosition).match(wordRegexp)
    ) {
      currentWord = getLeftChar(editor, currentPosition) + currentWord;
      startOffset = currentPosition.offset - 1;
      currentPosition.offset--;
    }

    // go right from cursor until it finds the non-word character
    currentPosition.offset = end.offset;
    while (
      currentWord.length &&
      getRightChar(editor, currentPosition).match(wordRegexp)
    ) {
      currentWord += getRightChar(editor, currentPosition);
      endOffset = currentPosition.offset + 1;
      currentPosition.offset++;
    }

    const currentRange: Range = {
      anchor: {
        path: end.path,
        offset: startOffset
      },
      focus: {
        path: end.path,
        offset: endOffset
      }
    };

    return {
      currentWord,
      currentRange
    };
  }

  return {};
};
@tomliangg thank you very much, it helped me a lot.
@aliak00 I just wanted to thank you for this great solution which is not overly complicated. I stitched it together with another solution I found, and got the desired result, now I can properly detect words starting with $ or @.
const before = Editor.before(editor, start, { unit: 'character' })
const before2 = before && Editor.before(editor, start, { unit: 'word' })
const wordBefore = before2 && Editor.string(editor, { anchor: before2, focus: start })
Thanks @tomliangg for providing the Codesandbox link. I modified your version to:
✅ Make getting the nth previous word work (somehow, in your Codesandbox link, the function to get the nth previous word doesn't work properly)
✅ Add getting the next word
✅ Add getting the nth next word
Codesandbox: Slate Get Word, Previous, After, Nth Previous and Nth After Under Cursor
Problem
I want to be able to get the word under the cursor (collapsed) and the range of that word within a block element. The problem is that slate's Editor.blah functions don't seem sufficient to do it without some crazy logic.
For my use-case a "word" includes the dash and dot (- and .) characters.
I'll use '|' as cursor location. If you have 'hello| world' and call Editor.after with the word unit, you'll get the point after world. If you have 'hello world|' and you call Editor.after with the word unit, you'll get the first point in the next block. The same applies to Editor.before.
So to actually get the word under the cursor, this is the logic I have:
And then I have my word and range:
Solution
A solution would be to not include "space" as part of word boundaries. Or some way for me to tell the Editor.before/after APIs to use the word unit but include specific characters and use other characters as terminations, e.g. { unit: 'word', include: '-._', terminateOn: ' ' }
Or to allow { edge: 'end' } in the options so that it doesn't pass the end of the block?
Context
Here's a screenshot of a Slack thread that has more details: