sawhney17 / logseq-automatic-linker

MIT License

test(chinese): Attempt at adding tests for the chinese regex match #73

Open markscamilleri opened 5 months ago

markscamilleri commented 5 months ago

Problem

I noticed while testing #72 that this piece of code:

if (page.match(/^[\u4e00-\u9fa5]{0,}$/gm)) {
  content = content.replaceAll(
    chineseRegex,
    parseAsTags ? `#${page}` : `[[${page}]]`
  );
  needsUpdate = true;
}

was not actually tested anywhere.
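To make the gap concrete, here is a minimal sketch of the kind of check a test for this branch could make. It is not the plugin's actual code: `linkChinesePage` is a hypothetical wrapper around the snippet above, and because `chineseRegex` is not shown here, the sketch assumes it is a global regex built from the page name.

```typescript
// Hypothetical wrapper around the quoted branch, for illustration only.
function linkChinesePage(
  content: string,
  page: string,
  parseAsTags: boolean
): string {
  // Same page-name check as the quoted snippet.
  if (page.match(/^[\u4e00-\u9fa5]{0,}$/gm)) {
    // Assumption: chineseRegex matches every occurrence of the page name.
    const chineseRegex = new RegExp(page, "g");
    // replace() with a global regex behaves like replaceAll() here.
    return content.replace(
      chineseRegex,
      parseAsTags ? `#${page}` : `[[${page}]]`
    );
  }
  // Page names outside the range are left for the other code paths.
  return content;
}

const linked = linkChinesePage("我喜欢苹果和香蕉", "苹果", false);
const tagged = linkChinesePage("我喜欢苹果和香蕉", "苹果", true);
console.log(linked, tagged);
```

A test along these lines would assert that the page name is wrapped in `[[...]]` (or prefixed with `#` when `parseAsTags` is set) and that content is left untouched when the page name is not made up of characters in the range.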

Solution

This is my attempt at adding a couple of tests for this section of code.

Apologies and Disclaimer

I am not a native Chinese, Japanese, Korean or Vietnamese speaker, so the test strings are a best guess based on the official Unicode table and the Cambridge English <-> Chinese (Simplified) dictionary. If anything is not right, please feel free to edit the PR and/or leave feedback here! The aim is to keep regressions from being introduced in the future.

Question

I also noticed that the Unicode tables go all the way to \u9fff for CJKV characters/ideographs, while the check above stops at \u9fa5. Should we expand the scope of the chineseRegex to match this?
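For reference, a small sketch of what the wider range would change. The names `currentRange` and `fullBlock` are made up for this example, and both patterns use `+` rather than `{0,}` so that an empty string does not count as a match.

```typescript
// Range used by the existing check.
const currentRange = /^[\u4e00-\u9fa5]+$/;
// Range through the end of the CJK Unified Ideographs block (assumed target).
const fullBlock = /^[\u4e00-\u9fff]+$/;

// U+9FA6 is the first code point that only the wider range covers.
const outsideCurrent = "\u9fa6";
// A common word that both ranges cover.
const common = "苹果";

const results = {
  currentMatchesCommon: currentRange.test(common),
  fullMatchesCommon: fullBlock.test(common),
  currentMatchesOutside: currentRange.test(outsideCurrent),
  fullMatchesOutside: fullBlock.test(outsideCurrent),
};
console.log(results);
```

Page names made up only of characters up to \u9fa5 behave the same under both patterns. The difference only shows up for names containing characters from \u9fa6 to \u9fff.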