zufuliu / notepad4

Notepad4 (Notepad2⨯2, Notepad2++) is a light-weight Scintilla based text editor for Windows with syntax highlighting, code folding, auto-completion and API list for many programming languages and documents, bundled with file browser plugin matepath.

Boost regex engine #725

Open zufuliu opened 10 months ago

zufuliu commented 10 months ago

See PR #722: @atauzki 👍 is working on integrating Boost regex. After the changes are merged, most if not all regex issues should be fixed.

At the end our code will have three regex engines:

| defined preprocessors | regex engine |
| --- | --- |
| BOOST_REGEX_STANDALONE | Scintilla's simple POSIX regex plus Boost regex |
| NO_CXX11_REGEX | Scintilla's simple POSIX regex (current build configuration) |
| none | Scintilla's simple POSIX regex plus C++ STL std::regex |

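
A minimal sketch of how such a compile-time selection could look, assuming the two macros are checked in the order listed above. The helper `regex_engine_description` is hypothetical and only illustrates the mapping; it is not taken from the Notepad4 sources, and the precedence when both macros are defined is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not from the Notepad4 sources: reports which
// engine combination the preprocessor definitions select.
inline std::string regex_engine_description() {
#if defined(BOOST_REGEX_STANDALONE)
    // Assumption: this macro wins if both are defined.
    return "Scintilla's simple POSIX regex plus Boost regex";
#elif defined(NO_CXX11_REGEX)
    return "Scintilla's simple POSIX regex";
#else
    return "Scintilla's simple POSIX regex plus C++ STL std::regex";
#endif
}
```

Compiling with `-DBOOST_REGEX_STANDALONE` or `-DNO_CXX11_REGEX` (or `/D` with MSVC) switches the branch that gets built.
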
zufuliu commented 10 months ago

Some TODOs:

atauzki commented 10 months ago

Another suggestion: the hints shown for zero-width matches should be improved, as in Notepad3:

image

zufuliu commented 10 months ago

The following are some performance test results (match count and time in milliseconds) for the attached JSON file (produced by expand.py in the zip from the Visual Studio 2022 installation catalog.json) with commit 38be0ce92e5106cf0c6f49ed1f6864795121d042. As such, I'm going to remove the SCI_OWNREGEX build configuration (still needs time to improve the speed). re-test-1015.zip

| regex | RESearch | std::wregex | std::regex | boost::wregex | boost::regex |
| --- | --- | --- | --- | --- | --- |
| \w+ | 1434315, 315 | 1436523, 7636 | 1423835, 4372 | 1436523, 2035 | 1501396, 800 |
| [a-zA-Z0-9_]+ | 1423835, 331 | 1423835, 7654 | 1423835, 4386 | 1423835, 2855 | 1423835, 777 |
| \d+ | 1028016, 280 | 1028016, 6470 | 1028016, 6475 | 1028016, 2050 | 1028016, 739 |
| [0-9]+ | 1028016, 286 | 1028016, 6475 | 1028016, 6218 | 1028016, 2044 | 1028016, 725 |
| \s+ | 895401, 252 | 895945, 6151 | 895403, 5972 | 895917, 1883 | 911355, 662 |
| [ \t]+ | 895401, 254 | 895401, 6375 | 895401, 6200 | 895401, 2935 | 895401, 678 |
| ^[ \t]+ | 440216, 92 | 440216, 846 | 440216, 724 | 440216, 465 | 440216, 234 |
| [ \t]+$ | 0, 154 | 0, 6492 | 0, 6324 | 0, 575 | 0, 84 |

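
For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a "match count, time in milliseconds" helper using `std::regex` only. `BenchResult` and `count_matches` are hypothetical names, and this is not the test harness from re-test-1015.zip. Pattern compilation is deliberately left outside the timed region, which is an assumption about what the numbers above measure.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical result type: number of matches and elapsed wall time.
struct BenchResult {
    std::size_t matches;
    long long milliseconds;
};

// Counts all non-overlapping matches of `pattern` in `text` and times the scan.
BenchResult count_matches(const std::string &text, const std::string &pattern) {
    const std::regex re(pattern); // compiled before the clock starts
    const auto start = std::chrono::steady_clock::now();
    const auto count = std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator());
    const auto elapsed = std::chrono::steady_clock::now() - start;
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);
    return { static_cast<std::size_t>(count), static_cast<long long>(ms.count()) };
}
```

Calling `count_matches(fileContents, "\\d+")` on the loaded file would give one cell of a table like the one above for the `std::regex` column.
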
lenny20 commented 7 months ago

Does the version released today include Boost regex? The Replace dialog doesn't look any different to me.

zufuliu commented 7 months ago

Does the version released today include Boost regex?

Just download the latest builds from the boost regex branch, e.g. https://github.com/zufuliu/notepad2/actions/runs/7517811166

zufuliu commented 4 months ago

Win32 build with boost::regex (depends on SleepConditionVariableSRW() and WakeAllConditionVariable()) or std::regex (depends on InitializeCriticalSectionEx()) doesn't run on XP.

vvyoko commented 4 months ago

Does boost::regex not support property matching such as \pP? More properties to test: Regular expressions - matching punctuation

Also, matched (pattern) groups are currently referenced with \1, \2. Would you consider using the more common $1, $2 instead in the future?

zufuliu commented 4 months ago

Seems not supported (requires ICU). https://www.boost.org/doc/libs/1_85_0/libs/regex/doc/html/boost_regex/unicode.html https://www.boost.org/doc/libs/1_85_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.character_properties

atauzki commented 4 months ago

Also, matched (pattern) groups are currently referenced with \1, \2. Would you consider using the more common $1, $2 instead in the future?

Boost itself supports this, but the current code doesn't use it for the implementation; only a TODO comment was added.
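
To make the two reference styles concrete, here is a small sketch using `std::regex` rather than Boost (a swapped-in engine, chosen only because it needs no extra library). The default ECMAScript format string uses `$1`, `$2`, while `std::regex_constants::format_sed` accepts `\1`, `\2`. Both function names are hypothetical and nothing here reflects how Notepad4 implements replacement.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Swaps two space-separated words using $N references (default format).
std::string swap_words_dollar(const std::string &text) {
    const std::regex re("(\\w+) (\\w+)");
    return std::regex_replace(text, re, "$2 $1");
}

// Same replacement written with \N references (sed-style format).
std::string swap_words_backslash(const std::string &text) {
    const std::regex re("(\\w+) (\\w+)");
    return std::regex_replace(text, re, "\\2 \\1",
                              std::regex_constants::format_sed);
}
```

Both functions turn `John Smith` into `Smith John`; only the syntax of the replacement string differs.
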

zufuliu commented 3 months ago

Win32 build with boost::regex (depends on SleepConditionVariableSRW() and WakeAllConditionVariable()) or std::regex (depends on InitializeCriticalSectionEx()) doesn't run on XP.

This can be "fixed" by disabling thread-safe local static initialization with /Zc:threadSafeInit-: https://learn.microsoft.com/en-us/cpp/build/reference/zc-threadsafeinit-thread-safe-local-static-initialization?view=msvc-170

The implementation of this feature relies on Windows operating system support functions in Windows Vista and later operating systems.
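
To illustrate what the option changes, here is a minimal sketch of a function-local static. `word_regex` is a hypothetical name, not taken from Notepad4, Boost or the MSVC runtime. By default the compiler guards the first initialization so that it is thread-safe; with `/Zc:threadSafeInit-` MSVC omits that guard, so the first call must not race with another thread.

```cpp
#include <cassert>
#include <regex>

// Hypothetical accessor: the regex is constructed once, on the first call.
// With thread-safe initialization disabled, callers must make sure the
// first call happens before any other thread can reach this function.
const std::regex &word_regex() {
    static const std::regex re("\\w+"); // function-local static
    return re;
}
```

Every later call returns a reference to the same object, so the pattern is compiled only once.
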

atauzki commented 3 months ago

Another bug related to Boost regex search: when searching next/previous repeatedly with a zero-width match (e.g. ^, $, \b), the search gets stuck at its original position from the second attempt onward. EmEditor also has this bug but Notepad3 doesn't. I don't have a good idea for how to fix this.
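
One common way around this, sketched here with `std::regex` instead of Boost and with no claim that Notepad3 or Notepad4 do it this way: remember whether the previous match was empty, and if so start the next search one character further on. `Match`, `find_next` and `all_match_starts` are hypothetical names. The sketch steps by one `char`, so a real editor would need to step by a whole character in its own encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Match {
    std::size_t start;
    std::size_t length;
};

// Finds the next match at or after `pos`. If the previous match was empty,
// the search starts one character later so it cannot report it again.
bool find_next(const std::string &text, const std::regex &re,
               std::size_t pos, bool previous_was_empty, Match &out) {
    if (previous_was_empty) {
        if (pos >= text.size()) {
            return false; // nothing left to step over
        }
        ++pos;
    }
    auto flags = std::regex_constants::match_default;
    if (pos > 0) {
        // Tell the engine that a character exists before the search start.
        flags |= std::regex_constants::match_prev_avail;
    }
    std::smatch m;
    const auto first = text.begin() + static_cast<std::ptrdiff_t>(pos);
    if (!std::regex_search(first, text.end(), m, re, flags)) {
        return false;
    }
    out.start = pos + static_cast<std::size_t>(m.position(0));
    out.length = static_cast<std::size_t>(m.length(0));
    return true;
}

// Repeated "find next" from the start of the text, collecting match offsets.
std::vector<std::size_t> all_match_starts(const std::string &text,
                                          const std::regex &re) {
    std::vector<std::size_t> starts;
    std::size_t pos = 0;
    bool previous_was_empty = false;
    Match m{};
    while (find_next(text, re, pos, previous_was_empty, m)) {
        starts.push_back(m.start);
        pos = m.start + m.length;
        previous_was_empty = (m.length == 0);
    }
    return starts;
}
```

The same bookkeeping applies to "find previous" with the direction reversed. Without the `previous_was_empty` step, a pattern such as `x*` would keep returning offset 0.
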

atauzki commented 3 months ago

Does boost::regex not support property matching such as \pP? More properties to test: Regular expressions - matching punctuation

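A compiled libICU is at least 20-30 MB, which is too high a cost. PCRE2 could be used to support this feature; it depends on whether there is any plan for that. image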

zufuliu commented 3 months ago

A compiled libICU is at least 20-30 MB, which is too high a cost.

Maybe dynamically load ICU (which is available on Win10+), https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

atauzki commented 3 months ago

A compiled libICU is at least 20-30 MB, which is too high a cost.

Maybe dynamically load ICU (which is available on Win10+), https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

It doesn't have the icu namespace in its icu.h, but Boost uses ICU's C++ API, and no C++ symbols are exported from icu.dll.

image

zufuliu commented 2 weeks ago

The boost_regex branch is merged into main (still not set as the default engine due to its slower speed). Here are the new strings (added by 98f9eb26e38fd6ebb0446a4adf22fdccb57a69c4 and 59366c691dc6a1ebd5b1772748517bccd159f0e3) that need to be translated, cc @Matteo-Nigro, @maboroshin, @VenusGirl.

image

zufuliu commented 2 weeks ago

707e258bd3164ab69c9d37bbcc9b83d585368321 made a small change to the strings on the Find/Replace dialog; it should not affect existing translations: image