Open yutkat opened 1 year ago
I've just experiencing the same issue. It seems that the regular expression in [2] of the default settings is causing the problem:
-- Matches: a URL in parens: (URL)
{
regex = '\\((\\w+://\\S+)\\)', -- [1]
format = '$1',
highlight = 1,
},
-- ...(snipped)...
-- Then handle URLs not wrapped in brackets
{
regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+', -- [2]
format = '$0',
},
The closing brace is included in the pattern, causing it to take precedence over the [1] pattern. I fixed it in my .wezterm.lua file as follows, and it seems to be working correctly:
-- Then handle URLs not wrapped in brackets
{
regex = '[^(]\\b(\\w+://\\S+[)/a-zA-Z0-9-]+)',
format = '$1',
highlight = 1,
},
I tested it with the following text to verify:
- (http://localhost/foo)
- [http://localhost/foo]
- {http://localhost/foo}
- <http://localhost/foo>
- http://localhost/foo
- http://localhost/foo)
(though I don't know why a closing brace is including the [2] pattern.)
.wezterm.lua for workaround:
local wezterm = require 'wezterm'
local config = {}
config.hyperlink_rules = {
-- Matches: a URL in parens: (URL)
{
regex = '\\((\\w+://\\S+)\\)',
format = '$1',
highlight = 1,
},
-- Matches: a URL in brackets: [URL]
{
regex = '\\[(\\w+://\\S+)\\]',
format = '$1',
highlight = 1,
},
-- Matches: a URL in curly braces: {URL}
{
regex = '\\{(\\w+://\\S+)\\}',
format = '$1',
highlight = 1,
},
-- Matches: a URL in angle brackets: <URL>
{
regex = '<(\\w+://\\S+)>',
format = '$1',
highlight = 1,
},
-- Then handle URLs not wrapped in brackets
{
-- Before
--regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
--format = '$0',
-- After
regex = '[^(]\\b(\\w+://\\S+[)/a-zA-Z0-9-]+)',
format = '$1',
highlight = 1,
},
-- implicit mailto link
{
regex = '\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b',
format = 'mailto:$0',
},
}
return config
@nobeans have you thought about creating a PR to fix default hyperlink_rules
? Or is there a situation where this is not desirable behavior?
I don't understand why a closing parenthesis is included in the original regular expression. Therefore, I'm not confident whether the fix is correct. For instance, an alternative solution like the following could also be considered:
...
-- Then handle URLs not wrapped in brackets
{
-- Before
--regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
--format = '$0',
-- After(2)
regex = '\\b\\w+://\\S+[/a-zA-Z0-9-]+', -- just removed ')'
format = '$0',
},
...
Because of this, I'm hesitating to send the pull request.
This may be just a little bit harder than it seems at first sight, consider the following:
[unix-history-repo](https://github.com/dspinellis/unix-history-repo)
[*Das verrückte Labyrinth*](https://en.wikipedia.org/wiki/Labyrinth_(board_game))
https://en.wikipedia.org/wiki/C_(programming_language)
http://en.wikipedia.org/w/index.php?title=Thresholding_(image_processing)&oldid=132306976
https://en.wikipedia.org/wiki/ANSI_escape_code#CSI_(Control_Sequence_Introducer)_sequences
I see. There are patterns where parentheses are included within a URL. Anyway, both of the suggested fixes, commented as "After" or "After(2)," seem to work well for the URLs.
Oops. The "After(2)" fix doesn't work for this:
https://en.wikipedia.org/wiki/C_(programming_language)
The fix "After" works well.
The default rules seem to need to use non-greedy mode. Please refer to the example in this repository's README: an image with a link:
* [![Sponsor WezTerm](https://img.shields.io/github/sponsors/wez?label=Sponsor%20WezTerm&logo=github&style=for-the-badge)](https://github.com/sponsors/wez)
expect two links:
https://img.shields.io/github/sponsors/wez?label=Sponsor%20WezTerm&logo=github&style=for-the-badge
and
https://github.com/sponsors/wez
actual one link:
https://img.shields.io/github/sponsors/wez?label=Sponsor%20WezTerm&logo=github&style=for-the-badge)](https://github.com/sponsors/wez)
non-greedy mode: +?
, *?
:
config.hyperlink_rules = {
-- Matches: a URL in parens: (URL)
-- Markdown: [text](URL title)
{
regex = '\\((\\w+://\\S+?)(\\s+.+)?\\)',
format = '$1',
highlight = 1,
},
-- Matches: a URL in brackets: [URL]
{
regex = '\\[(\\w+://\\S+?)\\]',
format = '$1',
highlight = 1,
},
-- Matches: a URL in curly braces: {URL}
{
regex = '\\{(\\w+://\\S+?)\\}',
format = '$1',
highlight = 1,
},
-- Matches: a URL in angle brackets: <URL>
{
regex = '<(\\w+://\\S+?)>',
format = '$1',
highlight = 1,
},
-- Then handle URLs not wrapped in brackets
-- regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
{
regex = '(?<![\\(\\{\\[])\\b\\w+://\\S+\\b',
format = '$0',
},
-- implicit mailto link
{
regex = '\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b',
format = 'mailto:$0',
},
}
The above rules are not perfect, and the following test cases are not perfectly recognized.
1. [![Sponsor WezTerm](https://img.shields.io/github/sponsors/wez?label=Sponsor%20WezTerm&logo=github&style=for-the-badge)](https://github.com/sponsors/wez)
2. [![C_(programming_language)](https://img.shields.io/github/sponsors/wez?label=Sponsor%20WezTerm&logo=github&style=for-the-badge)](https://en.wikipedia.org/wiki/C_(programming_language))
3. [C_(programming_language)](https://en.wikipedia.org/wiki/C_(programming_language))
4. (https://en.wikipedia.org/wiki/C_(programming_language))
5. [https://en.wikipedia.org/wiki/C_(programming_language)]
6. {https://en.wikipedia.org/wiki/C_(programming_language)}
7. <https://en.wikipedia.org/wiki/C_(programming_language)>
8. [Duck Duck Go](https://duckduckgo.com "search engine")
9. https://en.wikipedia.org/wiki/C_(programming_language)
item 1
needs non-greedy mode, but item 3
needs greedy mode.
When multiple rules match, WezTerm chooses the longest result (greedy). Under the current behavior, it seems irreconcilable.
[IMO] I think it should match according to the order of the rules, giving users the opportunity to adjust the matching rules. More precise rules should take precedence over more general ones.
@wez
https://en.wikipedia.org/wiki/C_(programming_language)
is also incorrectly recognized as https://en.wikipedia.org/wiki/C_(programming_language
in iTem2
.
Unless the parentheses in the URL are URL-encoded (%hex), it is difficult to accurately recognize the links whether using greedy or non-greedy mode. Given that other terminals also cannot recognize parentheses in links, I believe we should use non-greedy mode, as it produces more accurate links.
This is my final configuration:
config.hyperlink_rules = {
-- Matches: a URL in parens: (URL)
-- Markdown: [text](URL title)
{
regex = '\\((\\w+://\\S+?)(?:\\s+.+)?\\)',
format = '$1',
highlight = 1,
},
-- Matches: a URL in brackets: [URL]
{
regex = '\\[(\\w+://\\S+?)\\]',
format = '$1',
highlight = 1,
},
-- Matches: a URL in curly braces: {URL}
{
regex = '\\{(\\w+://\\S+?)\\}',
format = '$1',
highlight = 1,
},
-- Matches: a URL in angle brackets: <URL>
{
regex = '<(\\w+://\\S+?)>',
format = '$1',
highlight = 1,
},
-- Then handle URLs not wrapped in brackets
-- regex = '\\b\\w+://\\S+[)/a-zA-Z0-9-]+',
{
regex = '(?<![\\(\\{\\[<])\\b\\w+://\\S+',
format = '$0',
},
-- implicit mailto link
{
regex = '\\b\\w+@[\\w-]+(\\.[\\w-]+)+\\b',
format = 'mailto:$0',
},
}
@kenchou your (?<![\\(\\{\\[<])\\b\\w+://\\S+
regex contains a lookbehind, which is not supported by Wezterm's regular expression engine, which is also Rust's regular expression engine.
(?<![\\(\\{\\[<])\\b\\w+://\\S+
regex contains a lookbehind, which is not supported by Wezterm's regular expression engine, which is also Rust's regular expression engine.
See the configuration in my comment, these are the rules I am currently using. WezTerm uses fancy-regex
, which supports lookaround
. I am using the macOS version, and this rule works as I expected. Maybe the behavior is different on the Windows platform?
@hankertrix
WezTerm uses
fancy-regex
, which supportslookaround
.
Ah I see, looks like I didn't read the docs closely enough. I looked at the top of the hyperlink_rules
page and just clicked on the Regex syntax link, which didn't include lookarounds.
In that case, I think using your regex for hyperlinks in https://github.com/wez/wezterm/pull/4212 should work well enough.
What Operating System(s) are you seeing this problem on?
Linux Wayland
Which Wayland compositor or X11 Window manager(s) are you using?
Sway
WezTerm version
20230529-192946-62c40118
Did you try the latest nightly build to see if the issue is better (or worse!) than your current version?
Yes, and I updated the version box above to show the version of the nightly that I tried
Describe the bug
If the link is Markdown format, I cannot jump to the site by hyperlink. Why is a
)
included here?https://github.com/wez/wezterm/blob/c6f4ff3626f66f5bc1c64978faf3dbf1e3b6106c/config/src/config.rs#L1619
Or the following rule should be applied first.
To Reproduce
No response
Configuration
no config
Expected Behavior
This format will correctly link to
https://github.com
.Now it is linked to
https://github.com%29
.Logs
No response
Anything else?
No response