matklad closed this 1 year ago.
Agreed that it's a bug. This one is caused by Lua's built-in pattern syntax not being Unicode-aware. We use something like
local lastwordpos = string.find(prevnode.s, "%w+$")
to figure out where the last word starts, but this breaks on Cyrillic characters. It's possible to do better here; it will just add some complexity to this part of the code.
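To make the failure concrete, here is a minimal Lua sketch. The helper names last_word_ascii and last_word_bytes are hypothetical; only the "%w+$" pattern comes from the comment above. It assumes the default "C" locale, in which %w matches only ASCII letters and digits. The second helper is one crude workaround, not what djot.lua actually does: it treats every byte of 0x80 or above as a word character. That keeps multi-byte UTF-8 sequences together, but it also counts non-ASCII punctuation and spaces as part of a word.

```lua
-- Hypothetical helpers; only the "%w+$" pattern is taken from the thread.
-- Assumes the default "C" locale, where %w matches ASCII letters and digits only.

-- Byte range of the trailing run of ASCII alphanumerics (the current approach).
local function last_word_ascii(s)
  return string.find(s, "%w+$")
end

-- Crude workaround (an assumption, not djot.lua's code): also accept any byte
-- >= 0x80, so multi-byte UTF-8 characters such as Cyrillic letters stay in
-- the word. This also swallows non-ASCII punctuation and spaces (e.g. U+00A0).
local function last_word_bytes(s)
  return string.find(s, "[%w\128-\255]+$")
end

print(last_word_ascii(" blue dogs"))    -- 7 10
print(last_word_ascii(" blue собаки"))  -- nil: the Cyrillic word is not found
print(last_word_bytes(" blue собаки"))  -- 7 18 (byte indices; each letter is 2 bytes)
```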
By the way, an easier behavior to implement would be to put attributes on the previous node. This would mean that in
*six* blue dogs{.foo}
the foo class gets attached to " blue dogs" rather than just "dogs". This change would remove the need to detect "words", which is actually quite hard without proper Unicode libraries. However, it is much more intuitive for the reader/writer if {.foo} attaches just to "dogs".
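A minimal Lua sketch of the two behaviours for *six* blue dogs{.foo}. The node shape (tables with tag, s, children and attr fields) and the function names are hypothetical and do not reflect djot.lua's actual AST. The "last word" strategy reuses the ASCII-only "%w+$" pattern, so it has the same Cyrillic problem; falling back to the previous node when no word is found is also an assumption.

```lua
-- Hypothetical inline nodes for "*six* blue dogs{.foo}"; field names are
-- illustrative only, not djot.lua's real AST.
local function inlines()
  return {
    { tag = "emph", children = { { tag = "str", s = "six" } } },
    { tag = "str", s = " blue dogs" },
  }
end

-- Strategy 1: the attributes go on whatever node precedes them.
local function attach_to_previous(nodes, attr)
  nodes[#nodes].attr = attr
  return nodes
end

-- Strategy 2: split the trailing word off a str node into its own span.
-- Uses the ASCII-only %w, so it shares the Cyrillic bug.
local function attach_to_last_word(nodes, attr)
  local prev = nodes[#nodes]
  local start = prev.tag == "str" and string.find(prev.s, "%w+$")
  if not start then
    -- Assumed fallback: no word found, behave like strategy 1.
    return attach_to_previous(nodes, attr)
  end
  local span = {
    tag = "span",
    children = { { tag = "str", s = string.sub(prev.s, start) } },
    attr = attr,
  }
  prev.s = string.sub(prev.s, 1, start - 1)
  if prev.s == "" then
    nodes[#nodes] = span          -- the whole str was a single word
  else
    table.insert(nodes, span)
  end
  return nodes
end

-- Strategy 1: the str node " blue dogs" carries class foo.
local a = attach_to_previous(inlines(), { class = "foo" })
-- Strategy 2: " blue " stays plain; a new span holding "dogs" carries class foo.
local b = attach_to_last_word(inlines(), { class = "foo" })
```

Strategy 1 needs no notion of "word" at all, which is why it is the easier one to implement.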
A simpler idea would be that the attribute applies to the last run of consecutive non-space characters (where "non-space" is defined in the ASCII sense).
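The ASCII non-space rule is easy to express with Lua's own patterns. A minimal sketch with a hypothetical helper name: %S is the complement of %s, which in the default "C" locale matches only ASCII whitespace. Because no byte of a multi-byte UTF-8 character is ASCII whitespace, the Cyrillic word comes out intact and no notion of "word" is needed.

```lua
-- Hypothetical helper: start and end byte of the trailing run of
-- non-whitespace, where whitespace is ASCII only (default "C" locale).
local function last_nonspace_run(s)
  return string.find(s, "%S+$")
end

print(last_nonspace_run(" blue dogs"))    -- 7 10
print(last_nonspace_run(" blue собаки"))  -- 7 18 (byte indices)
print(last_nonspace_run("?"))             -- 1 1: punctuation counts as well
print(last_nonspace_run("dogs "))         -- nil: trailing space means no run
```

Note that any punctuation forms a run on its own, so a lone ? would also receive the attributes.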
Yeah, but then ?{.heh} would be valid, and somewhat surprising, syntax. There's also a pitfall around ASCII whitespace, in that not all whitespace is ASCII: https://doc.rust-lang.org/reference/whitespace.html
In particular, LTR/RTL marks feel like something that might interact with this, but maybe not.
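To see the pitfall, here is a sketch that needs Lua 5.3 or later for the utf8 library. The table lists the Unicode Pattern_White_Space code points, which is the set the Rust reference page above uses as its definition of whitespace; it includes the LTR and RTL marks (U+200E, U+200F). The helper name last_run_start is hypothetical. With U+2028 (LINE SEPARATOR) between two words, the ASCII-only "%S+$" treats the whole string as one run, while the Pattern_White_Space version starts the run at "dogs".

```lua
-- Requires Lua 5.3+ (utf8 library). Helper name is hypothetical.
-- Unicode Pattern_White_Space, as listed in the Rust reference.
local PATTERN_WS = {
  [0x09] = true, [0x0A] = true, [0x0B] = true, [0x0C] = true, [0x0D] = true,
  [0x20] = true, [0x85] = true,
  [0x200E] = true, [0x200F] = true, -- LEFT-TO-RIGHT / RIGHT-TO-LEFT MARK
  [0x2028] = true, [0x2029] = true, -- LINE / PARAGRAPH SEPARATOR
}

-- Byte index where the trailing run of non-Pattern_White_Space code points
-- starts, or nil if the string is empty or ends in whitespace.
-- utf8.codes raises an error on invalid UTF-8.
local function last_run_start(s)
  local start = nil
  for pos, cp in utf8.codes(s) do
    if PATTERN_WS[cp] then
      start = nil
    elseif start == nil then
      start = pos
    end
  end
  return start
end

local s = "blue\u{2028}dogs"
print(string.find(s, "%S+$"))  -- 1 11: the ASCII rule sees a single run
print(last_run_start(s))       -- 8: the run starts at "dogs"
```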
Overall, my gut feeling is that attributes on words are a rather niche use case, and asking the user to type [dogs]{.foo} isn't much of a burden.
What would be wrong with ?{.heh}?
I know that there is non-ASCII whitespace, but we could ignore that for the purposes of this feature.
I'm not sure how this would work with RTL languages.
I guess you're suggesting removing attributes on bare words. But then, what should happen when someone writes foo{.bar}?
What would be wrong with ?{.heh}?
Nothing, really, but it does look like an oddity. I can only object to this on aesthetic grounds :) I guess what makes me uneasy is that {} becomes a greedy syntax that always applies, which makes it harder to detect typos and invalid syntax.
I guess you're suggesting removing attributes on bare words
Yup. As I mentioned in the other issue, I think this might also allow us to get rid of the leading . in attributes.
But then, what should happen when someone writes foo{.bar}?
The same as for a space here -> {.bar}. I'm not sure exactly what that should be. It looks like today the parser basically eats the attribute. I think it's better to interpret it just as text?
I think the attribute should either apply in both cases or in neither.
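One of the options discussed, treating {.bar} after a bare word as plain text so that foo{.bar} and foo {.bar} behave the same, can be sketched as follows. The node shape, the helper name apply_attr, and the set of node types that accept attributes (span, emph, strong) are all assumptions for illustration, not djot's specified behaviour.

```lua
-- Hypothetical node types that may carry attributes (an assumption).
local ATTRIBUTABLE = { span = true, emph = true, strong = true }

-- Apply parsed attributes `attr` (whose source text is `raw`) to an inline list.
-- After a bare str node, with or without a space, the raw text stays literal.
local function apply_attr(nodes, attr, raw)
  local prev = nodes[#nodes]
  if prev and ATTRIBUTABLE[prev.tag] then
    prev.attr = attr
  elseif prev and prev.tag == "str" then
    prev.s = prev.s .. raw           -- "foo" .. "{.bar}" gives "foo{.bar}"
  else
    table.insert(nodes, { tag = "str", s = raw })
  end
  return nodes
end

-- Both "foo{.bar}" and "foo {.bar}" stay text; "[dogs]{.foo}" gets the class.
local a = apply_attr({ { tag = "str", s = "foo" } }, { class = "bar" }, "{.bar}")
local b = apply_attr({ { tag = "str", s = "foo " } }, { class = "bar" }, "{.bar}")
local c = apply_attr({ { tag = "span", children = { { tag = "str", s = "dogs" } } } },
                     { class = "foo" }, "{.foo}")
```

Because the decision depends only on the preceding node, the presence or absence of a space cannot change the outcome.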