Closed fischerling closed 9 months ago
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g'
instead which seams to work in my quick experiments.
I will use this for a while and report back.
The sed version I use is: sed (GNU sed) 4.9
.
Sorry about that, I should have noticed that -c [:alnum:]
and
[:blank:]
are not even close to the same thing. I can probably revert
that commit for now since vis is primarily a source code editor and it
makes more sense to work in that context first.
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead
That doesn't seem to work. Since the only non-latin scripts I'm familiar with are Chinese & Japanese characters which don't work in either version I found this polish text online:
Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość
zmieniła się pod wpływem dodanego humoru czy przypadkowych słów,
które nawet w najmniejszym stopniu nie przypominają istniejących.
If you try completing some words from there you should see that your method doesn't work.
That doesn't seem to work. Since the only non-latin scripts I'm familiar with are Chinese & Japanese characters which don't work in either version I found this [polish text][] online:
Jest dostępnych wiele różnych wersji Lorem Ipsum, ale większość zmieniła się pod wpływem dodanego humoru czy przypadkowych słów, które nawet w najmniejszym stopniu nie przypominają istniejących.
If you try completing some words from there you should see that your method doesn't work.
What sed version are you using? Completing words in the polish text you provided works just fine on my system.
What sed version are you using?
sbase sed. I tried with GNU sed and it does work as you suggested. I checked and GNU implements their own regex library whereas sbase just uses the standard POSIX regex library.
Randy Palamar @.***> wrote:
Sorry about that, I should have noticed that
-c [:alnum:]
and[:blank:]
are not even close to the same thing. I can probably revert that commit for now since vis is primarily a source code editor and it makes more sense to work in that context first.
My bad! I wasn't thinking about the source code completion case.
I suggest using something like sed -E 's/([^[:alnum:]_])/\n/g' instead
How about something like this tr -s '[:blank:]{}()[]_' '\n'
? I think
that should cover the programming use case as well (though I did test
this only lightly).
Cheers, Silvan
My bad! I wasn't thinking about the source code completion case.
But this does not even work for normal punctuation in English text.
You can not complete the last word in a sentence ended by punctuation (e.g .
or ?
).
I suggest using something like sed -E 's/([^[:alnum:]])/\n/g' instead How about something like this `tr -s '[:blank:]{}()[]' '\n'` ? I think that should cover the programming use case as well (though I did test this only lightly). Cheers, Silvan
We would need a lot of special characters in our regex, for example ',' is definitely missing as well.
Or imagine writing XML or HTML code where we would need <
and >
symbols.
Using a multi-byte encoding negative match of characters allowed in identifiers,
[:alnum:]_
, is probably the only really universally working solution.
Maybe we can detect GNU sed or make the command somehow configurable.
Or in the worst case we could implement a simple portable binary, doing exactly what we need, splitting multi-byte encoded text by [:alnum:]_
. Supposed we find some one willing to actually do it.
Something as simple as the following C program is probably enough.
#include <err.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>
int main() {
wint_t c;
setlocale(LC_ALL, "");
errno = 0;
while((c=fgetwc(stdin)) != WEOF) {
if (!(iswalnum(c) || c == btowc('_')))
c = btowc('\n');
wint_t res = fputwc(c, stdout);
if (res == WEOF)
err(EXIT_FAILURE, "%s", "Failed to write to stdout");
}
if (errno)
err(EXIT_FAILURE, "%s", "Failed to read from stdin");
}
fischerling @.***> wrote:
My bad! I wasn't thinking about the source code completion case.
But this does not even work for normal punctuation in English text. You can not complete the last word in a sentence ended by punctuation (e.g
.
or?
).
Yes, I cannot say that this was well tested. Not enough dog fooding :/
I suggest using something like sed -E 's/([^[:alnum:]])/\n/g' instead How about something like this `tr -s '[:blank:]{}()[]' '\n'` ? I think that should cover the programming use case as well (though I did test this only lightly). Cheers, Silvan
We would need a lot of special characters in our regex, for example ',' is definitely missing as well.
Or imagine writing XML or HTML code where we would need
<
and>
symbols.
I don't think adding a lot of special characters to the pattern is an issue. If we think we can restore the functionality by adding a few more characters (like ',<>%^&.' and maybe some other ones) to that set, I think it would be an easy solution.
This wouldn't solve the issue of not being able to pass multibyte
sequences to GNU tr
which means we couldn't use Unicode code points
(like Chinese/Japanese punctuation, i.e. 。,
) there. Apparently GNU
sed would support those though so we could use that if we decide to go
that route.
Cheers, Silvan
I would like to avoid making vis-complete
dependent on the GNU
versions of tools. I think I prefer tr
with some hand picked character
set that is good enough. Its probably better to avoid the terrible
{w,}ctype(3)
classes altogether as well. Something like
tr -s '\t {}()[],<>%^&.' '\n'
is probably good enough for most uses.
Edit: Sorry, _
shouldn't be in the set.
Randy Palamar @.***> wrote:
I would like to avoid making
vis-complete
dependent on the GNU versions of tools. I think I prefertr
with some hand picked character set that is good enough. Its probably better to avoid the terrible{w,}ctype(3)
classes altogether as well. Something liketr -s '\t {}()[]_,<>%^&.' '\n'
is probably good enough for most uses.
Yes, I agree. It doesn't cover the unicode use case but it should take us quite far for the programming one.
Should I send a patch?
Should I send a patch?
Sure, if you want. I was mostly just waiting to see if this fixed the issue for both of you. Just a small thing since you replied to the email; it should be:
tr -s '\t {}()[],<>%^&.' '\n'
without the _
. This way it completes whole snake case function names
as they are used in vis. Its also inline with the old behaviour.
Should I send a patch?
Sure, if you want. I was mostly just waiting to see if this fixed the issue for both of you. Just a small thing since you replied to the email; it should be:
tr -s '\t {}()[],<>%^&.' '\n'
without the
_
. This way it completes whole snake case function names as they are used in vis. Its also inline with the old behaviour.
This is definitely way better than using [:blank:]
. I used it for the last day[s] and it seams to work so far.
However I could still locally patch it to use GNU sed so feel free to update the regex.
Since applying https://github.com/martanne/vis/pull/946/commits/711447afd4a7857d1ecb173893009924b07d4993 source code word completion is broken.
In source code the words I want to complete are not always seperated by whitespace.
For example in:
ba
is not completeable tobar
becausebar
is not surrounded by whitespace.Furthermore, the completion now suggests useless artifacts, consider:
Now the completions suggests
foo(int
because it is surrounded by whitespace.