Open orefalo opened 1 year ago
Were there any specific ones you've missed? I ask because some of these exist already:
string collect [-a | --allow-empty] [-N | --no-trim-newlines] [STRING ...]
That's Murex's default behaviour
string escape [-n | --no-quoted] [--style=] [STRING ...]
string unescape [--style=] [STRING ...]
» builtins -> regexp m/^!?esc/
[
"!escape",
"!eschtml",
"!escurl",
"escape",
"esccli",
"eschtml",
"escurl"
]
The escaping builtins are the ones without the bang prefix, whereas the unescaping builtins are the ones with the bang.
string join [-q | --quiet] [-n | --no-empty] SEP [STRING ...]
string join0 [-q | --quiet] [STRING ...]
This doesn't currently exist but it's a good suggestion
string length [-q | --quiet] [STRING ...]
» %[one two three]
[
"one",
"two",
"three"
]
» %[one two three] -> count
3
(There's also an alias, `len`, for backwards compatibility with much older versions of Murex.)
string lower [-q | --quiet] [STRING ...]
string upper [-q | --quiet] [STRING ...]
These don't currently exist but it's a good suggestion.
string match [-a | --all] [-e | --entire] [-i | --ignore-case] [-g | --groups-only] [-r | --regex] [-n | --index] [-q | --quiet] [-v | --invert] PATTERN [STRING ...]
`match`: https://murex.rocks/docs/commands/match.html
`regexp`: https://murex.rocks/docs/commands/regexp.html (with the `m` or `f` flag)
string pad [-r | --right] [(-c | --char) CHAR] [(-w | --width) INTEGER] [STRING ...]
string repeat [(-n | --count) COUNT] [(-m | --max) MAX] [-N | --no-newline] [-q | --quiet] [STRING ...]
These don't currently exist but they're good suggestions too.
string replace [-a | --all] [-f | --filter] [-i | --ignore-case] [-r | --regex] [-q | --quiet] PATTERN REPLACE [STRING ...]
`regexp`: https://murex.rocks/docs/commands/regexp.html (with the `s` flag)
string shorten [(-c | --char) CHARS] [(-m | --max) INTEGER] [-N | --no-newline] [-l | --left] [-q | --quiet] [STRING ...]
`right` kind of does this. It doesn't add the ellipsis or check for wide characters though. Maybe there is a case for a flag that would check character width instead of the number of characters?
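To make the "character width instead of number of characters" idea concrete, here is a rough sketch in Python rather than Murex. It is not how `right` or fish's `string shorten` is implemented; the function names are made up for illustration, and it assumes that East Asian wide and fullwidth characters occupy two terminal columns while everything else occupies one (combining characters are ignored for simplicity).

```python
import unicodedata


def char_width(ch):
    # Assumption: "W" (wide) and "F" (fullwidth) characters take two columns.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text):
    # Total number of terminal columns the text is assumed to occupy.
    return sum(char_width(ch) for ch in text)


def shorten(text, max_width, ellipsis="…"):
    # Return text unchanged if it already fits within max_width columns.
    if display_width(text) <= max_width:
        return text
    # Otherwise keep as many leading characters as fit next to the ellipsis.
    budget = max_width - display_width(ellipsis)
    kept = []
    used = 0
    for ch in text:
        width = char_width(ch)
        if used + width > budget:
            break
        kept.append(ch)
        used += width
    return "".join(kept) + ellipsis


print(shorten("hello world", 8))
print(shorten("日本語テキスト", 7))
```

Counting characters would keep six of the Japanese characters in the second call, which would overflow a seven-column field; counting columns keeps three plus the ellipsis.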
string split [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty] [-q | --quiet] [-r | --right] SEP [STRING ...]
string split0 [(-f | --fields) FIELDS] [(-m | --max) MAX] [-n | --no-empty] [-q | --quiet] [-r | --right] [STRING ...]
`jsplit` does this. It's not as feature rich as this but it supports regexp. Murex's type system also negates some of the need for manual splitting. Also, `regexp` with the `f` flag could work here too.
string sub [(-s | --start) START] [(-e | --end) END] [(-l | --length) LENGTH] [-q | --quiet] [STRING ...]
`left` and `right` are supposed to solve this. However, if you want something midway through a string then you have to pipe one into the other... which is a tad verbose :(
I had thought about creating another data-type called `bytes`, which would basically be a byte array. That way you could index and range over the bytes with `[]`. But it wasn't something I was certain of implementing, because it might lead some people to think it was a performant way of handling strings (like in C-like languages) whereas it could actually be a lot slower because of the way Murex generally expects higher-level abstractions. It's something to consider still though.
string trim [-l | --left] [-r | --right] [(-c | --chars) CHARS] [-q | --quiet] [STRING ...]
This doesn't exist verbatim but `regexp 's/^\s+/'` and `regexp 's/\s+$/'` would work in the meantime. I had given some thought in the past to when to trim things and when not to, so it's a little weird I never thought to add this myself.
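For anyone unsure what those two substitutions do, here is the same pair of patterns applied with Python's `re` module. This is only an illustration of the regular expressions themselves, not of Murex's `regexp` builtin, and the helper names are made up.

```python
import re


def trim_left(text):
    # Same pattern as 's/^\s+/': drop whitespace at the start of the string.
    return re.sub(r"^\s+", "", text)


def trim_right(text):
    # Same pattern as 's/\s+$/': drop whitespace at the end of the string.
    return re.sub(r"\s+$", "", text)


def trim(text):
    # Applying both is the equivalent of piping one substitution into the other.
    return trim_right(trim_left(text))


print(repr(trim("  \t padded value \n")))
```

Interior whitespace is left alone because both patterns are anchored to one end of the string.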
I also need to give some thought to how to make these builtins more discoverable, and thus also to how newer builtins should be named. At present there are dozens of commands in the root namespace and it's not obvious what is available (unlike Fish, which has a `string` builtin with a lot of functionality grouped inside it).
ah thank you. must have missed it.
Reopening this issue - I am interested in `match` or `regexp`.
$ vm_stat
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free: 110207.
Pages active: 856642.
Pages inactive: 817294.
Pages speculative: 39658.
Pages throttled: 0.
Pages wired down: 201646.
Pages purgeable: 38864.
"Translation faults": 584509074.
Pages copy-on-write: 17797467.
Pages zero filled: 356362396.
Pages reactivated: 3156317.
Pages purged: 1813704.
File-backed pages: 574258.
Anonymous pages: 1139336.
Pages stored in compressor: 165434.
Pages occupied by compressor: 16784.
Decompressions: 697114.
Compressions: 13578033.
Pageins: 16590981.
Pageouts: 63961.
Swapins: 304621.
Swapouts: 5507225.
$ vm_stat | grep -o -E '[0-9]+'
16384
113692
854433
815431
39476
0
202317
42705
584370609
17793166
356274061
3156084
1813659
574031
1135309
165436
16784
697112
13578033
16590820
63961
304621
5507225
Now I am trying the same as above with the `regexp` built-in:
murex-utils » vm_stat | regexp 'm/[0-9]+/'
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free: 108686.
Pages active: 858720.
Pages inactive: 817011.
Pages speculative: 40710.
Pages throttled: 0.
Pages wired down: 200165.
Pages purgeable: 37305.
"Translation faults": 584860711.
Pages copy-on-write: 17815427.
Pages zero filled: 356571019.
Pages reactivated: 3156647.
Pages purged: 1814793.
File-backed pages: 575200.
Anonymous pages: 1141241.
Pages stored in compressor: 165404.
Pages occupied by compressor: 16788.
Decompressions: 697144.
Compressions: 13578033.
Pageins: 16591801.
Pageouts: 63961.
Swapins: 304629.
Swapouts: 5507225.
murex-utils » vm_stat | regexp 'f/[0-9]+/'
murex-utils » vm_stat | regexp 's/[0-9]+/'
Mach Virtual Memory Statistics: (page size of bytes)
Pages free: .
Pages active: .
Pages inactive: .
Pages speculative: .
Pages throttled: .
Pages wired down: .
Pages purgeable: .
"Translation faults": .
Pages copy-on-write: .
Pages zero filled: .
Pages reactivated: .
Pages purged: .
File-backed pages: .
Anonymous pages: .
Pages stored in compressor: .
Pages occupied by compressor: .
Decompressions: .
Compressions: .
Pageins: .
Pageouts: .
Swapins: .
Swapouts: .
murex-utils »
What am I doing wrong? Why would the `m` return the full line and not just the matches?
`m//` returns lines that match.
`f//` returns found strings.
You might need to wrap your regex in parentheses for `f//` to work (I can’t recall if I solved that requirement or not).
So for your case, you would need `f` rather than `m` (`f` is like `grep -o`).
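The three behaviours seen in the thread (keep matching lines, return only the matched text, substitute) can be sketched in Python. This is not Murex's implementation; the function names are invented, the "found strings" helper models `grep -o` (it needs no parentheses), and the last sample line is made up so that the line filter has something to drop.

```python
import re

SAMPLE = "\n".join([
    "Mach Virtual Memory Statistics: (page size of 16384 bytes)",
    "Pages free: 110207.",
    "Pages throttled: 0.",
    "no digits on this line",  # made-up line, not from vm_stat
])


def match_lines(pattern, text):
    # Like m//: keep every whole line that contains at least one match.
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]


def found_strings(pattern, text):
    # Like grep -o: return only the matched text, one entry per match.
    rx = re.compile(pattern)
    return [m.group(0) for line in text.splitlines() for m in rx.finditer(line)]


def substitute(pattern, replacement, text):
    # Like s//: replace every match on every line, keeping all lines.
    rx = re.compile(pattern)
    return [rx.sub(replacement, line) for line in text.splitlines()]


print(match_lines(r"[0-9]+", SAMPLE))
print(found_strings(r"[0-9]+", SAMPLE))
print(substitute(r"[0-9]+", "", SAMPLE))
```

Because every real `vm_stat` line contains a number, the line filter returns the whole output unchanged, which is why `m` looked like it was doing nothing.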
$ vm_stat | regexp 'f/([0-9])+/'
4
7
0
9
1
0
4
2
0
8
5
8
7
5
5
2
2
3
3
8
1
2
5
I tried pretty much all options - will look in the code a little later
got it!
vm_stat -> regexp 'f/([0-9]+)/'
16384
105049
862046
817713
43910
0
196640
33152
594547466
18024595
364657929
3163876
1835712
567255
1156414
164454
16844
698081
13578033
16611886
63961
304736
5507225
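The difference between the two attempts is where the `+` sits. In `([0-9])+` the group captures a single digit and is then repeated, so only the last repetition is kept; in `([0-9]+)` the repetition is inside the group, so the group captures the whole number. Python's `re` module treats repeated groups the same way, which makes for a quick illustration (Python here, not Murex):

```python
import re

line = "Mach Virtual Memory Statistics: (page size of 16384 bytes)"

# The group matches one digit at a time; each repetition overwrites the last,
# so only the final digit of the number survives.
last_digit_only = re.findall(r"([0-9])+", line)

# The repetition is inside the group, so the whole run of digits is captured.
whole_number = re.findall(r"([0-9]+)", line)

print(last_digit_only)  # ['4']
print(whole_number)     # ['16384']
```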
re-opening this because some of these suggestions do deserve proper consideration for inclusion into Murex
Describe the problem: Murex needs built-in string manipulation functions.
Why? Because no Unix system is the same, and having built-ins is not only fast and convenient but it also ensures compatibility.
Documentation: For reference, here are the string primitives from fish: https://fishshell.com/docs/current/cmds/string.html