red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License
10 stars 4 forks source link

On edge cases of `find` #145

Open hiiamboris opened 1 year ago

hiiamboris commented 1 year ago

find is very rich on refinements, but have their correlations ever been considered?

USAGE:
     FIND series value

DESCRIPTION:
     Returns the series where a value is found, or NONE.
     FIND is an action! value.

ARGUMENTS:
     series       [series! bitset! typeset! port! map! none!]
     value        [any-type!] {Typesets and datatypes can be used to search by datatype.}

REFINEMENTS:
     /part        => Limit the length of the search.
        length       [number! series!]
     /only        => Treat series and typeset value arguments as single values.
     /case        => Perform a case-sensitive search.
     /same        => Use "same?" as comparator.
     /skip        => Treat the series as fixed size records.
        size         [integer!]
     /last        => Find the last occurrence of value, from the tail.
     /reverse     => Find the last occurrence of value, from the current index.
     /tail        => Return the tail of the match found, rather than the head.
     /match       => Match at current index only.

1. find/last series looks like an equivalent of find/reverse tail series: is it?

1.1. I suppose /reverse should be just ignored in presence of /last? Or raise an error?

I'm guilty of writing /last/reverse sometimes by mistake, and it's unclear from the spec if it works the same as /last or the same as /reverse.

1.2. What if series is not at head. Should /last stop at this index then? Turns out it does:

>> find/last "abcde" "a"
== "abcde"
>> find/last next "abcde" "a"
== none

So find/last series should be seen as find/reverse/part tail series series. Docstring is quite vague on the exact behavior, so this should probably go into specs.

1.3. But then what if /part is also specified? Where is it counted from, and does it override the implied /part from /last?

>> find/last/part "abcde" "a" 1
== "abcde"
>> find/last/part "abcde" "b" 1
== none

So "from the tail" in /last docstring is tricky: it moves not to the end of the series, but to the end of the /part. Then sets original index as the new /part:

>> find/last/part next "abcde" "b" 3
== "bcde"
>> find/last/part next "abcde" "d" 3
== "de"
>> find/last/part next "abcde" "e" 3
== none
>> find/last/part next "abcde" "a" 3
== none

Cool! But what if part is negative? Whatever it is doing, doesn't look like anything:

>> find/last/part skip "abcde" 4 "a" -3
== none
>> find/last/part skip "abcde" 4 "b" -3
== none
>> find/last/part skip "abcde" 4 "d" -3
== none
>> find/last/part skip "abcde" 4 "e" -3
== none

1.4. So what meaning should negative /part have in presence of /last?

What seems to make sense is to treat the /part symmetrically and result in:

>> find/last/part skip "abcde" 4 "b" -3
== "bcde"
>> find/last/part skip "abcde" 4 "d" -3
== "de"
>> find/last/part skip "abcde" 4 "e" -3
== none
>> find/last/part skip "abcde" 4 "a" -3
== none

Part still delineates "bcd" region, and its sign doesn't affect anything. That's how works in R2 btw.


2. Negative part sort of implies backward direction: its index is at tail already. But /reverse and /last also imply that. How do they interplay?

2.1. Since find is not like copy, but is directional, this could also work as above without /last:

>> find/part skip "abcbe" 4 "b" -3
== "be"

Right now in Red result is none, but R2 iterates forward:

>> find/part skip "abcbe" 4 "b" -3
== "bcbe"

So we can say that in R2 /part is just a slice, unordered, so carries no direction. But we could add direction to it. E.g. find/part tail series x series could work as find/last series x, and find/part series x head series - as find/reverse series x.

2.2. That is /part (with implied current index) could encode both search region and search direction.

2.3. What does it mean to find/reverse/part when part is positive or negative?

If reverse goes from the "current index" to the left, and part is positive so extends to the right, it can never find anything? If part is negative, only then find/reverse/part makes sense? But then find/part doesn't?

Or if part is symmetric as in R2, and negative, then /reverse and /tail should mean the same thing? In R2 /reverse is useless with negative /part:

>> find/part/reverse tail "abcbe" "b" -4
== none
>> find/part/last tail "abcbe" "b" -4
== "be"

But with positive /part it seems to ignore the /part value (how does it find 'a'?):

>> find/part/reverse tail "abcbe" "b" 4
== "be"
>> find/part/reverse tail "abcbe" "a" 4
== "abcbe"

In Red it's all useless:

>> find/part/reverse tail "abcbe" "e" -4
== none
>> find/part/reverse tail "abcbe" "b" -4
== none
>> find/part/reverse tail "abcbe" "b" 4
== none
>> find/part/reverse "abcbe" "b" 4
== none
>> find/part/last "abcbe" "b" -4
== none
>> find/part/last tail "abcbe" "b" -4
== none
>> find/part/last tail "abcbe" "b" 4
== none

Or not all??

>> find/part/reverse tail make hash! [a b c b e] 'b 4
== make hash! [b e]

...


3. /match as I understand it was a forward-only idea in the first place. How should it work with /reverse, /last, /skip, negative /part?

/match/reverse was dismissed twice: https://github.com/red/red/issues/3339 and https://github.com/red/red/issues/3609

According to the comments, only /part makes sense for /match, but what if it's negative? R2 just moves to the left side of /part and matches:

>> find/match/part tail "abcde" "de" -4
== none
>> find/match/part tail "abcde" "bc" -4
== "de"

But it doesn't seem useful.


4. /skip meets /reverse and /last

Is better covered here, main argument being able to search backwards in any column: https://github.com/red/red/issues/5119

Adding /part to the mix in R2 (given its symmetry) was supposed to both move the starting location (not necessarily right before the end, see the issue 5119), and skip a number of items between matches, but:

>> find/skip/part/reverse tail "abcde" "e" 2 -5
== none
>> find/skip/part/reverse tail "abcde" "d" 2 -5
== none
>> find/skip/part/reverse tail "abcde" "c" 2 -5
== none
>> find/skip/part/reverse tail "abcde" "b" 2 -5
== none
>> find/skip/part/reverse tail "abcde" "a" 2 -5
(R2 hangs!)

Given the outlined array of issues, what can be done:

I. Remove /match, move it into /match native which may have other uses eventually (e.g. LCS). II. /part  A. Replace /part with https://github.com/red/REP/issues/97 (whatever semantic is chosen there for negative part). The most scalable solution, but needs more design thought.  B. Treat /part symmetrically as in R2 (legacy compatibility). find/reverse/part .. n should then always return none for nonnegative n, and only work for negative n. To search backwards independent of /part sign one will have to use /last, rather than /reverse.  C. Infer direction from /part sign, making both /reverse and /last possible to express with just /part. More natural in that search always starts at current index with any /part sign. Adding /last to /part though still will start from another location. /reverse should then align with /part sign, else return none. III. /reverse  A. Remove it and use a combination of /part and /last instead. In my experience find/reverse is quite rare.  B. Express it in terms of /part and /last internally, forbid /last/reverse combo. IV. Rewrite /last docstring to indicate that it starts at the right side of /part, and is incompatible with /reverse. V. Generalize /skip using https://github.com/red/REP/issues/94

It's possible to cut find down to [/last /tail /only /case /same] set of refinements.