red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License
11 stars 4 forks source link

WISH: CMP_SORT exposed to Red level code #147

Open hiiamboris opened 1 year ago

hiiamboris commented 1 year ago

Currently it's possible to use it but requires a veeery ugly hack:

    ordered?: function [
        "Check if value1 comes before value2 in sort order"
        value1 [any-type!] value2 [any-type!]
    ][
        :value1 =? first sort reduce/into [:value1 :value2] clear []
    ]

I could also imagine operators A before? B and A after? B, but probably not enough justification for them.

I know of two use cases:

  1. Table sorting:
    • you make an array of indices 1 to N, then sort it as a proxy column: sort/compare indices func [i1 i2] [ordered? column/:i1 column/:i2
    • then you can reorder every column with your indices array
  2. minimum-of and maximum-of functions that find extrema of data in a block. These two should be based on more general accumulate design that should accept a binary comparator function/native. Unlike first sort copy block solution, this is O(n), not O(nlogn). Since the block can contain anything at all, it's valuable to:
    • avoid error on items incomparable directly with < and <=
    • be able to return a value that actually exists in the block (unlike min/max that can produce new values: 0.0.0 = min 0 1.2.3)

Q: Are there more use cases? Q: What would be an ideal interface, considering possible /reverse sorting need?

greggirwin commented 1 year ago

Part of related chat was whether min/max should return new, mixed values, e.g.

>> min 5x10 11x6
== 5x6

which happens for pairs and tuples, or if they should work like other compound values, like blocks.

>> min [5 10] [11 6]
== [5 10]

Today you have to use the sort trick to get around that for pairs and tuples.

hiiamboris commented 1 year ago

I actually mentioned that ;)

greggirwin commented 1 year ago

I can think of a couple related areas. 1) always-sorted lists 2) aggregates.

1) Relates to @hiiamboris' note on table sorting and DB indexes. Block based (single, key-val, multi), with the difference being that you always use a wrapper doing binary search to find insert point (append + sort for bulk updates). Amortizes insertion cost. The downside to this, with no linked list structure in Red, is that inserts are brutally slow on large blocks toward the head Scaling depends on usage patterns. Have to think about how reverse should work.

2) These are useful to have, but also something you need to plan for, rather than using min/max/sum/avg/... ad hoc, and update values in an aggregator rather than plain values. The benefit being efficiency both on collection and query. Still suffers errors on incompatible type comparison, which is a design call as errors may be useful there, or a refinement might work. Old experiments in my HOF repo. Some form of aggregator is critical for instrumentation.

I actually mentioned that ;)

I was just making it more explicit. :^)