metaeducation / rebol-issues

6 stars 1 forks source link

UNION looks broken with /skip and type mixing #1963

Open rebolbot opened 11 years ago

rebolbot commented 11 years ago

Submitted by: Sunanda

UNION/skip doesn't perform the union if there is more than one type in the collection of values in the first positions of the fixed records; it just returns a copy of the first argument. If all of the first-position values are the same datatype it does the same thing as the other series-of-fixed-records functions that do comparisons.

See #428 for the discussion of what the proper comparison behavior of these kinds of functions should be. However, in this case we have an additional bug beyond that discussion, where it isn't even doing a union at all in the mixed-type comparison case.

>> union [x x 1 2 1 3] [1 4]
== [x 1 2 3 4]    ; as expected
>> union [1 2 1 3] [1 4]
== [1 2 3 4]      ; as expected
>> union/skip [x x 1 2 1 3] [1 4] 2 
== [x x 1 2 1 3]  ; should be [x x 1 2 1 3 1 4], union not performed
>> union/skip [x x 1 2 1 2] [1 4] 2 
== [x x 1 2 1 2]  ; should be [x x 1 2 1 4], just a copy of the first arg
>> union/skip [1 2 1 3] [2 4] 2
== [1 2 2 4]      ; should be [1 2 1 3 2 4], union performed but bad R2-compatible behavior (see #428)

CC - Data [ Version: alpha 112 Type: Bug Platform: Windows Category: Native Reproduce: Always Fixed-in:none ]

rebolbot commented 11 years ago

Submitted by: Ladislav

Frankly, I would expect to obtain

[x x 1 2 1 3 1 4]
as the result.

That is because for me [1 2] [1 3] and [1 4] are distinct.

rebolbot commented 11 years ago

Submitted by: Sunanda

They would be distinct without /skip.

/skip in all the set operations (DIFFERENCE, INTERSECT, UNION etc) has a specific meaning, somewhat like MAP.

The first value is effectively treated as the key, and the subsequent ones are ignored for the purposes of treating records as distinct.

So the line below results in an empty block in R2 -- the two input blocks are not different within REBOL's operating definitions.

DIFFERENCE/SKIP [1 x x 2 y y] [1 a a 2 b b] 3

Maybe that needs to change with R3 -- but it would be a big change for the operational model

(R3 gets this one wrong too .... Probably the same underlying bug)
rebolbot commented 11 years ago

Submitted by: BrianH

I agree with Ladislav. See #428 for a suggestion of a /compare option (like SORT/compare) to make it only pay attention to certain record positions rather than the whole record.

It's worth noting that SORT/compare might also not work correctly in R3, and SORT itself definitely doesn't (#1152 among others). So, it's probably a better idea to discuss what we should be doing rather than trying to be compatible with SORT in R3 or R2.

rebolbot commented 11 years ago

Submitted by: abolka

I would rather expect [x x 1 2 1 3 1 4] as well, but then UNIQUE/skip also implements the "key" behaviour Sunanda describes. I see both behaviours as useful.

For reference, here's the (relevant) R3 functions coming with a /skip refinement:

- difference
- exclude
- find
- intersect
- maximum-of
- minimum-of
- move
- select
- sort
- union
- unique

It's probably worth looking over them for consistency in /SKIP usage.

The /COMPARE suggestion from #428 seems like a good idea to let us have both behaviours.

rebolbot commented 11 years ago

Submitted by: Sunanda

Although there may be a philosophical / design issue to discuss here, please note that R3 follows the R2 model in most cases, eg:

    difference/skip [ 1 2] [1 3] 2
    == []

The specific bug seems to emerge when the blocks contain unset words:

    difference/skip [x x 1 2] [1 3] 2
    == [x x]

That bug needs fixing even if the R2 reference implementation is later changed in some way.

rebolbot commented 11 years ago

Submitted by: BrianH

Good catch on the unset words thing, that is definitely something that can be fixed quickly.

Keep in mind, Sunanda, that the policy of R3 is to follow the R2 behavior only when there isn't significantly better behavior to follow. Compatibility for compatibility's sake is covered by #666. Our goal is to be better, but significantly better - we don't want to be different for the sake of being different either (see #667).

In the case of the of the /skip option for functions that do series comparison, I can assure you that if R3 follows R2's behavior it is because we haven't had this discussion yet, because we haven't had this discussion yet. So let's discuss it now. If R2's behavior or some variant of it wins (on its own merits) then we'll make sure it works, and that we don't get any more #1152 style errors.

Let's have this ticket be about the unset words issue, and move the general /skip comparison behavior policy discussion to #428.

rebolbot commented 11 years ago

Submitted by: Sunanda

Earlier issue reporting similar thing:

  http://curecode.org/rebol3/ticket.rsp?id=726
rebolbot commented 11 years ago

Submitted by: abolka

"Let's have this ticket be about the unset words issue"

If this ticket ought to be about unset words alone, please someone with the power to do so: adapt/rephrase it.

rebolbot commented 11 years ago

Submitted by: BrianH

Done. And further testing revealed that it was mixed-type comparisons, not unset words.