One hacky approach might be to check whether `right` (which is a string at this point) contains `c(` or `list(`.
return " if(!identical({}, {}))".format(left, right) + "{quit('no', 1)}"
https://github.com/nuprl/MultiPL-E/blob/main/dataset_builder/humaneval_to_r.py#L108
A completely different approach is to translate literals differently: in Python, numeric literals default to integers (so `1` is an integer and `1.0` is a float), but in R, numeric literals default to floats (so `1` is a double and `1L` is an integer). But this would change all the tests completely.
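To make the literal-translation idea concrete, here is a minimal sketch of what such a translation could look like. The helper name `to_r_literal` is hypothetical and is not taken from the MultiPL-E code; it only illustrates the `1` versus `1L` distinction described above.

```python
def to_r_literal(value) -> str:
    """Translate a Python scalar into R source text, preserving int vs. float.

    Hypothetical helper for illustration only; not the project's translator.
    """
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        # The L suffix makes the R literal an integer instead of a double.
        return "{}L".format(value)
    if isinstance(value, float):
        # repr() keeps the decimal point, so 1.0 stays "1.0".
        return repr(value)
    raise TypeError("unsupported literal: {!r}".format(value))
```

This sketch ignores edge cases such as Python integers that do not fit in R's 32-bit integer type, and non-finite floats.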
Thanks @mhyee ! I will do the hacky approach for now on my local branch!
FYI, for the hacky check condition, one would need to check
if not any([right.startswith(w) for w in ["c(", "list(", "NULL"]]):
since `NULL` comparison requires `identical()`.
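Put together, the hacky check could be folded into the assertion generator roughly like this. It is only a sketch: the function name `gen_assertion_hacky` and the exact shape of the `==` branch are assumptions for illustration, and only the `identical` template and the prefix list come from this thread.

```python
# Prefixes of R expressions that still need identical(): vectors, lists, NULL.
IDENTICAL_PREFIXES = ("c(", "list(", "NULL")


def gen_assertion_hacky(left: str, right: str) -> str:
    """Return one R assertion line for two already-translated R expressions.

    Hypothetical helper: `left` and `right` are strings of R source.
    """
    if right.startswith(IDENTICAL_PREFIXES):
        # Compound values and NULL: keep the strict identical() comparison.
        condition = "!identical({}, {})".format(left, right)
    else:
        # Single values: == ignores the integer/double distinction.
        condition = "!({} == {})".format(left, right)
    return " if(" + condition + ")" + "{quit('no', 1)}"
```

For example, `gen_assertion_hacky("candidate(30)", "465")` produces a line that compares with `==`, while an expected value starting with `c(` still goes through `identical`.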
@mhyee should we change all tests completely? It's not a big deal to do so.
@PootieT any reason not to use `==` everywhere like you suggest? I guess it's more lax about types. But in these benchmarks we know that we shouldn't be doing heterogeneous comparisons.
I'm actually not sure we should change the tests to use integers. I think most people just use doubles by default, e.g. nobody writes `c(1L, 2L, 3L)`. However, the `:` operator produces integer vectors.
We can't use `==` for everything because `if (x == y)` is an error if `x` or `y` has more than one element.
But I'm looking at the docs, and we could use some combination of `==`, `all`, and `isTRUE`. There's also `all.equal`, which does "near equality" (useful for doubles). Let me think about it a bit more.
Yeah, so far the hacky solution works for me locally, and I don't see any additional weird cases where the program looks correct but fails the unit tests. BUT, one case where it can fail is when we have two lists mixing integers and doubles: `identical` would fail on `list(1L, 2)` and `list(1, 2)`, for instance. (With `c(...)` the integer is coerced to a double, so the mismatch shows up with lists.)
One additional reason we can't use `==` for everything is that `NULL == NULL` yields `logical(0)`, I think, instead of a single boolean (and `==` doesn't work between lists).
I think we can probably use `all.equal` instead of `==`, since `all.equal` seems to disregard the type of each element when comparing numerical values and also handles the `NULL` type:
all.equal(list(NULL,1), list(NULL,1)) # returns TRUE
all.equal(list(1L,2), list(1,2)) # returns TRUE
all.equal(c(1L,2), c(1,2)) # returns TRUE
I think we can use `isTRUE(all.equal(x, y))` for all comparisons. That keeps things simpler, instead of trying to detect what the values are and then changing the comparison function.
> isTRUE(all.equal("abc", "def"))
[1] FALSE
> isTRUE(all.equal("abc", "abc"))
[1] TRUE
> isTRUE(all.equal(1:2, c(1,2)))
[1] TRUE
> isTRUE(all.equal(1L, 1.0))
[1] TRUE
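On the Python side, the change could look something like the sketch below. This is an illustration rather than the actual patch: the function name `gen_assertion_all_equal` is hypothetical, and the template simply mirrors the `identical`-based line quoted earlier in the thread.

```python
def gen_assertion_all_equal(left: str, right: str) -> str:
    """Return one R assertion line that uses isTRUE(all.equal(...)).

    Hypothetical helper: `left` and `right` are strings of R source.
    The same template is used for scalars, vectors, lists, and NULL.
    """
    return " if(!isTRUE(all.equal({}, {})))".format(left, right) + "{quit('no', 1)}"
```

The `isTRUE` wrapper matters because `all.equal` returns a description of the differences, not `FALSE`, when the values differ. One trade-off is that `all.equal` compares numbers within a small tolerance, so tests become slightly more lenient than with `identical`.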
I'll push a fix to the PR.
looks good!
This is a very nuanced difference, but it is actually the cause of at least 10% of the issues in the R unit tests (among the HumanEval problems I have seen):
Example: HumanEval_60_sum_to_n:
This is not working because `sum` returns an integer, and we are comparing it with a float in the output. The `identical` comparator requires the types of the variables to be the same. In these cases, the type should not matter.
My suggestion might be to change the `identical` comparator to `==`, but only for these single numeric value comparisons. So in this case:
@mhyee Maybe you have an idea how to fix this in the transpiler code? CC @arjunguha