Closed pveber closed 3 years ago
What about R version 4.0.3, the current one? Anyway, I think Virgile can reasonably take care about this, right?
The change appears with version 4.0.0 of R: see https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/
For the morse
side, it seems to don't have any problem. The data we store have replicate even as numeric
> data("cadmium1") ; class(cadmium1$replicate)
[1] "numeric"
> data("cadmium2") ; class(cadmium2$replicate)
[1] "numeric"
> data("chlordan") ; class(chlordan$replicate)
[1] "numeric"
> data("copper") ; class(copper$replicate)
[1] "numeric"
> data("dichromate") ; class(dichromate$replicate)
[1] "numeric"
> data("zinc") ; class(zinc$replicate)
[1] "numeric"
> data("propiconazole") ; class(propiconazole$replicate)
[1] "factor"
> data("propiconazole_pulse_exposure") ; class(propiconazole_pulse_exposure$replicate)
[1] "factor"
And, it's the same after survData
. Since we have to use survData
, maybe we can ensure that all survData
object have replicate of class character
?
Now:
> class(survData(cadmium1)$replicate)
[1] "numeric"
> class(survData(propiconazole)$replicate)
[1] "factor"
As of now I only got failure in mosaic unit tests
@pveber Tu pourrais me mettre le lien des tests unitaires que tu fais sur mosaic ?
@virgile-baudrot Thanks for having a look at it!
Here's a simple example that shows the problem:
> data(cadmium1)
> cadmium1$replicate <- as.character(cadmium1$replicate)
> class(cadmium1$replicate)
[1] "character"
> class(survData(cadmium1)$replicate)
[1] "character"
In my original post, I mentionned that the same issue can be obtained when reading the data with read.table
, which I do in my unit test.
ok. But what should be the class of "replicate"? Because, any class should work since it's used to isolate/filter a group
Ok. I think I get your point. Since replicate
is only a label, the type can be numeric
or character
... even logical
if two class but it's weird! And we say integer
or factor
.
R is crazy 😄 :
> a = c("1", "2", 3)
> typeof(a)
[1] "character"
> b = c(1, 2, 3)
> typeof(b)
[1] "double"
> a == b
[1] TRUE TRUE TRUE
> class(a) == class(b)
[1] FALSE
> class(a[1]) == class(b[1])
[1] FALSE
> class(c("a", "b"))
[1] "character"
> typeof(c("a", "b"))
[1] "character"
> class(as.factor(c("a", "b")))
[1] "factor"
> typeof(as.factor(c("a", "b")))
[1] "integer"
> typeof(as.factor(c("a", "b")))
[1] "integer"
> as.factor(c("a", "b")) + 1
[1] NA NA
Warning message:
In Ops.factor(as.factor(c("a", "b")), 1) : ‘+’ not meaningful for factors
> c("a", "b") + 1
Error in c("a", "b") + 1 :
argument non numérique pour un opérateur binaire
I would say that column replicate should be filled in by any type of input left to the choice the user himself. That is letters, numbers, of combination of both...
And I thought I had seen it all :)
I think first and foremost, whatever the function does, the doc should describe it correctly. Now should the doc be changed, or the behavior of the function? My point is that since the doc says replicate is an int or a factor, that fact might be used in other places of the code in the library. Making sure that allowing character
values won't hurt somewhere else (including in mosaic) is a lot of (tedious) work. It'd be simpler I think to allow character
values for replicate
in the argument of survData
but to convert them to factors in the value it produces. So that we have freedom for users and reliability for downstream code.
OK. I'm doing that!
should solve the issue
@pveber I let you close the issue if it's ok for you
I confirm the tests are OK now for Mosaic (I'm the one who reported the issue to @pveber).
EDIT: do you plan to make a new release on CRAN ?
EDIT: do you plan to make a new release on CRAN ?
Yes, I do
Very good, thank you all for taking care of this!
@virgile-baudrot @sandrinecharles
I love R, I just do. Compare:
and:
This change revealed a bug in
survData
. While the documentation says:it outputs a replicate column of class
character
when the corresponding column in the input dataframe is of this class. A conversion should be performed here, because I'm not sure how much the rest of the code relies on this field to beinteger
(that's a situation where static typing would have helped!). As of now I only got failure in mosaic unit tests but this is likely to affect the whole application.The bug came out because the
replicate
column is often encoded with string values, andread.table
used to encode those as a factor by default. This is not the case anymore, and this make the type check (and conversion) even more important. Correcting this and making the corresponding release would help a lot with mosaic. The needed change is I think very trivial but since I've not been handling the releases for a long time, I think it's better to let you handle it. Would it be ok?