r-devel / r-dev-day

Repo to organize tasks for R Dev Days
7 stars 2 forks source link

Revise, document and test patch to `xyTable()` for better handling of `NAs` #36

Closed hturner closed 2 months ago

hturner commented 3 months ago

In Bug 18654 I proposed a patch to xyTable() for better handling of NAs. The task here is to respond to the review by @pmur002, revising the patch to help get it ready for committing to R.

The steps to take are

So this will give an introduction to patching code and documentation and writing regression tests for R, without needing to get deep into the code (it only touches R code). Good for a new contributor, perhaps with some R package development experience.

I am happy to work with someone on this, or for others to take over from here.

zkamvar commented 3 months ago

I would be happy to support anyone working on this. I used it as practice on the train and I think I’ve got the hang of it!

I’m also the current maintainer of adegenet, so I’m fairly confident in my ability to test it.

zkamvar commented 3 months ago

Hi @njtierney, i am right now checking all the packages and am going to upload the patch.

zkamvar commented 3 months ago

All the checks pass and I have submitted the patch:

Index: src/library/grDevices/R/calc.R
===================================================================
--- src/library/grDevices/R/calc.R  (revision 87022)
+++ src/library/grDevices/R/calc.R  (working copy)
@@ -131,10 +131,12 @@
        orderxy <- order(x, y)
        x <- x[orderxy]
        y <- y[orderxy]
-       first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
+       first <- which(c(TRUE,
+                        (x[-1L] != x[-n]) | xor(is.na(x[-1L]), is.na(x[-n])) |
+                        (y[-1L] != y[-n]) | xor(is.na(y[-1L]), is.na(y[-n]))))
        x <- x[first]
        y <- y[first]
-       diff(c((1L:n)[first], n + 1L))
+       diff(c(first, n + 1L))
    }
    else integer()

Index: src/library/grDevices/man/xyTable.Rd
===================================================================
--- src/library/grDevices/man/xyTable.Rd    (revision 87022)
+++ src/library/grDevices/man/xyTable.Rd    (working copy)
@@ -30,6 +30,10 @@
   \item{number}{multiplicities (positive integers); i.e.,
     \code{number[i]} is the multiplicity of \code{(x[i], y[i])}.}
 }
+\note{
+  Missing values in the coordinates are counted towards the multiplicities. The
+  sum ofthe multiplicities will equal the number of coordinates.
+}
 \seealso{\code{\link{sunflowerplot}} which typically uses
   \code{xyTable()}; \code{\link{signif}}.
 }
@@ -36,6 +40,12 @@
 \examples{
 xyTable(iris[, 3:4], digits = 6)

+## If missing coordinates exist, they are also counted
+iris2 <- iris[1:10, 3:4]
+iris2[4, 2] <- NA
+iris2[c(3, 5), ] <- NA
+xyTable(iris2)
+
 ## Discretized uncorrelated Gaussian:
 \dontshow{set.seed(1)}
 xy <- data.frame(x = round(sort(stats::rnorm(100))), y = stats::rnorm(100))
Index: src/library/grDevices/tests/xyTable.R
===================================================================
--- src/library/grDevices/tests/xyTable.R   (nonexistent)
+++ src/library/grDevices/tests/xyTable.R   (working copy)
@@ -0,0 +1,33 @@
+## [Bug 18654] xyTable fails when both x and y are NA (2024-01-16)
+##             https://bugs.r-project.org/show_bug.cgi?id=18654
+## Attachment 3292 https://bugs.r-project.org/attachment.cgi?id=3292
+## Scenarios authored by Heather Turner in comments #1 and #5
+
+## Case 2: one variable has NA - works fine
+## (first combination from Case 1 now has NA)
+iris2 <- iris[1:10, 3:4]
+iris2[3, 1] <- NA
+xyTable(iris2)
+
+## Case 3: both x and y are NA for one case - no good
+## (`number` should be the same as for Case 2)
+iris3 <- iris[1:10, 3:4]
+iris3[3, ] <- NA
+xyTable(iris3)
+
+
+## Case 4: both x and y are NA for >1 case - no good
+## (records with both NA are not aggregated)
+iris4 <- iris[1:10, 3:4]
+iris4[c(3, 5), ] <- NA
+xyTable(iris4)
+
+## Case 5: NA in y when x is duplicated
+iris5 <- iris[1:10, 3:4]
+iris5[4, 2] <- NA
+xyTable(iris5)
+
+## Case 6: NA in y when x is duplicated
+iris6 <- iris[1:10, 3:4]
+iris6[] <- NA
+xyTable(iris6)
njtierney commented 2 months ago

This is very cool to see how all this works!

pmur002 commented 2 months ago

Thanks for the work so far. I have added a small additional task to the bugzilla page to hopefully finish this off.

zkamvar commented 2 months ago

Sorry about the delay on that. I just submitted the patch.

pmur002 commented 2 months ago

Thanks. Committed now (r87144)