mixOmicsTeam / mixOmics

Development repository for the Bioconductor package 'mixOmics '
http://mixomics.org/
153 stars 51 forks source link

Feature request: add default values for non-compulsory arguments (e.g., keepX, keepY) #29

Closed GegznaV closed 5 years ago

GegznaV commented 5 years ago

In sparse methods, e.g., sPLD-DA (splsda()), could default values be set for arguments, that are not compulsory? In particular, could it be: keepX = NULL, keepY = NULL? It would be easier to figure out, which arguments are really the most important and it would prevent from syntax checking and linting warning like this one:

image

Possible solution. As I understand, to enable this functionality, it would be sufficient to change some lines in internal function Check.entry.pls(). E.g., instead of:

    if (missing(keepX)) {

these lines could be used:

    if (missing(keepX) || is.null(keepX)) {

It's crucial to use || here and not | as the later will result in error, if keepX is missing. The same idea could be applied for keepY.

Is there any reason why this functionality could not be implemented in mixOmics?

aljabadi commented 5 years ago

Hi @GegznaV, I tried your example with all mixOmics versions (Bioconductor, GitHub master, and GitHub devel) and did not get any error. Can you please provide a full reproducible code, which also installs your version of mixOmics?

GegznaV commented 5 years ago

I use remotes::install_github("mixOmicsTeam/mixOmics", upgrade = TRUE) to install mixOmics. I do not develop any version of this package, but I'm planning to use it within mlr framework.

Did you need this information?

aljabadi commented 5 years ago

That's interesting. Definitely keep us updated on how you go and we're happy to help if you need support.

I re-installed mixOmcis using remotes::install_github("mixOmicsTeam/mixOmics", upgrade = TRUE) and I could successfully ran the splsda example below:

suppressMessages(remotes::install_github("mixOmicsTeam/mixOmics", upgrade = TRUE))
library(mixOmics)
#> Loading required package: MASS
#> Loading required package: lattice
#> Loading required package: ggplot2
#> 
#> Loaded mixOmics 6.8.1
#> 
#> Thank you for using mixOmics! Learn how to apply our methods with our tutorials on www.mixOmics.org, vignette and bookdown on  https://github.com/mixOmicsTeam/mixOmics
#> Questions: email us at mixomics[at]math.univ-toulouse.fr  
#> Bugs, Issues? https://github.com/mixOmicsTeam/mixOmics/issues
#> Cite us:  citation('mixOmics')

data(breast.tumors)
X <- breast.tumors$gene.exp
# Y will be transformed as a factor in the function,
# but we set it as a factor to set up the colors.
Y <- as.factor(breast.tumors$sample$treatment)

res <- splsda(X, Y)
res
#> 
#> Call:
#>  splsda(X = X, Y = Y) 
#> 
#>  sPLS-DA (regression mode) with 2 sPLS-DA components. 
#>  You entered data X of dimensions: 47 1000 
#>  You entered data Y with 2 classes. 
#> 
#>  Selection of [1000] [1000] variables on each of the sPLS-DA components on the X data set. 
#>  No Y variables can be selected. 
#> 
#>  Main numerical outputs: 
#>  -------------------- 
#>  loading vectors: see object$loadings 
#>  variates: see object$variates 
#>  variable names: see object$names 
#> 
#>  Functions to visualise samples: 
#>  -------------------- 
#>  plotIndiv, plotArrow, cim 
#> 
#>  Functions to visualise variables: 
#>  -------------------- 
#>  plotVar, plotLoadings, network, cim 
#> 
#>  Other functions: 
#>  -------------------- 
#>  selectVar, tune, perf, auc

Please run the example above and see if you continue to have problems (which I highly doubt).

GegznaV commented 5 years ago

Do we understand each other correctly? I did not report any errors of the package, just asked for improved default values for keepX and keepY.

Usually, it's a good practice to give some default values for arguments, that can be computed internally, as is the case for keepX and keepY (from the documentation of splsda()): image

If a user does not provide values of e.g. keepX for e.g. splsda(), the function does not break. But due to this, some R code checking tools show warnings such as this one in RStudio: image

To reproduce the warning message in RStudio: 1) Install RStudio (I use version 1.2); 2) Set up RStudio to check R function arguments. a) Go to "Tools" → "Global options" image

b) Enable code checking tools as shown below:

image 3) Create a new R Script document (e.g., press Ctrl + Shift + N on Windows). 4) Load mixOmics: library(mixOmics) 5) Write code in the script:

data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)
res <- splsda(X, Y)

5) Save the document. 6) After saving, yellow exclamation mark should appear at the left side of line numbers: image 7) Hover over the exclamation mark with a mouse and the message will apear: image

The message could be prevented if keepX had a default value. It also would be easier from users, who want to create packages based om mixOmics functions, if arguments like keepX had default values.


When I read your message, I thought that you implemented the proposed changes and said that they cause no errors in any of the mixOmics branches on Github.

My questions were:

1) Why aren't any default values, such as keepX = NULL, used in mixOmics for function arguments, which are not compulsory for a function to work? 2) Is there any reason that prevents this functionality (the default values, such as keepX = NULL) from being implemented in mixOmics?

aljabadi commented 5 years ago

As you have noticed our internal checker does check for that but we'll certainly consider your suggestions. In the future, feel free to submit a pull request for such enhancements that you have come up with the solution and we'll certainly review it.

GegznaV commented 5 years ago

I have two ideas, how this issue could be fixed. Just current version of mixOmics, that I forked from the main branch, does not pass checking on my PC. So it's not worth doing any modifications before the current issue is solved.

GegznaV commented 5 years ago

Some messages concerning the error. The output is truncated:

-- R CMD check results ------------------------------------- mixOmics 6.8.1 ----
Duration: 4m 21.2s

> checking examples ... ERROR
  Running examples in 'mixOmics-Ex.R' failed
  The error most likely occurred in:

  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: block.spls
  > ### Title: N-integration and feature selection with sparse Projection to
  > ###   Latent Structures models (sPLS)
  > ### Aliases: block.spls
  > ### Keywords: regression multivariate
  > 
  > ### ** Examples
  > 
  > 
  > # Example with multi omics TCGA study
  > # -----------------------------
  > data("breast.TCGA")
  > # this is the X data as a list of mRNA and miRNA; the Y data set is a single data set of proteins
  > data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna)
  > # set up a full design where every block is connected
  > design = matrix(1, ncol = length(data), nrow = length(data),
  + dimnames = list(names(data), names(data)))
  > diag(design) =  0
  > design
        mrna mirna
  mrna     0     1
  mirna    1     0
  > # set number of component per data set

/ truncated /

      if (any(class.object %in% object.blocks)) {
          l = 1
          for (i in 1:(length(blocks) - 1)) {
              for (j in (i + 1):length(blocks)) {
                  M_block[[l]][abs(M_block[[l]]) < cutoff] = 0
                  res[paste("M", blocks[i], blocks[j], sep = "_")] = list(M_block[[l]])
                  l = l + 1
              }
          }
      }
      else {
          mat[abs(mat) < cutoff] = 0
          res$M = mat
      }
      res$cutoff = cutoff
      if (!is.null(save)) 
          dev.off()
      return(invisible(res))
  }
  <bytecode: 0x00000000162ed6d8>
  <environment: namespace:mixOmics>
   --- function search by body ---
  Function network in namespace mixOmics has this body.
   ----------- END OF FAILURE REPORT -------------- 
  Fatal error: length > 1 in coercion to logical

> checking installed package size ... NOTE
    installed size is  8.0Mb
    sub-directories of 1Mb or more:
      R      1.2Mb
      data   3.0Mb
      doc    3.2Mb

> checking top-level files ... NOTE
  Non-standard files/directories found at top level:
    'CODE_OF_CONDUCT.md' 'README.Rmd'

1 error x | 0 warnings v | 2 notes x
Error: R CMD check found ERRORs
Execution halted

Exited with status 1.

It would be handy to have #32 implemented for automatic checking.