rgcca-factory / RGCCA

https://rgcca-factory.github.io/RGCCA/
10 stars 11 forks source link

Forbid a_k to be set to a_1 as function is not define in a_1. #57

Closed AGloaguen closed 1 year ago

AGloaguen commented 1 year ago
rm(list = ls())

library(microbenchmark)
#> Warning: package 'microbenchmark' was built under R version 4.1.3

# my_median -> function currently in proj_l1_l2

########################### GOAL ########################### 
# Compare "my_median" with "quantile" function, when "type=1" should 
# give same results
my_median = function(x){
  N <- length(x)
  if (N == 0) {
    warning("length(p_MAX_rmv) = 0")
    break
  }
  # Choose next a_k
  if (N %% 2 == 0) {
    # Either min or max is possible
    p_reduced <- x[-which.max(x)]
    # Replace by ccaPP::fastMedian or any homemade cpp median function
    a_k <- median(p_reduced)
  } else {
    # Replace by ccaPP::fastMedian or any homemade cpp median function
    a_k <- median(x)
  }
  return(a_k)
}

########################### EVEN CASE ########################### 
n = 1e7

x = runif(n = n)
# Check if identical for one example
quantile(x = x, probs = 0.5, type=1) - my_median(x)
#> 50% 
#>   0

x <- 0
y <- 0

bm <- microbenchmark(
  my_median = {
    x <- x + 1
    set.seed(x)
    z = runif(n = n)
    my_median(z)
  },
  R_quantile = {
    y <- y + 1
    set.seed(y)
    z = runif(n = n)
    quantile(x = z, probs = 0.5, type=1)
  },
  times = 100)

print(bm)
#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>   my_median 584.1689 640.5465 666.7143 663.9883 688.9393 790.8789   100   b
#>  R_quantile 538.6439 594.5458 626.8911 625.5709 651.5362 905.6070   100  a

########################### ODD CASE ########################### 
n = n+1

x = runif(n = n)
# Check if identical for one example
quantile(x = x, probs = 0.5, type=1) - my_median(x)
#> 50% 
#>   0

x <- 0
y <- 0

bm <- microbenchmark(
  my_median = {
    x <- x + 1
    set.seed(x)
    z = runif(n = n)
    my_median(z)
  },
  R_quantile = {
    y <- y + 1
    set.seed(y)
    z = runif(n = n)
    quantile(x = z, probs = 0.5, type=1)
  },
  times = 100)

print(bm)
#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>   my_median 474.7853 523.6168 561.8754 550.4027 590.2881 694.2172   100  a 
#>  R_quantile 528.1196 596.5983 640.5559 627.7110 685.6903 793.5746   100   b

Created on 2023-01-31 with reprex v2.0.2

AGloaguen commented 1 year ago

Du coup je propose de laisser tel quel et de ne pas passer par "quantile", vous en pensez quoi ?

Pour info :

Types

"quantile" returns estimates of underlying distribution quantiles based on one or two order statistics from the 
supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman 
and Fan (1996), selected by type, is employed.

All sample quantiles are defined as weighted averages of consecutive order statistics. Sample quantiles of 
type i are defined by:

                                      Q[i](p)` = (1 - γ) x[j] + γ x[j+1],

where 1 ≤ i ≤ 9, (j-m)/n ≤ p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, the value of 
γ is a function of j = floor(np + m) and g = np + m - j, and m is a constant determined by the sample 
quantile type.

**Discontinuous sample quantile types 1, 2, and 3**

For types 1, 2 and 3, Q[i](p) is a discontinuous function of p, with m = 0 when i = 1 and i = 2, 
and m = -1/2 when i = 3.

Type 1

        Inverse of empirical distribution function. γ = 0 if g = 0, and 1 otherwise.

Donc là normalement, comme le type est à 1, i=1 et m=0. Nous in s'intéresse à la médiane donc p=1/2. j = floor(np+m)=floor(n/2).

Si n pair : j = n/2 et g = np+m-j = 0 donc gamma = 0 d'après la définition => Qtype1 = x[n/2] (statistique de rang n/2).

Si n impair : j = (n-1)/2 et g = 1/2 (différent de 0) donc gamma =1 => Qtype1 = x[(n+1)/2]

GFabien commented 1 year ago

Je suis mitigé, je suis plutôt en faveur de ne pas recoder les fonctions de base mais ici :

Après d'un autre côté si un jour la fonction quantile est modifiée et va plus vite, on en bénéficiera gratuitement.

AGloaguen commented 1 year ago

Effectivement je suis d'accord avec toi, il faut être assez expert pour comprendre l'utilisation de "type=1" dans la fonction quantile. Après je peux le préciser en commentaire... Si on passe par cette fonction, on condense le code ci-dessous en 1 ligne :

    # Choose next a_k
    if (N %% 2 == 0) {

      # Either min or max is possible 
      p_reduced <- p_MAX_rmv[-which.max(p_MAX_rmv)]

      # Replace by ccaPP::fastMedian or any homemade cpp median function
      a_k <- median(p_reduced)
    } else {

      # Replace by ccaPP::fastMedian or any homemade cpp median function
      a_k <- median(p_MAX_rmv)
    }

Sinon pour info j'ai fait des petites modifs :

GFabien commented 1 year ago

Hello, je penche finalement vers l'utilisation de la fonction quantile avec un commentaire ! Après je pense que c'est bon pour merger @Tenenhaus