vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0
448 stars 97 forks source link

dbrda function does not let pseudocount through #631

Closed TuomasBorman closed 6 months ago

TuomasBorman commented 6 months ago

https://github.com/vegandevs/vegan/blob/a88584bf096e52ce9c291d0640bd4e4f9b8b364d/R/dbrda.R#L35

Hi,

it seems that pseudocount is not passed through from dbrda function. When aitchison distance is calculated, clr transform is applied. But since pseudocount is not going through, it leads to an error if there are non-positive values.

I believe replacing line

X <- dfun(X, distance)

with

X <- dfun(X, distance, ...)

should work but in this case, but I am not sure if it breaks something else.

-Tuomas

library(vegan)
Loading required package: permute
Loading required package: lattice
This is vegan 2.6-5

data(varespec)
data(varechem)

# Just arbitrary pseudocount
dbrda(varespec ~ N + P + K + Condition(Al), varechem, dist = "aitchison", pseudocount = 1)
Error: 'clr' cannot be used with non-positive data: use pseudocount > 0

sessionInfo()
R Under development (unstable) (2024-01-12 r85803)
Platform: x86_64-pc-linux-gnu
Running under: Linux Mint 21

Matrix products: default
BLAS:   /opt/R/devel/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Helsinki
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vegan_2.6-5                     lattice_0.22-5                  permute_0.9-7                   MultiAssayExperiment_1.29.0     TreeSummarizedExperiment_2.11.0
 [6] Biostrings_2.71.1               XVector_0.43.1                  SingleCellExperiment_1.25.0     SummarizedExperiment_1.33.2     Biobase_2.63.0                 
[11] GenomicRanges_1.55.1            GenomeInfoDb_1.39.5             IRanges_2.37.0                  S4Vectors_0.41.3                BiocGenerics_0.49.1            
[16] MatrixGenerics_1.15.0           matrixStats_1.2.0              

loaded via a namespace (and not attached):
 [1] circlize_0.4.15         shape_1.4.6             rjson_0.2.21            GlobalOptions_0.1.2     yulab.utils_0.1.3       vctrs_0.6.5             tools_4.4.0            
 [8] bitops_1.0-7            generics_0.1.3          parallel_4.4.0          tibble_3.2.1            fansi_1.0.6             cluster_2.1.6           pkgconfig_2.0.3        
[15] Matrix_1.6-5            RColorBrewer_1.1-3      lifecycle_1.0.4         GenomeInfoDbData_1.2.11 compiler_4.4.0          treeio_1.27.0           codetools_0.2-19       
[22] ComplexHeatmap_2.19.0   clue_0.3-65             lazyeval_0.2.2          RCurl_1.98-1.14         tidyr_1.3.0             pillar_1.9.0            crayon_1.5.2           
[29] MASS_7.3-60.2           BiocParallel_1.37.0     cachem_1.0.8            DelayedArray_0.29.0     iterators_1.0.14        abind_1.4-5             foreach_1.5.2          
[36] nlme_3.1-164            tidyselect_1.2.0        digest_0.6.34           purrr_1.0.2             dplyr_1.1.4             splines_4.4.0           fastmap_1.1.1          
[43] grid_4.4.0              colorspace_2.1-0        cli_3.6.2               SparseArray_1.3.2       magrittr_2.0.3          S4Arrays_1.3.1          utf8_1.2.4             
[50] ape_5.7-1               png_0.1-8               GetoptLong_1.0.5        memoise_2.0.1           doParallel_1.0.17       mgcv_1.9-1              rlang_1.1.3            
[57] Rcpp_1.0.12             glue_1.7.0              tidytree_0.4.6          jsonlite_1.8.8          rstudioapi_0.15.0       R6_2.5.1                fs_1.6.3               
[64] zlibbioc_1.49.0    
jarioksa commented 6 months ago

Confirmed. This seems to hit all cases where distance function needs extra parameters. For instance, dbrda(varespec ~ Al + P + K, data=varechem, distance="bray", binary = TRUE) also ignores binary argument. You suggested the obvious fix, but we need to check that it doesn't break anything else.

As a first fix, you can calculate the distances before dbrda:

d <- vegdist(varespec, "aitchison", pseudocount = 1)
dbrda(d ~ N + P + K + Condition(Al), varechem)
jarioksa commented 6 months ago

This is now fixed. The same problem also appeared in capscale and bioenv, but adonis2 and metaMDS were OK.

There could be a problem with mrpp, but it refuses all extra arguments (no dots in function call) and there you need to use dissimilarities as input if you need extra arguments. I won't change this.

There may also be a problem with avgdist, but it does not accept non-integer data. However, other arguments may be useful, and I will still see this.

TuomasBorman commented 6 months ago

Cool, thanks!