mehrnoushmalek / flowDensity

Automated gating
7 stars 3 forks source link

deGate() does not return a gate between the peaks found by getPeaks() when using twin.factor #8

Open saheedb opened 1 year ago

saheedb commented 1 year ago

I'm wondering if this is expected behavior. Example:

library(flowDensity)

set.seed(123)
dat <- matrix(c(rnorm(3000, mean = 5, sd = 0.5), 
                rnorm(3000, mean = 7, sd = 0.25),
                rnorm(1000, mean = 8, sd = 0.25)))
colnames(dat) = "d"
ff <- flowFrame(dat)

peaks <- getPeaks(ff, "d", twin.factor = 0.5)
gate <- deGate(ff, "d", twin.factor = 0.5)

peaks

The first peak is ignored as expected

$Peaks
[1] 7.013470 7.961586

$P.ind
[1] 325 400

$P.h
[1] 0.5534271 0.1867415

But the gate from deGate() is not between the two peaks from getPeaks() as if twin.factor did not have an effect

gate

[1] 6.153844

Plot

library(ggplot2)
library(ggcyto)

autoplot(ff, "d") + 
  geom_vline(xintercept = peaks$Peaks, color = "red") +
  geom_vline(xintercept = gate, color = "blue")

degate_getpeaks

Is this expected behavior? Thanks.

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)

Matrix products: default
BLAS:   /data/apps/R/4.0.3/lib64/R/lib/libRblas.so
LAPACK: /data/apps/R/4.0.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggcyto_1.18.0            flowWorkspace_4.5.3      ncdfFlow_2.36.0         
[4] BH_1.75.0-0              RcppArmadillo_0.10.1.2.0 ggplot2_3.3.3           
[7] flowDensity_1.24.0       flowCore_2.5.0          

loaded via a namespace (and not attached):
 [1] bitops_1.0-6            matrixStats_0.61.0-9001 RColorBrewer_1.1-2     
 [4] httr_1.4.2              Rgraphviz_2.34.0        Rwave_2.4-8            
 [7] RFOC_3.4-6              tools_4.0.3             R6_2.5.0               
[10] KernSmooth_2.23-17      rgeos_0.5-5             BiocGenerics_0.36.1    
[13] colorspace_2.0-0        withr_2.3.0             sp_1.4-4               
[16] gridExtra_2.3           tidyselect_1.1.0        splancs_2.01-40        
[19] curl_4.3.2              compiler_4.0.3          RPMG_2.2-3             
[22] graph_1.68.0            cli_2.2.0               Biobase_2.50.0         
[25] xml2_1.3.2              labeling_0.4.2          caTools_1.18.0         
[28] scales_1.1.1            hexbin_1.28.1           digest_0.6.28          
[31] foreign_0.8-80          rio_0.5.16              base64enc_0.1-3        
[34] jpeg_0.1-8.1            pkgconfig_2.0.3         maps_3.3.0             
[37] rlang_0.4.10            RSEIS_3.9-3             readxl_1.3.1           
[40] rstudioapi_0.13         farver_2.0.3            generics_0.1.0         
[43] gtools_3.8.2            dplyr_1.0.2             zip_2.1.1              
[46] car_3.0-10              magrittr_2.0.1          dotCall64_1.0-0        
[49] RProtoBufLib_2.5.1      Rcpp_1.0.5              munsell_0.5.0          
[52] S4Vectors_0.28.1        fansi_0.4.1             abind_1.4-5            
[55] lifecycle_0.2.0         GEOmap_2.4-4            stringi_1.5.3          
[58] carData_3.0-4           MASS_7.3-53             zlibbioc_1.36.0        
[61] plyr_1.8.6              gplots_3.1.1            grid_4.0.3             
[64] parallel_4.0.3          forcats_0.5.0           crayon_1.3.4           
[67] lattice_0.20-41         haven_2.3.1             hms_0.5.3              
[70] pillar_1.4.7            stats4_4.0.3            XML_3.99-0.5           
[73] glue_1.4.2              latticeExtra_0.6-29     data.table_1.13.6      
[76] MBA_0.0-9               RcppParallel_5.0.2      png_0.1-7              
[79] vctrs_0.3.6             spam_2.6-0              cellranger_1.1.0       
[82] gtable_0.3.0            aws.s3_0.3.21           purrr_0.3.4            
[85] assertthat_0.2.1        openxlsx_4.2.3          IDPmisc_1.1.20         
[88] tibble_3.0.4            cytolib_2.5.3           aws.signature_0.6.0    
[91] flowViz_1.54.0          fields_11.6             ellipsis_0.3.1         
jmeskas commented 10 months ago

Hi @saheedb ,

I know it has been a while and maybe you already figured out your issue, but I saw post and figured I could help.

The twin.factor parameter is designed to disallow a valley result being returned if the valley's y value is very close to the y value of the peaks. This usually occurs when two peaks are about the same height and are next to each other. For example

dat <- matrix(c(rnorm(3000, mean = 7, sd = 0.4),
                rnorm(3000, mean = 8, sd = 0.4)))
colnames(dat) = "d"
ff <- flowFrame(dat)

gate1 <- deGate(ff, "d", twin.factor = 0.9)
gate2 <- deGate(ff, "d", twin.factor = 0.85)

library(ggplot2)
library(ggcyto)

autoplot(ff, "d") + 
  geom_vline(xintercept = peaks$Peaks, color = "red") +
  geom_vline(xintercept = gate1, color = "blue") + 
  geom_vline(xintercept = gate2, color = "green")

Screenshot from 2023-09-01 11-00-34

Please note that when twin.factor is reduced to 0.85 the valley is no longer accepted and a value related to the inflection point of the distribution is returned.

Regarding getPeaks, the twin.factor parameter is not commonly (possibly never) used for calculating peaks since we are looking for peaks and not valleys. And looking at the code, the part using twin.factor is implemented differently than the way it is implemented in deGate. I would advise not to use twin.factor in getPeaks. So your comment of "The first peak is ignored as expected", is actually not an expected result.

One tip is if you want to get both valleys in your example you can use the all.cuts parameter in deGate. For example,

library(flowDensity)
library(flowCore)

set.seed(123)
dat <- matrix(c(rnorm(3000, mean = 5, sd = 0.5), 
                rnorm(3000, mean = 7, sd = 0.25),
                rnorm(1000, mean = 8, sd = 0.25)))
colnames(dat) = "d"
ff <- flowFrame(dat)

peaks <- getPeaks(ff, "d", twin.factor = 0.5)
gates <- deGate(ff, "d", all.cuts=T)

gates
[1] 6.153844 7.670830

Hope this helps!