malucalle / selbal

selbal: selection of balances for microbial signatures

user_numVar doesn't take effect #30

Closed HuoJnx closed 1 year ago

HuoJnx commented 2 years ago

Basic info:

selbal version: 0.1.0
R version: 4.1.2 (2021-11-01) -- "Bird Hippie"
Platform: Linux s5 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux

Issue description:

Setting user_numVar only affects selbal_res$accuracy.nvar and selbal_res$opt.nvar; it does not change the number of variables in the balance that is actually selected.

Code

library(selbal)
library(grid)  # provides grid.draw()

num_var <- 6
selbal_res <- selbal.cv(x = df_X, y = Y, n.fold = 8, n.iter = 10, user_numVar = num_var)

selbal_res$accuracy.nvar

image

selbal_res$opt.nvar

image

grid.draw(selbal_res$global.plot)

image

plot.tab(selbal_res$cv.tab)

image

Additional info

I also tried num_var = 2, 3, 4, 6, 8, 10, and 12. Only 2 and 3 took effect: with 2, the balance includes 2 variables, and with 3 it includes 3 variables. For any value above 3, the number of selected variables stays fixed at 3.

user_numVar==2

image

user_numVar==3

image

user_numVar>3

image

HuoJnx commented 2 years ago

Update:

I also ran the same code on my personal computer and got the same results.

Basic info:

selbal version: 0.1.0
R version: 4.1.1 (2021-08-10) -- "Kick Things"
Platform: Darwin HEs-MacBook-Pro.local 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:05 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8101 arm64

Code

num_var <- 6
selbal_res <- selbal.cv(x = df_X, y = Y, n.fold = 8, n.iter = 10, user_numVar = num_var)

selbal_res$accuracy.nvar

image

grid.draw(selbal_res$global.plot)

image

Rivera5 commented 1 year ago

Hi @HuoJnx!

As we can see in selbal_res$accuracy.nvar, once we reach a 3-component balance, the accuracy does not improve when more variables are added. So even if you ask the program for a balance with 5, 6, or 7 components, the function returns the "optimal" one, which is the one satisfying the following restrictions:

In your case, according to selbal_res$accuracy.nvar, the first two conditions above are met by the balances with 3, 4, 5, 6, ... variables. Under the third restriction, however, the simplest balance is the one with the fewest variables, that is, the balance with 3 components.

I hope this clarifies the idea behind the balance selection. If not, let me know and we will try to explain it further.
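
The behavior reported in this thread (2 gives 2 variables, 3 gives 3, anything larger still gives 3) can be reproduced with a small base-R sketch. This is not selbal's code and has not been checked against its source. The helper `pick_simplest`, its `max_vars` argument (standing in for user_numVar as an upper bound), the one-standard-error tolerance, and all accuracy values are assumptions made up for illustration.

```r
# Hypothetical helper, NOT part of selbal: picks the smallest number of
# variables whose mean accuracy is close to the best achievable one.
pick_simplest <- function(n_vars, mean_acc, se_acc, max_vars = Inf) {
  # Assumption: the user-supplied value acts as an upper bound on balance size.
  keep <- n_vars <= max_vars
  n_vars <- n_vars[keep]
  mean_acc <- mean_acc[keep]
  se_acc <- se_acc[keep]

  best <- which.max(mean_acc)
  # Assumed tolerance: one standard error below the best mean accuracy.
  threshold <- mean_acc[best] - se_acc[best]

  # Among the balances that perform about as well as the best, keep the simplest.
  min(n_vars[mean_acc >= threshold])
}

# Made-up numbers: accuracy plateaus once the balance has 3 variables.
n_vars   <- c(2, 3, 4, 5, 6, 8, 10, 12)
mean_acc <- c(0.70, 0.85, 0.85, 0.86, 0.85, 0.85, 0.85, 0.85)
se_acc   <- rep(0.02, length(n_vars))

chosen <- sapply(c(2, 3, 6, 12), function(k) {
  pick_simplest(n_vars, mean_acc, se_acc, max_vars = k)
})
print(chosen)  # 2 3 3 3
```

Under these assumptions, raising the bound past the point where accuracy plateaus cannot change the result, because the simplest near-optimal balance is already within the bound.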

HuoJnx commented 1 year ago

Totally understood, thank you for your clarification. @Rivera5