malucalle / selbal

selbal: selection of balances for microbial signatures

Error with more than 2 variables in the balance #36

Open ChVav opened 7 months ago

ChVav commented 7 months ago

Dear selbal developers,

When I run selbal.cv() on my datasets, it runs fine only when 2 variables are selected in the balance. Once I run the same code and force it to select more variables (with the opt.cri or opt.nvar parameters), I get the following error after the cross-validation procedure has finished (and the optimal number of variables has been printed):

Error in `[.data.frame`(LogCounts, , c(POS, x)) : undefined columns selected

I am not sure whether this also happens when more than 2 variables are selected with the default settings, because for my data, coincidentally, only 2 variables are ever selected in that case.

Many thanks in advance for your help!

ChVav commented 7 months ago

Created a pull request with fixes that worked for my datasets: "Small fixes to selbal() and selbal.aux() ensuring variables are drawn from logCounts." Cheers!

UVic-omics commented 6 months ago

Thank you for your support @ChVav! I saw your proposed changes, but I really don't know how to accept them. I am not familiar with GitHub; the only thing I know how to do is correct the code manually.

Do you know how I can accept your proposals?

ChVav commented 6 months ago

No problem! How to merge a pull request is described here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request-with-a-merge-queue

If you still have a local repository with your code, you can then run "git pull" to make sure the changes in the remote GitHub repo are copied over locally. Hope this helps!

bl6594 commented 6 months ago

I downloaded the code modified by ChVav, but I am still getting errors when trying it on the HIV data file. I am using R 4.3.3 on Windows 10. I should mention that the code doesn't work with the library either.

my code:

source("C:/Users/xxx/OneDrive -xxx/Methods/selbal/selbal_functions.R")

# Define x, y and z
x <- selbal::HIV[,1:60]
y <- selbal::HIV[,62]
z <- data.frame(MSM = selbal::HIV[,61])

# Run selbal.cv function (with the default values for zero.rep and opt.cri)
CV.BAL.dic <- selbal.cv(x = x, y = y, n.fold = 5, n.iter = 10, covar = z, logit.acc = "AUC")

###############################################################
STARTING selbal.cv FUNCTION
###############################################################

-------------------------------------------------------------

ZERO REPLACEMENT . . .

Loading required package: MASS
Loading required package: NADA
Loading required package: survival

Attaching package: ‘NADA’

The following object is masked from ‘package:stats’:

cor

Loading required package: truncnorm

, . . . FINISHED.

-------------------------------------------------------------

-------------------------------------------------------------

Starting the cross - validation procedure . . .

. . . finished.

-------------------------------------------------------------

###############################################################

The optimal number of variables is: 4

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 't': undefined columns selected
In addition: Warning messages:
1: In cmultRepl(x, suppress.print = T) :
  Column no. 49 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
  Column no. 53 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
2: In e$fun(obj, substitute(ex), parent.frame(), e$data) :
  already exporting variable(s): logit.acc
3: In cmultRepl(x, suppress.print = T) :
  Column no. 49 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
  Column no. 53 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).

bl6594 commented 6 months ago

I've found that the following change, on line 1357 of the selbal_functions.R posted by ChVav, fixes the problem. Change

var.nam <- rem.nam <- colnames(x)

to

var.nam <- rem.nam <- colnames(logCounts)
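
For anyone wondering why the source of the column names matters, here is a minimal sketch. It assumes that zero replacement drops mostly-zero columns before the balance is built, which is what the cmultRepl warnings about deleted columns suggest. The toy table and the filtering step are hypothetical stand-ins, not selbal's actual code.

```r
# Toy count table (hypothetical data): column "C" is more than 80% zeros.
x <- data.frame(A = c(5, 3, 8, 2, 7, 4),
                B = c(1, 4, 2, 6, 3, 9),
                C = c(0, 0, 0, 0, 0, 1))

# Hypothetical stand-in for the zero-replacement step: drop columns with
# more than 80% zeros, then take logs. selbal's real step is more involved.
keep <- colMeans(x == 0) <= 0.8
logCounts <- log(x[, keep, drop = FALSE])

# Names taken from the original table still include the dropped column,
# so indexing the filtered table with them fails.
stale.names <- colnames(x)
res.stale <- tryCatch(logCounts[, stale.names],
                      error = function(e) conditionMessage(e))

# Names taken from the filtered table always resolve.
fresh.names <- colnames(logCounts)
res.fresh <- logCounts[, fresh.names]

print(res.stale)
print(dim(res.fresh))
```

If this is indeed the mechanism, it would also explain why the error only shows up for some datasets: it needs at least one column to be deleted during zero replacement and then to be picked up by name later.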

UVic-omics commented 5 months ago

Thank you @bl6594! Change done