malucalle / selbal

selbal: selection of balances for microbial signatures

Using a continuous variable as y and a dichotomic covariate as z #15

Open AlessandroLazzaro opened 4 years ago

AlessandroLazzaro commented 4 years ago

Hello, I am having issues trying to use a continuous variable as y and a dichotomous covariate as z. I imported an Excel dataset containing only the taxa of interest (only taxa present in more than 20% of the samples are included), a column for a continuous variable (y), and another column of dichotomous values to use as the covariate (z), as you can see below:

x <- DATA[,1:121]
y_character <- DATA[['CD4+_count']]
y <- as.vector(y_character)
z <- data.frame(Cotrimoxazole_HIV = DATA[,129])

I don't have any problem when I run the command without a covariate, like this:

CV.BAL.dic <- selbal.cv(x = x, y = y, covar = NULL, n.fold = 5, n.iter = 10, logit.acc = "AUC")

but if I try to add the covariate in this way

CV.BAL.dic <- selbal.cv(x = x, y = y, covar = z, n.fold = 5, n.iter = 10, logit.acc = "AUC")

I get the following error:

Error in { : task 1 failed - "NA/NaN/Inf in 'y'"

I don't understand why selbal complains about y, since the same y variable works fine when covar is NULL. Do you have any tips? Thank you in advance, Alessandro Lazzaro

UVic-omics commented 4 years ago

Hi @AlessandroLazzaro !

First of all, thank you for using selbal. It is difficult to answer your question without working with the data. Nevertheless, here are some tips to help find out what is happening:

1) Do you have any problems if you run the example given in the vignette of the package?

x <- HIV[,1:60]
y <- HIV[,62]
z <- HIV[,61]
result <- selbal(x,y,covar = z)

2) What are the classes of your objects x, y, and z?
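One way to answer the second question, and to look for missing values at the same time, is sketched below. It uses base R only. `DATA_toy` and its column names are made up for illustration and merely stand in for the imported spreadsheet; nothing here is claimed to be the cause of the error.

```r
# Hypothetical stand-in for the imported dataset (all values are invented)
DATA_toy <- data.frame(
  taxon_a = c(10, 0, 5, 3),
  taxon_b = c(2, 7, 1, 4),
  cd4     = c(350, 512, NA, 420),
  cotrim  = c(0, 1, 1, 0)
)

x <- DATA_toy[, 1:2]
y <- DATA_toy[["cd4"]]
z <- data.frame(cotrim = DATA_toy[, 4])

# Class of each object that would be passed to selbal
classes <- list(x = class(x), y = class(y), z = class(z))
print(classes)

# Number of missing or non-finite values in the response
n_bad_y <- sum(!is.finite(y))

# Number of covariate rows with at least one missing value
n_bad_z <- sum(!complete.cases(z))

print(c(bad_y = n_bad_y, bad_z = n_bad_z))
```

If either count is above zero on the real data, removing or imputing those samples before the call would be a reasonable first thing to try.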

Let's see if we can solve the problem step by step.

Best regards,

AlessandroLazzaro commented 4 years ago

Thank you for answering!

  1. If I copy and paste the command you provided, the following error appears repeatedly:

     Setting levels: control = 0, case = 1
     Setting direction: controls < cases

  2. The output of the class() function for each object is the following:

     x: "tbl_df" "tbl" "data.frame"
     y: "numeric"
     z: "data.frame"

Thank you very much for your help! Alessandro Lazzaro

UVic-omics commented 4 years ago

The code I wrote in the previous message shouldn't give you any error. Please try this one (I use the sCD14 data set, which has a numeric response, and I build a factor-type covariate). It should work:

data("sCD14")

x <- sCD14[,1:60]
y <- sCD14[,61]
z <- as.factor(c(rep(0,75),rep(1,76)))
r <- selbal(x,y,covar = z)

If it works, try to use z as a vector of class factor, not as a data.frame.
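A minimal sketch of that conversion in base R follows. The object `z_df` and its values are invented for illustration; in the original post the covariate comes from column 129 of DATA. Using `[[` extracts the column as a plain vector whether the source is a data.frame or a tibble, and `as.factor()` then turns it into a factor.

```r
# Hypothetical one-column covariate, shaped like the z in the original post
z_df <- data.frame(Cotrimoxazole_HIV = c(0, 1, 1, 0, 1))

# Pull the column out as a vector and convert it to a factor
z_factor <- as.factor(z_df[[1]])

class(z_factor)    # "factor"
levels(z_factor)

# The factor should have one entry per sample
length(z_factor) == nrow(z_df)
```

The resulting `z_factor` could then be passed as `covar` in place of the data.frame, as suggested above.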