kevinblighe / PCAtools

PCAtools: everything Principal Components Analysis

Warning: non-numeric variables will be coerced to numeric #62

Closed arunhpatil closed 1 year ago

arunhpatil commented 1 year ago

Hi @kevinblighe,

Thank you for PCAtools. I get no error message; however, I wonder what happens when variables are coerced to numeric before plotting with eigencorplot. I get the warning messages shown at the bottom. Can you please describe how the metadata variables are correlated to the PCs? This would be very helpful.

I have defined the following metadata as shown below:

meta$diagnosis <- factor(meta$diagnosis, levels = c("disease", "Control"))
meta$read_depth <- as.numeric(meta$read_depth)
meta$tissue <- factor(meta$tissue, levels = c("t1", "t2", "t3", "t4"))
meta$Agegroup <- factor(meta$Agegroup, levels = c("Adolescent", "Adult", "Child", "Fetal", "Infant", "Older"))
meta$Sex <- factor(meta$Sex, levels = c("F", "M"))
meta$Race <- factor(meta$Race, levels = c("R1", "R2", "R3"))

eigencorplot(p, metavars = c('diagnosis','RIN','Age','read_depth','tissue','Sex','Race','Agegroup'))

Warning messages:
1: In eigencorplot(p, metavars = c("diagnosis", "RIN", "Age", "read_depth",  :
  diagnosis is not numeric - please check the source data as non-numeric variables will be coerced to numeric
2: In eigencorplot(p, metavars =  c("diagnosis", "RIN", "Age", "read_depth",  :
  tissue is not numeric - please check the source data as non-numeric variables will be coerced to numeric
3: In eigencorplot(p, metavars =  c("diagnosis", "RIN", "Age", "read_depth",  :
  Sex is not numeric - please check the source data as non-numeric variables will be coerced to numeric
4: In eigencorplot(p, metavars =  c("diagnosis", "RIN", "Age", "read_depth",  :
  Race is not numeric - please check the source data as non-numeric variables will be coerced to numeric
5: In eigencorplot(p, metavars =  c("diagnosis", "RIN", "Age", "read_depth",  :
  Agegroup is not numeric - please check the source data as non-numeric variables will be coerced to numeric

Thank you, Arun

kevinblighe commented 1 year ago

Hey @arunhpatil, any variable that is encoded as categorical (a 'factor') will be converted by PCAtools to numeric so that it can be correlated to the PCs. In such a case, the numbering depends on the order of the factor levels. For example, these would be converted to 1,2,3,4 and 1,2, respectively:

cancerstage <- factor(cancerstage, levels = c('Stage 1', 'Stage 2', 'Stage 3', 'Stage 4'))
caco <- factor(caco, levels = c('Control', 'Case'))
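
As a rough illustration of that numbering, here is a minimal base R sketch. The sample values are made up for the example, and it assumes the coercion behaves like `as.integer()` applied to a factor (each value is replaced by the position of its level); it does not call PCAtools itself.

```r
# Hypothetical sample values, not taken from the thread
cancerstage <- c("Stage 3", "Stage 1", "Stage 4", "Stage 2", "Stage 1")
caco <- c("Case", "Control", "Case", "Control", "Control")

# Levels declared in a meaningful order
cancerstage <- factor(cancerstage,
                      levels = c("Stage 1", "Stage 2", "Stage 3", "Stage 4"))
caco <- factor(caco, levels = c("Control", "Case"))

# Each value becomes the index of its level in the levels vector
stage_codes <- as.integer(cancerstage)  # 3 1 4 2 1
caco_codes <- as.integer(caco)          # 2 1 2 1 1

print(stage_codes)
print(caco_codes)
```

Running `levels(meta$tissue)` (or the equivalent for any other factor) before plotting shows the order that the integer codes will follow.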

However, this would also be converted to 1,2,3,4 following the listed level order (so 'Stage 4' becomes 2), which perhaps loses biological meaning:

cancerstage <- factor(cancerstage, levels = c('Stage 1', 'Stage 4', 'Stage 2', 'Stage 3'))
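
To see why the level order matters for the correlation, here is a small base R sketch. The `pc1` scores are invented for illustration, and the factor-to-integer step is again assumed to work like `as.integer()`; this is not the PCAtools code path itself.

```r
# Hypothetical data: one sample per stage, with a PC score that rises with stage
stage <- c("Stage 1", "Stage 2", "Stage 3", "Stage 4")
pc1 <- c(1, 2, 3, 4)

# Same values, two different level orders
ordered_stage <- factor(stage,
                        levels = c("Stage 1", "Stage 2", "Stage 3", "Stage 4"))
shuffled_stage <- factor(stage,
                         levels = c("Stage 1", "Stage 4", "Stage 2", "Stage 3"))

ordered_codes <- as.integer(ordered_stage)    # 1 2 3 4
shuffled_codes <- as.integer(shuffled_stage)  # 1 3 4 2

# Pearson correlation of the PC score with each integer coding
cor_ordered <- cor(pc1, ordered_codes)    # 1
cor_shuffled <- cor(pc1, shuffled_codes)  # 0.4

print(c(ordered = cor_ordered, shuffled = cor_shuffled))
```

The underlying data are identical in both cases; only the level order changes the coefficient. For a nominal variable with more than two levels and no natural order, the coefficient reflects whatever order the levels happen to be in, so it is worth interpreting with care.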

Kind regards,

Kevin