Question about sample sheet contents and formatting

felixcactus commented 4 years ago

Hi, I have a question about formatting the input files. I'm using the load_rcc() function and am not clear on the required structure and contents of the sample sheet. I have all my RCC files together in one folder, and that seems to be right. For the sample sheet, I'm currently using a CSV file I imported, with just a single column of RCC file names and "IDFILE" as the header. Looking at the GSE74821 example, I see a ton more columns. What is the minimum information needed in the sample sheet? As it is, when I run load_rcc() with my one-column sample sheet, I get the following error message:

#> Error in switch(EXPR = class(ssheet_csv), data.frame = ssheet_csv, character = utils::read.csv(file = ssheet_csv,  : EXPR must be a length 1 vector

Here's what I did:

```r
library(NACHO)
library(readr)

mytargets <- read_csv("mytargets.csv")

mydata <- load_rcc(
  data_directory = "pathname", # Where the data is
  ssheet_csv = mytargets, # This is just a list of file names under the column name "IDFILE"
  id_colname = "IDFILE", # Name of the column that contains the unique identifiers
  housekeeping_genes = NULL, # Not sure where this fits in. Would this list of housekeeping genes be in the sample sheet somehow?
  housekeeping_predict = TRUE, # Whether or not to predict the housekeeping genes
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating how many principal components should be computed
)
```

Thank you very much!

mcanouil commented 4 years ago

Hi,

The issue comes from a change in R 4.0.0, which you are apparently using. I'll try to fix it and update CRAN soon. In the meantime, your code should work with a slight change that prevents mytargets from having multiple classes (here, tibble and data.frame), i.e., pass ssheet_csv = as.data.frame(mytargets).
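To see why a tibble trips up load_rcc(), the base-R sketch below mimics one by giving a data.frame the class vector c("tbl_df", "tbl", "data.frame"). This is an assumption standing in for the read_csv() output (readr may add further classes), chosen so that neither readr nor tibble is needed. According to the error message, load_rcc() calls switch() on class(ssheet_csv), and switch() only accepts a length-1 value; as.data.frame() drops the extra classes.

```r
# Stand-in for a tibble, built with base R only (assumption: the real
# read_csv() output has at least these classes, possibly more).
mytargets <- data.frame(IDFILE = c("sample1.RCC", "sample2.RCC"))
class(mytargets) <- c("tbl_df", "tbl", "data.frame")
length(class(mytargets))
#> [1] 3

# switch() requires a single string, so a length-3 class vector fails,
# reproducing the error from the question.
res <- tryCatch(
  switch(class(mytargets), data.frame = "data.frame branch"),
  error = function(e) conditionMessage(e)
)
res
#> [1] "EXPR must be a length 1 vector"

# Workaround suggested above: strip the extra classes first.
mytargets_df <- as.data.frame(mytargets)
class(mytargets_df)
#> [1] "data.frame"
switch(class(mytargets_df), data.frame = "data.frame branch")
#> [1] "data.frame branch"
```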

Regarding the sample sheet, the minimal requirement is a single column: the one used for id_colname. All the other columns are just potentially useful variables related to the dataset (e.g., phenotypes to use later in the analyses).
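A minimal sketch of such a one-column sample sheet, assuming (as in the question) that the identifiers are the RCC file names. The folder below is a hypothetical temporary directory with empty placeholder files, standing in for "pathname"; the load_rcc() call is left commented out because it needs NACHO and real RCC files.

```r
# Hypothetical data folder with two empty placeholder .RCC files.
data_directory <- file.path(tempdir(), "rcc_demo")
dir.create(data_directory, showWarnings = FALSE)
invisible(file.create(file.path(data_directory, c("sample1.RCC", "sample2.RCC"))))

# Minimal sample sheet: one column of identifiers, whose name is the
# value passed to id_colname.
rcc_files <- list.files(data_directory, pattern = "\\.RCC$", ignore.case = TRUE)
mytargets <- data.frame(IDFILE = rcc_files)

# Optionally save it; utils::read.csv() reads it back as a plain data.frame
# (a single class), unlike readr::read_csv().
sheet_path <- file.path(data_directory, "mytargets.csv")
write.csv(mytargets, sheet_path, row.names = FALSE)
mytargets2 <- utils::read.csv(sheet_path)

# With NACHO installed and real RCC files (not run here):
# mydata <- NACHO::load_rcc(
#   data_directory = data_directory,
#   ssheet_csv = mytargets,
#   id_colname = "IDFILE"
#   # remaining arguments as in the question
# )
# Judging by the `character = utils::read.csv(file = ssheet_csv, ...` branch
# in the error message, a path such as sheet_path may also be accepted.
```

Extra columns (phenotypes and the like, as in the GSE74821 example) can simply be added next to IDFILE.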

mcanouil commented 4 years ago

Fixed in 4255e1b7ae44ecb36818d13bb5569c883bc3f833
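The diff itself isn't reproduced in this thread. As an illustration only, a type check written with inherits() instead of switch(class(...)) sidesteps the multi-class problem, because inherits() succeeds if any element of the class vector matches. read_sample_sheet() below is a hypothetical helper, not NACHO's actual code.

```r
# Hypothetical helper (NOT NACHO's implementation): accept either a
# data.frame (including tibbles and other subclasses) or a CSV path.
read_sample_sheet <- function(ssheet_csv) {
  if (inherits(ssheet_csv, "data.frame")) {
    # Drop subclasses such as tbl_df so downstream code sees a plain data.frame
    as.data.frame(ssheet_csv)
  } else if (is.character(ssheet_csv) && length(ssheet_csv) == 1L) {
    utils::read.csv(file = ssheet_csv, header = TRUE, stringsAsFactors = FALSE)
  } else {
    stop("'ssheet_csv' must be a data.frame or a path to a CSV file.", call. = FALSE)
  }
}

# A tibble-like object now goes through without error.
tbl_like <- data.frame(IDFILE = c("sample1.RCC", "sample2.RCC"))
class(tbl_like) <- c("tbl_df", "tbl", "data.frame")
class(read_sample_sheet(tbl_like))
#> [1] "data.frame"
```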

mcanouil commented 4 years ago

The issue is fixed in version 1.0.1 (#24). The release is currently being built on CRAN and will soon be available on all platforms.

mcanouil/NACHO, issue #23