lpantano / isomiRs

analyze isomiRs from seqbuster tool
http://lpantano.github.io/isomiRs/
MIT License

Mirtop files cannot be loaded with IsomirDataSeqFromMirtop #30

Open joshoandres13 opened 19 hours ago

joshoandres13 commented 19 hours ago

I’m using the isomiRs Bioconductor package to analyze microRNAs. I pre-processed the data with the nf-core/smrnaseq pipeline, and I’m working with the export/{sample.id}_mirtop_rawData.tsv files, which are described as compatible with isomiRs. I have my metadata data.frame ready and a list of my sample files, but when I call the IsomirDataSeqFromMirtop function I get the following error:

Error in h(simpleError(msg, call)):
  error in evaluating the argument 'x' in selecting a method for function 'unique': undefined columns selected

It seems the issue is that each mirtop raw data file contains information for only one sample, whereas my metadata data.frame includes information for all samples. This mismatch appears to be causing the error. Does anyone know how I could integrate my sample files into a single data.frame that would be compatible with IsomirDataSeqFromMirtop? Should I merge the sample data manually, or is there a function within the package that can consolidate multiple samples into the right format? I’ve checked the documentation but couldn’t find a clear solution. Thanks in advance for any guidance!

My code:

library(readr)    # read_csv(), read_tsv()
library(tibble)   # column_to_rownames()
library(isomiRs)  # IsomirDataSeqFromMirtop()

# Load metadata
metadata <- read_csv("~/Documentos/TFM/mirna_analysis/input/20240813_metadata.csv")

# Drop the rows that do not correspond to the samples
metadata <- metadata[-c(1:79), ]

# Move the prefix column into the row names
metadata <- column_to_rownames(metadata, var = "prefix")

# Convert to a data.frame
metadata_df <- as.data.frame(metadata)
class(metadata_df)

print(metadata)
str(metadata)

# Load count matrix
# Directory that contains the .tsv file of each sample
path <- "/media/joshoacr13/EXTERNAL_USB/mirna/nfcore-smrnaseq/mirtop_AllSamples/export/"

# List the .tsv files (pattern is a regular expression, not a glob)
file_list <- list.files(path, pattern = "\\.tsv$", full.names = TRUE)

# This is the call that raises the error

ids <- IsomirDataSeqFromMirtop(mirtop = read_tsv(file_list), coldata = metadata)
So I then ran the following code instead, which gives me a single data frame.
# Read and combine all the files into one data.frame

all_samples <- lapply(file_list, function(file) {
  # Read the sample file
  sample_data <- read.delim(file, sep = "\t")

  # Derive the sample name from the file name
  sample_name <- sub("_mirtop_rawData.tsv", "", basename(file), fixed = TRUE)

  # Rename the last column to the sample name
  colnames(sample_data)[ncol(sample_data)] <- sample_name

  return(sample_data)
})

# Full outer join of all samples on the shared annotation columns
merged_data <- Reduce(function(x, y) merge(x, y, by = c("seq", "mir", "mism", "add", "t5", "t3"), all = TRUE), all_samples)

# Entries missing from a sample become 0
merged_data[is.na(merged_data)] <- 0
# Create the IsomirDataSeq object with IsomirDataSeqFromMirtop
ids <- IsomirDataSeqFromMirtop(mirtop = merged_data, coldata = metadata_matrix)

print(ids)
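
One plausible reading of the "undefined columns selected" error is that the table is being indexed by sample names that are not among its columns. The thread does not confirm that IsomirDataSeqFromMirtop does this internally, so the following is only a minimal pre-flight check under that assumption. `toy_counts` and `toy_coldata` are hypothetical stand-ins for the merged table and the metadata.

```r
# Hypothetical stand-ins: a merged count table and a metadata table
toy_counts <- data.frame(seq = c("AAA", "CCC"),
                         sampleA = c(10, 5),
                         sampleB = c(7, 0))
toy_coldata <- data.frame(condition = c("ctrl", "treated", "treated"),
                          row.names = c("sampleA", "sampleB", "sampleC"))

# Sample names listed in the metadata but absent from the table columns
missing_cols <- setdiff(rownames(toy_coldata), colnames(toy_counts))

# Selecting a column that does not exist fails in base R;
# capture the (locale-dependent) message instead of stopping
result <- tryCatch(toy_counts[, rownames(toy_coldata)],
                   error = function(e) conditionMessage(e))

# Keep only the metadata rows that have a matching column
coldata_ok <- toy_coldata[rownames(toy_coldata) %in% colnames(toy_counts), ,
                          drop = FALSE]
```

If `missing_cols` is non-empty for the real data, the row names of the metadata and the sample columns of the table do not line up, which would be worth fixing before calling the constructor.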
lpantano commented 8 hours ago

Hi,

Thank you for reporting this. I think I understand what is happening. Your idea of merging the files first should work. Does it give you an error when you do the merge yourself?

Mirtop will generate a table with all the samples if it is run on all of them together, but right now the nf-core pipeline runs it on each sample individually. We should probably add this to the pipeline; if the merge works for you, we can plan on adding that step.

Let me know
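
The merge discussed above can be sketched on toy data with base R only. The column names `seq`, `mir` and `count` and the sample names are hypothetical and do not reflect the real mirtop export layout. Unlike a blanket `is.na()` replacement over the whole table, this variant fills missing values with 0 only in the count columns, so annotation columns are left untouched.

```r
# Hypothetical per-sample tables, one count column each
s1 <- data.frame(seq = c("AAA", "CCC"), mir = c("miR-1", "miR-2"), count = c(10, 5))
s2 <- data.frame(seq = c("AAA", "GGG"), mir = c("miR-1", "miR-3"), count = c(7, 2))
tables <- list(sampleA = s1, sampleB = s2)

# Rename each table's count column to its sample name
renamed <- Map(function(tbl, name) {
  names(tbl)[names(tbl) == "count"] <- name
  tbl
}, tables, names(tables))

# Full outer join on the shared annotation columns
merged <- Reduce(function(x, y) merge(x, y, by = c("seq", "mir"), all = TRUE),
                 renamed)

# Replace NA with 0 in the count columns only
count_cols <- names(tables)
merged[count_cols][is.na(merged[count_cols])] <- 0
```

The result has one row per distinct (`seq`, `mir`) pair and one column per sample, which is the shape a single multi-sample table would have.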