kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

IsoformSwitchAnalyzeR could not handle 394 wheat samples at one time #230

Closed sumageb closed 4 months ago

sumageb commented 4 months ago

Hello, I wanted to use the package to build a gene expression matrix from 394 wheat samples, but I could not. Every run fails with an out-of-memory error. I ran it on a cluster and increased the memory by about 1 TB, but it still failed. My pipeline was fastp - STAR - StringTie - IsoformSwitchAnalyzeR. Here is my code:

stringQuant <- importIsoformExpression(parentDir="./",addIsofomIdAsColumn = F, readLength = 150)

Make design matrix

myDesign <- data.frame(sampleID = colnames(stringQuant$abundance),condition = gsub('_.*', '', colnames(stringQuant$abundance)))

Create switchAnalyzeRlist

aSwitchList <- importRdata(isoformCountMatrix = stringQuant$counts,isoformRepExpression = stringQuant$abundance,designMatrix= myDesign, isoformExonAnnoation ="./merged.gtf", fixStringTieAnnotationProblem=T, fixStringTieViaOverlapInMultiGenes=T)

Abundance matrix

geneExpresionMatrix <- extractGeneExpression(aSwitchList, extractCounts= F, addGeneNames=T,addIdsAsColumns=F)

save(aSwitchList, geneExpresionMatrix, file="expression.Rdata")

And the error message was:

1820 gene_ids which were associated with multiple ref_gene_id/gene_names were split into mutliple genes via their ref_gene_id/gene_names.
58136 genes_id were assigned their original gene_id instead of the StringTie gene_id.
This was only done when it could be done unambiguous.
Step 4 of 7: Calculating gene expression and isoform fractions...
Step 5 of 7: Merging gene and isoform expression...
|======================================================================| 100%
Step 6 of 7: Making comparisons...
|===== | 8%
/var/spool/slurmd/job11220936/slurm_script: line 8: 43269 Killed Rscript importing2.R
slurmstepd: error: Detected 1 oom_kill event in StepId=11220936.batch. Some of the step tasks have been OOM Killed.

I appreciate your help.

chunxubioinfor commented 4 months ago

Hi Suma, this error indicates that your R script was killed by the operating system's out-of-memory (OOM) killer while it was running. My guess is that this happens because you have so many samples, almost 400, which leads to massive memory usage while making pairwise comparisons between all the samples. Unfortunately, I don't think I can offer a solution for this right now. But I'd like to know how many conditions your samples cover; maybe you could cut down the number of samples in each condition, or select only a few conditions of interest. 😊
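
To gauge how quickly the comparison step can grow, here is a minimal sketch in base R. It assumes that condition labels are derived from the sample names the same way as in the original post (everything before the first underscore) and that one comparison is made per unordered pair of conditions. The sample names below are hypothetical, not taken from the wheat data set.

```r
# Hypothetical sample names; the real ones would come from
# colnames(stringQuant$abundance) in the original post.
sample_ids <- c("cv1_rep1", "cv1_rep2", "cv2_rep1",
                "cv2_rep2", "cv3_rep1", "cv3_rep2")

# Same rule as in the original post: drop everything from the first underscore on.
condition <- gsub("_.*", "", sample_ids)

# Replicates per condition
print(table(condition))

# Number of distinct conditions and of unordered condition pairs
n_conditions  <- length(unique(condition))
n_comparisons <- choose(n_conditions, 2)
print(n_conditions)   # 3
print(n_comparisons)  # 3

# Illustrative growth for larger (assumed) numbers of conditions
print(choose(c(10, 50, 100), 2))  # 45 1225 4950
```

If the underscore rule yields close to one condition per sample, the number of pairs grows roughly with the square of the condition count, which would be consistent with the job being killed during "Step 6 of 7: Making comparisons".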

sumageb commented 4 months ago

Hi Chunxubioinfor, I just wanted to let you know that I reduced the number of samples and found that the R package worked perfectly for up to 20 samples. However, my ultimate goal is to create a gene expression matrix in one go, as this package also performs normalization using another tool. Therefore, reducing the number of samples may not be the best choice in this case. Thank you!

chunxubioinfor commented 4 months ago

Hi Suma, I'm sorry, but I have no idea how to do that without reducing the sample size. If you only want an expression matrix and do not plan to run the isoform switch analysis itself, I'd recommend looking for other tools that specialize in that. I'll close this issue for now. Good luck 🤞
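
For the narrower goal of a gene-level abundance matrix, one possible route is to sum isoform abundances per gene. The sketch below uses base R only. The tiny matrix and the transcript-to-gene map are made up for illustration, and the sketch does not reproduce any normalization that IsoformSwitchAnalyzeR applies.

```r
# Hypothetical isoform-level abundance matrix (rows = isoforms, columns = samples).
# matrix() fills column by column, so each group of four values is one sample.
iso_abund <- matrix(
  c(10, 5, 0, 2,    # sample A_1
    12, 4, 1, 3,    # sample A_2
     8, 6, 2, 1),   # sample B_1
  nrow = 4,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("A_1", "A_2", "B_1"))
)

# Hypothetical isoform-to-gene map; in practice this would come from the annotation.
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB", tx4 = "geneB")

# Sum the isoform rows that belong to the same gene.
gene_of_row <- unname(tx2gene[rownames(iso_abund)])
gene_abund  <- rowsum(iso_abund, group = gene_of_row)

print(gene_abund)
#       A_1 A_2 B_1
# geneA  15  16  14
# geneB   2   4   3
```

Because this only aggregates rows, memory use scales with the size of the isoform matrix rather than with the number of condition pairs.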