ldg21 / SGSeq

Splice event prediction and quantification from RNA-seq data

analyzeFeatures() slow (several days on multicore) #6

Open millerh1 opened 4 years ago

millerh1 commented 4 years ago

Hello,

I have been using analyzeFeatures() the same way the vignette shows.

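library(SGSeq)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

# Extract transcript features from the UCSC annotation, standard chromosomes only
txf_ucsc <- convertToTxFeatures(TxDb.Hsapiens.UCSC.hg38.knownGene)
txf_ucsc <- keepStandardChromosomes(txf_ucsc, pruning.mode = "coarse")
sgfc <- analyzeFeatures(si, which = txf_ucsc, cores = 15)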

This has been running for several days with no end in sight, and it is still on the "predict features..." step. Is this the expected behavior, given that the BAM files are only about 3-4 GB in size?

Here is my si:

[image: screenshot of si]

seifudd commented 1 year ago

@millerh1 How did you get around this issue? Were you able to run this genome-wide on all your samples? I'm having a lot of memory issues, including "cannot allocate vector of size 6.3 Gb" errors. I'm currently running with 1 core; anything more than that results in a memory error. I have ~300 GB of memory and 96 vCPUs on an EC2 instance. Thanks for any help.