multithreading and speed

davidaray commented 2 years ago

I was re-reading the page today and found this:

--num_threads: The number of processors to use when running TE Density. For maximum speed, use 1:1 ratio of processors to pseudomolecules. TE density is only calculated between genes and TEs belonging to the same pseudomolecule (chromosomes for chromosome-scale assemblies).

I have several assemblies to process, and the number of pseudomolecules ranges from 29 to 202.

I have two potential partitions to use on our HPCC. The first (nocona) can use as many as 128 processors with 512 GB RAM. The drawback with nocona is that the wall time is capped at 48 hours.

The second (xlquanah) has a run time of up to 2 weeks, but the maximum number of processors I can use is 36, with 256 GB RAM.

I have been running the hg dataset and it's on day 5 using 36 processors. I'd hoped that adding extra processors would speed things up but, based on my reading of Table 1, that's not happened.

Given the information above, how would you suggest I set up my runs to maximize speed?

If it helps, my submission script (for xlquanah) is below. I'm also attempting runs using nocona and 128 processors.

. ~/conda/etc/profile.d/conda.sh
conda activate tedensity
cd /lustre/work/daray/software/TE_Density
source tedensity-virt/bin/activate

## Creates variables used in this script
# Genome ID and directory name
GENOME=mMyo
RUNTYPE=${GENOME}_tedensity2
# Working, data, and software directories
DIR=/lustre/scratch/daray/tedensity/$RUNTYPE
TEDATA=mMyo_TEs_CLEAN_filtered.tsv
GENEDATA=mMyo_transcripts_CLEAN.tsv
OUTPUT_DIR=$DIR
PROGRAMDIR=/lustre/work/daray/software/TE_Density

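## Create working directory (quoted so unusual paths don't break the script)
mkdir -p "$DIR"
cd "$DIR"

## Run it (input files are resolved relative to $DIR)
python "$PROGRAMDIR/process_genome.py" \
        "$GENEDATA" \
        "$TEDATA" \
        "$GENOME" \
        -c "$PROGRAMDIR/config/production_run_config.ini" \
        -n 36 \
        -o "$OUTPUT_DIR"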

sjteresi commented 2 years ago

Hello David,

I would suggest the second option, xlquanah. That gives the run the largest time window to complete, and it has a decent amount of RAM (that amount seems sufficient, and RAM has been the most common issue in my experience). The program's progress bar stalls if you don't have enough RAM. Unfortunately, it will probably take longer because you have fewer processors, but this way it should finish without failing and you won't have to run it again. You only need to get it to run successfully once, right? I don't think nocona is an option unless you have a very small genome or very few unique TE groupings; the time just doesn't seem sufficient, despite the large amount of resources available.
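
As a rough sketch of the processor arithmetic: the README text quoted above recommends a 1:1 ratio of processors to pseudomolecules. Assuming the work really is split per pseudomolecule (an inference from that text, not something confirmed here), asking for more processors than pseudomolecules is unlikely to help. The helper below is hypothetical and not part of TE Density; it only caps the value you would pass to `-n`.

```python
def pick_num_threads(n_pseudomolecules: int, n_available: int) -> int:
    """Hypothetical helper (not part of TE Density).

    Assumes one worker per pseudomolecule is the useful maximum,
    so the thread count is capped at whichever number is smaller.
    """
    if n_pseudomolecules < 1 or n_available < 1:
        raise ValueError("both counts must be at least 1")
    return min(n_pseudomolecules, n_available)


# The two extremes mentioned in this thread, on a 36-processor partition
print(pick_num_threads(29, 36))   # 29
print(pick_num_threads(202, 36))  # 36
```

For the assemblies with more pseudomolecules than available processors, the cap is simply the partition limit, so the longer wall time on xlquanah is what matters.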

The only other way to speed up your calculations is to minimize the number of unique TE order or superfamily groupings. Referencing the Performance subsection of the Implementation section of the publication, I would suggest re-categorizing as many redundant TE groupings as possible to reduce the number of calculations. This would mean reformatting your cleaned TE data input file.
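
To see how many unique groupings you have before and after re-categorizing, something like the following could work. This is a minimal sketch, not part of TE Density: the column names (`Order`, `SuperFamily`), the tiny in-memory table, and the merge mappings are all assumptions for illustration, so adjust them to match your cleaned TE file.

```python
import csv
import io
from collections import Counter


def count_groupings(rows, column):
    """Count how many TEs fall into each unique value of `column`."""
    return Counter(row[column] for row in rows)


def recategorize(rows, column, mapping):
    """Return new rows with redundant labels in `column` merged via `mapping`."""
    return [{**row, column: mapping.get(row[column], row[column])} for row in rows]


# Tiny in-memory stand-in for a cleaned TE file (column names are assumed)
example = io.StringIO(
    "Chromosome\tStart\tStop\tOrder\tSuperFamily\n"
    "chr1\t100\t500\tLINE\tL1\n"
    "chr1\t900\t1200\tLINE?\tL1\n"
    "chr2\t50\t300\tDNA\thAT\n"
    "chr2\t700\t950\tDNA\thAT-Charlie\n"
)
rows = list(csv.DictReader(example, delimiter="\t"))

# Hypothetical merges: fold uncertain or overly fine labels into a parent label
merged = recategorize(rows, "Order", {"LINE?": "LINE"})
merged = recategorize(merged, "SuperFamily", {"hAT-Charlie": "hAT"})

for column in ("Order", "SuperFamily"):
    before = len(count_groupings(rows, column))
    after = len(count_groupings(merged, column))
    print(f"{column}: {before} unique groupings before, {after} after")
```

For a real file, replace the `io.StringIO` stand-in with an open file handle and write the merged rows back out with `csv.DictWriter`.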

sjteresi commented 2 years ago

Long story short, the main limitations are time and RAM. You know you don't have enough RAM if the program stalls and the progress bar doesn't increment for a very long time. RAM has been the biggest issue for me. You can reduce computation time by reducing the number of unique TE groupings. In the publication, the reduced Human dataset took the longest because it had an abnormally large number of unique TE groupings, despite being one of the smaller genomes. By all means throw as many processors at it as you can but, as I said before, RAM was the main issue for me.

I hope this was helpful and not too repetitive. Please let me know what works for you so I can maybe include some better usage notes in the README!

sjteresi / TE_Density

multithreading and speed #115