Open brettyout opened 3 years ago
Hello, that is strange. I do not think it has to do with cray-mpich. Is there a way to create a similar test case so that I can debug on my end? (I read through the code and see no obvious reason why the imbalance should happen.)
Thank you, Raghavendra
Sure.
Below is a list of the FTP links for the genomes I used. After downloading them, I had to change the extension from .fna.gz to .fasta.gz.
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/668/045/GCF_003668045.3_CriGri-PICRH-1.0/GCF_003668045.3_CriGri-PICRH-1.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/004/115/265/GCF_004115265.1_mRhiFer1_v1.p/GCF_004115265.1_mRhiFer1_v1.p_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/004/664/715/GCF_004664715.2_UCI_PerLeu_2.1/GCF_004664715.2_UCI_PerLeu_2.1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/007/474/595/GCF_007474595.2_mLynCan4.pri.v2/GCF_007474595.2_mLynCan4.pri.v2_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/762/305/GCF_009762305.2_mZalCal1.pri.v2/GCF_009762305.2_mZalCal1.pri.v2_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/873/245/GCF_009873245.2_mBalMus1.pri.v3/GCF_009873245.2_mBalMus1.pri.v3_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/108/235/GCF_014108235.1_mMyoMyo1.p/GCF_014108235.1_mMyoMyo1.p_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/108/245/GCF_014108245.1_mPipKuh1.p/GCF_014108245.1_mPipKuh1.p_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/108/415/GCF_014108415.1_mMolMol1.p/GCF_014108415.1_mMolMol1.p_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/176/215/GCF_014176215.1_mRouAeg1.p/GCF_014176215.1_mRouAeg1.p_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/570/535/GCF_014570535.1_YNU_ManJav_2.0/GCF_014570535.1_YNU_ManJav_2.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/570/555/GCF_014570555.1_YNU_ManPten_2.0/GCF_014570555.1_YNU_ManPten_2.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/633/375/GCF_014633375.1_OchPri4.0/GCF_014633375.1_OchPri4.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/824/575/GCF_014824575.1_WHU_Shon_v2/GCF_014824575.1_WHU_Shon_v2_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/825/515/GCF_014825515.1_WHU_Ajam_v2/GCF_014825515.1_WHU_Ajam_v2_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/898/055/GCF_014898055.1_MPIMG_talOcc4/GCF_014898055.1_MPIMG_talOcc4_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/220/235/GCF_015220235.1_mChoDid1.pri/GCF_015220235.1_mChoDid1.pri_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/252/025/GCF_015252025.1_Vero_WHO_p1.0/GCF_015252025.1_Vero_WHO_p1.0_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/903/995/425/GCF_903995425.1_mOncTor1.1/GCF_903995425.1_mOncTor1.1_genomic.fna.gz
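In case it helps with reproducing the setup, the rename step could be scripted roughly like this. It is only a sketch: it assumes all the downloads sit in a single directory, and the directory name `genomes` and the function name `rename_fna_to_fasta` are placeholders, not anything from the fasta_reader code.

```python
from pathlib import Path


def rename_fna_to_fasta(directory: str) -> list[Path]:
    """Rename every *.fna.gz file in `directory` to *.fasta.gz.

    Only the file name changes; the compressed contents are left untouched.
    Returns the new paths in sorted order.
    """
    suffix = ".fna.gz"
    renamed = []
    for path in sorted(Path(directory).glob("*" + suffix)):
        # Drop the trailing ".fna.gz" and append ".fasta.gz" instead.
        target = path.with_name(path.name[: -len(suffix)] + ".fasta.gz")
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    # "genomes" is a hypothetical download directory; adjust as needed.
    for new_path in rename_fna_to_fasta("genomes"):
        print(new_path)
```

Files that do not end in .fna.gz are skipped, so running it twice on the same directory should be harmless.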
Thanks for the help, Brett
Hello again, I'm getting fairly long runtimes with the fasta_reader. I think it might have to do with tasks not being evenly allocated. Below is a run with ten processes. Every process but the last two finishes almost immediately. The dataset is two reference genomes in FASTA format (3.1 MB) and the output k-mer files look fine.
The pattern is the same no matter how many processes I run. I also tried 68 processes on a 20-file, 10 GB dataset, but once again only the last two ranks had any significant runtime.
One key difference you might have noticed is that I'm using srun. Due to some restrictions on my server, I built ctf/jaccard/fasta_reader with cray-mpich/7.7.10. The support specialists I've spoken with said srun should behave the same as mpirun under my current build. Regardless, do you think cray-mpich might have something to do with fasta_reader limiting itself to two processes?
Thanks, Brett