sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Only 1 rank? #81

Closed habibr closed 11 years ago

habibr commented 11 years ago

Hi Seb,

It has been a while since I last tested Ray. I ran the latest Ray (2.1.devel) on 8 million paired reads with a command using mpiexec -n 4 ..., but the output shows only Rank 0 working. RayCommand.txt also recorded mpiexec -n 1 ... Is this normal, or is something wrong?

sebhtml commented 11 years ago

Is it reproducible ?

The number of MPI ranks (processes, typically one per processor core) is obtained by:

        /** initialize the message passing interface stack */
        MPI_Get_processor_name(serverName,&len);
        MPI_Comm_rank(MPI_COMM_WORLD,&m_rank);
        MPI_Comm_size(MPI_COMM_WORLD,&m_size);

in ray/RayPlatform/communication/MessagesHandler.cpp

What MPI library are you using ? Maybe your MPI library requires mpiexec -np 4 instead of -n 4.

habibr commented 11 years ago

Yes, it is reproducible. Here is the standard output snippets:

        BEGIN Thu Aug 30 08:17:53 WIT 2012
        running in /home/habib/Bioinformatics/runs/111220/VioletRay,chl_i3,k19
        command= mpiexec -n 2 Ray -k 19 -p Sample/lane1_violetray_contigs_yes._1.fastq Sample/lane1_violetray_contigs_yes._2.fastq -o Assembly
        Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 29741 ProcessorName= habib-desktop
        Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 29742 ProcessorName= habib-desktop

...

MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes FORCE_PACKING = y ASSERT = n CONFIG_PROFILER_COLLECT = n CONFIG_CLOCK_GETTIME = n linux = yHAVE_LIBZ = n HAVE_LIBBZ2 = n CONFIG_PROFILER_COLLECT = _MSC_VER = n GNUC = y RAY_32_BITS = n CONFIG_CLOCK_GETTIME = n linux = y _MSC_VER = nn RAY_64_BITS = y MPI standard version: MPI 2. GNUC = y RAY_32_BITS = n RAY_64_BITS = 2 MPI library: MPICH2 1.4.1 Compiler: GNU gcc/g++ 4.6.3 y MPI standard version: MPI 2.2 MPI library: MPICH2 With SSE 4.2 With hardware pop count

...

Ray command: mpiexec -n 1 Ray \ mpiexec -n -k \ 1 19 \ Ray \ -p \ Sample/lane1_violetray_contigs_yes._1.fastq \ Sample/lane1_violetray_contigs_yes._2.fastq \ -o \ -k Assembly

sebhtml commented 11 years ago

Hello !

You compiled Ray with MPICH2 1.4.1p1, but I think you are using the mpiexec from Open-MPI.

If you compiled with MPICH2's mpicxx, you need to use MPICH2's mpiexec. Otherwise, you get undefined behavior.

Can you verify this ?

e.g.:

$ mpiexec --version
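The mismatch can also be spotted from Ray's own startup lines. The following is a minimal sketch, assuming the "Rank= R Size= S" line format seen in the output posted in this thread; the names `RankReport`, `parseRankReport`, `looksLikeOneJob` and `looksLikeSingletons` are hypothetical helpers and are not part of Ray or RayPlatform. With a matching launcher, `mpiexec -n 2` should give two lines with Size= 2 and ranks 0 and 1. Several lines that all say Rank= 0 Size= 1 suggest that each process was started as its own one-rank job.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// One "Rank= R Size= S" report, as printed by each process at startup.
// (Hypothetical helper type, not part of Ray.)
struct RankReport {
    int rank;
    int size;
};

// Extract rank and size from a line such as
// "Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 29741 ProcessorName= habib-desktop".
// Returns false if the line does not contain both fields.
bool parseRankReport(const std::string& line, RankReport& out) {
    std::size_t pos = line.find("Rank= ");
    if (pos == std::string::npos) return false;
    return std::sscanf(line.c_str() + pos, "Rank= %d Size= %d",
                       &out.rank, &out.size) == 2;
}

// True when the reports look like one MPI job of `requested` ranks:
// exactly `requested` reports, every size equal to `requested`,
// and each rank in 0..requested-1 appearing exactly once.
bool looksLikeOneJob(const std::vector<RankReport>& reports, int requested) {
    assert(requested > 0);
    if (static_cast<int>(reports.size()) != requested) return false;
    std::set<int> seen;
    for (const RankReport& r : reports) {
        if (r.size != requested) return false;
        if (r.rank < 0 || r.rank >= requested) return false;
        if (!seen.insert(r.rank).second) return false;  // duplicate rank
    }
    return true;
}

// True when several processes each believe they are alone:
// more than one report, all with rank 0 and size 1.
bool looksLikeSingletons(const std::vector<RankReport>& reports) {
    if (reports.size() < 2) return false;
    for (const RankReport& r : reports) {
        if (r.rank != 0 || r.size != 1) return false;
    }
    return true;
}
```

Feeding the two "Rank 0: Rank= 0 Size= 1 ..." lines from the report above through `parseRankReport` and then `looksLikeSingletons` flags the run, while `looksLikeOneJob(reports, 2)` rejects it. This only detects the symptom; confirming which mpiexec is on the PATH still requires the version check above.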

As a test, I compiled Ray with /usr/lib64/mpich2/bin/mpicxx instead of /usr/lib64/openmpi/bin/mpicxx.

        [boiseb01@ls30 ray]$ /usr/lib64/mpich2/mpiexec -n 1 ./Ray -version
        Ray version 2.1.0-devel
        License for Ray: GNU General Public License version 3
        RayPlatform version: 1.1.0-devel
        License for RayPlatform: GNU Lesser General Public License version 3

        MAXKMERLENGTH: 32
        KMER_U64_ARRAY_SIZE: 1
        Maximum coverage depth stored by CoverageDepth: 4294967295
        MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
        FORCE_PACKING = n
        ASSERT = n
        HAVE_LIBZ = n
        HAVE_LIBBZ2 = n
        CONFIG_PROFILER_COLLECT = n
        CONFIG_CLOCK_GETTIME = n
        linux = y
        _MSC_VER = n
        GNUC = y
        RAY_32_BITS = n
        RAY_64_BITS = y
        MPI standard version: MPI 2.2
        MPI library: MPICH2 1.4.1p1
        Compiler: GNU gcc/g++ 4.6.3 20120306 (Red Hat 4.6.3-2)

This reproduces your problem:

/usr/lib64/openmpi/bin/mpiexec -n 16 ./Ray -test-network-only -o Test33

This likely solves your problem:

/usr/lib64/mpich2/bin/mpiexec -n 16 ./Ray -test-network-only -o Test33

 Sébastien

habibr commented 11 years ago

Hi Seb, thanks for the clue. Verified: I found the mpiexec from MPICH2, and when I used it, Ray ran as it should.

sebhtml commented 11 years ago

You are not the first one ;-)
