Closed habibr closed 11 years ago
Is it reproducible ?
The number of MPI ranks (processor cores) is obtained by:
/** initialize the message passing interface stack */
MPI_Get_processor_name(serverName,&len);
MPI_Comm_rank(MPI_COMM_WORLD,&m_rank);
MPI_Comm_size(MPI_COMM_WORLD,&m_size);
in ray/RayPlatform/communication/MessagesHandler.cpp
What MPI library are you using ? Maybe your MPI library requires mpiexec -np 4 instead of -n 4.
Yes, it is reproducible. Here are the standard output snippets:
BEGIN Thu Aug 30 08:17:53 WIT 2012
running in /home/habib/Bioinformatics/runs/111220/VioletRay,chl_i3,k19
command= mpiexec -n 2 Ray -k 19 -p Sample/lane1_violetray_contigs_yes._1.fastq Sample/lane1_violetray_contigs_yes._2.fastq -o Assembly
Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 29741 ProcessorName= habib-desktop
Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 29742 ProcessorName= habib-desktop
...
MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes FORCE_PACKING = y ASSERT = n CONFIG_PROFILER_COLLECT = n CONFIG_CLOCK_GETTIME = n linux = yHAVE_LIBZ = n HAVE_LIBBZ2 = n CONFIG_PROFILER_COLLECT = _MSC_VER = n GNUC = y RAY_32_BITS = n CONFIG_CLOCK_GETTIME = n linux = y _MSC_VER = nn RAY_64_BITS = y MPI standard version: MPI 2. GNUC = y RAY_32_BITS = n RAY_64_BITS = 2 MPI library: MPICH2 1.4.1 Compiler: GNU gcc/g++ 4.6.3 y MPI standard version: MPI 2.2 MPI library: MPICH2 With SSE 4.2 With hardware pop count
...
Ray command: mpiexec -n 1 Ray \ mpiexec -n -k \ 1 19 \ Ray \ -p \ Sample/lane1_violetray_contigs_yes._1.fastq \ Sample/lane1_violetray_contigs_yes._2.fastq \ -o \ -k Assembly
Hello !
You compiled Ray with MPICH2 1.4.1p1, but I think you are using the mpiexec from Open-MPI.
If you compiled with MPICH2's mpicxx, you need to use MPICH2's mpiexec. Otherwise, you get undefined behavior.
Can you verify this ?
e.g.:
$ mpiexec --version
As a test, I compiled Ray with /usr/lib64/mpich2/bin/mpicxx instead of /usr/lib64/openmpi/bin/mpicxx.
[boiseb01@ls30 ray]$ /usr/lib64/mpich2/mpiexec -n 1 ./Ray -version
Ray version 2.1.0-devel
License for Ray: GNU General Public License version 3
RayPlatform version: 1.1.0-devel
License for RayPlatform: GNU Lesser General Public License version 3

MAXKMERLENGTH: 32
KMER_U64_ARRAY_SIZE: 1
Maximum coverage depth stored by CoverageDepth: 4294967295
MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
FORCE_PACKING = n
ASSERT = n
HAVE_LIBZ = n
HAVE_LIBBZ2 = n
CONFIG_PROFILER_COLLECT = n
CONFIG_CLOCK_GETTIME = n
linux = y
_MSC_VER = n
GNUC = y
RAY_32_BITS = n
RAY_64_BITS = y
MPI standard version: MPI 2.2
MPI library: MPICH2 1.4.1p1
Compiler: GNU gcc/g++ 4.6.3 20120306 (Red Hat 4.6.3-2)
This reproduces your problem:
/usr/lib64/openmpi/bin/mpiexec -n 16 ./Ray -test-network-only -o Test33
This likely solves your problem:
/usr/lib64/mpich2/bin/mpiexec -n 16 ./Ray -test-network-only -o Test33
Sébastien
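For anyone hitting the same symptom: the giveaway in the log above is that several processes, each with its own ProcessIdentifier, all report Rank= 0 and Size= 1. Below is a minimal sketch of how that check could be automated on a saved log. It is not part of Ray; `LaunchReport` and `inspectLaunch` are hypothetical names, and the line format is assumed to match the `Rank= ... Size= ... ProcessIdentifier= ...` lines shown in this thread.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Summary of the "Rank= R Size= S ProcessIdentifier= P" lines found in a log.
// Hypothetical helper type, not part of Ray or RayPlatform.
struct LaunchReport {
    int processes = 0;      // number of distinct ProcessIdentifier values seen
    int reportedSize = -1;  // Size= value from the last matching line, -1 if none
    bool mismatch = false;  // true when more processes exist than they report
};

// Scan log lines and flag the case where several processes each believe
// they are alone (for example, two PIDs that both report Size= 1).
LaunchReport inspectLaunch(const std::vector<std::string>& lines) {
    // Assumed line format, copied from the log snippets in this thread.
    static const std::regex pattern(
        R"(Rank=\s*(\d+)\s+Size=\s*(\d+)\s+ProcessIdentifier=\s*(\d+))");

    std::set<int> pids;
    LaunchReport report;
    for (const std::string& line : lines) {
        std::smatch m;
        if (!std::regex_search(line, m, pattern)) {
            continue;  // not a rank/size line, ignore it
        }
        report.reportedSize = std::stoi(m[2].str());
        pids.insert(std::stoi(m[3].str()));
    }
    report.processes = static_cast<int>(pids.size());
    report.mismatch =
        report.reportedSize >= 0 && report.processes > report.reportedSize;
    return report;
}
```

Fed the two `Rank 0:` lines from the log in this thread, the sketch counts two processes with a reported size of 1 and sets `mismatch`, which is the pattern that pointed to the wrong mpiexec here. A flagged mismatch is only a hint to go check which mpiexec is on the PATH, not proof of the cause.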
Hi Seb, thanks for the clue. It is verified. I found the mpiexec from MPICH2, and when I used it, Ray ran as it should.
You are not the first one ;-)
Hi Seb,
It has been a while since I last tested Ray. I tested the latest Ray (2.1.devel) on 8 million paired reads and issued a command using mpiexec -n 4 ... But the program only shows Rank 0 working. RayCommand.txt recorded mpiexec -n 1 too... Is this normal, or is something wrong?