sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Profile Ray during the scaffolding on mp2 #91

Closed sebhtml closed 11 years ago

sebhtml commented 11 years ago

All 512 processes finished except these (log file, followed by its last line):

Ray-Scalability-512-2012-10-07.4.1.111 [DEBUG] sending RAY_MPI_TAG_GET_READ_MARKERS source: 111 destination: 411
Ray-Scalability-512-2012-10-07.4.1.224 PositionInRead: 0
Ray-Scalability-512-2012-10-07.4.1.240 Rank 240: assembler memory usage: 1320040 KiB
Ray-Scalability-512-2012-10-07.4.1.265 LINK01 73000265,R,1104000285,F,154
Ray-Scalability-512-2012-10-07.4.1.379 PositionInRead: 0

sebhtml commented 11 years ago

-rw-r--r-- 1 boisver1 corbeil 87G Oct 10 07:28 Ray-Scalability-512-2012-10-07.4.1.111
-rw-r--r-- 1 boisver1 corbeil 84M Oct 9 13:22 Ray-Scalability-512-2012-10-07.4.1.224
-rw-r--r-- 1 boisver1 corbeil 59M Oct 9 11:34 Ray-Scalability-512-2012-10-07.4.1.240
-rw-r--r-- 1 boisver1 corbeil 28M Oct 9 10:34 Ray-Scalability-512-2012-10-07.4.1.265
-rw-r--r-- 1 boisver1 corbeil 85M Oct 9 12:31 Ray-Scalability-512-2012-10-07.4.1.379

sebhtml commented 11 years ago

Still producing bytes:

[boisver1@ip03 African-Genome]$ ls -lh Ray-Scalability-512-2012-10-07.4.1.*|grep G
-rw-r--r-- 1 boisver1 corbeil 88G Oct 10 07:31 Ray-Scalability-512-2012-10-07.4.1.111

sebhtml commented 11 years ago

[DEBUG] received response for RAY_MPI_TAG_GET_VERTEX_EDGES_COMPACT
[DEBUG] starting fetcher with RAY_MPI_TAG_REQUEST_VERTEX_READS

then:

[DEBUG] sending RAY_MPI_TAG_HAS_PAIRED_READ source: 111 destination: 243
[DEBUG] received response for RAY_MPI_TAG_HAS_PAIRED_READ 243
[DEBUG] sending RAY_MPI_TAG_GET_READ_MATE source: 111 destination: 243
[DEBUG] received response to RAY_MPI_TAG_GET_READ_MATE source: 111 destination: 243
[DEBUG] sending RAY_MPI_TAG_GET_READ_MARKERS source: 111 destination: 242
[DEBUG] received response for RAY_MPI_TAG_GET_READ_MARKERS source: 111 destination: 242
[DEBUG] sending RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111 destination: 201
[DEBUG] received response to RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111 destination: 201
[DEBUG] reverse sending RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111 destination: 201
[DEBUG] received response to RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111
[DEBUG] completed a annotation

(repeated with different source and destination etc...)

So the problem is that the fetcher is stuck on something. It is not an infinite loop, though, because the process still responds to the other processes and keeps making progress.

sebhtml commented 11 years ago

The job takes 25 minutes to load all its checkpoints (around 512 GiB of distributed data). I added a few more debug messages to figure out what's going on.

sebhtml commented 11 years ago

[DEBUG] m_readAnnotationId= 0 Count= 15515194 m_positionOnContig= 0

This is just a repeated k-mer...

[boisver1@ip03 African-Genome]$ qsub scripts/Scalability-512.sh
49049.mp2.m
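
A rough back-of-the-envelope count shows why a single repeated k-mer can hold up one rank for so long. The sketch below assumes that Count in the debug line is the number of read annotations attached to that k-mer, and that each annotation costs at least the five request/response pairs visible in the log excerpt earlier in this thread (HAS_PAIRED_READ, GET_READ_MATE, GET_READ_MARKERS, and GET_COVERAGE_AND_DIRECTION in both directions). The function name totalRoundTrips is hypothetical and is not part of Ray.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Ray): lower bound on the number of
// request/response round trips needed to walk every read annotation
// attached to one k-mer.
std::uint64_t totalRoundTrips(std::uint64_t annotations,
	std::uint64_t roundTripsPerAnnotation){

	return annotations*roundTripsPerAnnotation;
}
```

Calling totalRoundTrips(15515194,5) gives 77575970 round trips for this one k-mer, each of which also writes debug lines to the log when the tracing is enabled.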

sebhtml commented 11 years ago

A patch for the human assembly; this should fix the speed for SRA000271:

diff --git a/code/plugin_Scaffolder/Scaffolder.cpp b/code/plugin_Scaffolder/Scaffolder.cpp
index 1b8305c..6f3132e 100644
--- a/code/plugin_Scaffolder/Scaffolder.cpp
+++ b/code/plugin_Scaffolder/Scaffolder.cpp
@@ -41,13 +41,11 @@
 #include  /* for sqrt */

 __CreatePlugin(Scaffolder);
+__CreateMasterModeAdapter(Scaffolder,RAY_MASTER_MODE_WRITE_SCAFFOLDS);
+__CreateSlaveModeAdapter(Scaffolder,RAY_SLAVE_MODE_SCAFFOLDER);

- /**/
-__CreateMasterModeAdapter(Scaffolder,RAY_MASTER_MODE_WRITE_SCAFFOLDS); /**/
- /**/
-__CreateSlaveModeAdapter(Scaffolder,RAY_SLAVE_MODE_SCAFFOLDER); /**/
- /**/
- /**/
+#define __SCAFFOLDER_IGNORE_THRESHOLD 1024
+#define __BUG_5361

 using namespace std;

@@ -503,9 +501,9 @@ void Scaffolder::performSummary(){

    LargeCount sum=0;

-   int peakCoverage=getMode(&m_vertexCoverageValues);
+   CoverageDepth peakCoverage=getMode(&m_vertexCoverageValues);

-   int repeatCoverage=peakCoverage*REPEAT_MULTIPLIER;
+   CoverageDepth repeatCoverage=peakCoverage*REPEAT_MULTIPLIER;

    #ifdef CONFIG_USE_COVERAGE_DISTRIBUTION
    repeatCoverage=m_parameters->getRepeatCoverage();
@@ -697,6 +695,14 @@ void Scaffolder::processVertex(Kmer*vertex){
    //                  get the paths that goes on them
    //                  print the linking information
    if(!m_coverageRequested){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_GET_VERTEX_EDGES_COMPACT source: 111";
+           cout<<" destination: "<_vertexRank(vertex)<allocate(1*sizeof(Kmer));
        int bufferPosition=0;
        vertex->pack(buffer,&bufferPosition);
@@ -705,6 +711,7 @@ void Scaffolder::processVertex(Kmer*vertex){
        m_virtualCommunicator->pushMessage(m_workerId,&aMessage);
        m_coverageRequested=true;
        m_coverageReceived=false;
+
        if(m_positionOnContig==0){
            m_scaffoldingSummary.clear();
            m_summaryPerformed=false;
@@ -733,6 +740,13 @@ void Scaffolder::processVertex(Kmer*vertex){
        }
    }else if(!m_coverageReceived
        &&m_virtualCommunicator->isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] received response for RAY_MPI_TAG_GET_VERTEX_EDGES_COMPACT"< elements;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&elements);

@@ -770,9 +784,26 @@ void Scaffolder::processVertex(Kmer*vertex){
        }

    }else if(m_coverageReceived){
-       /* anyway these entries will be checked after anyway... */
-       if(1 /*m_receivedCoveragegetRepeatCoverage()*/){
+
+/* 
+ * These entries will be checked after anyway.
+ * But still, we don't want to go through those
+ * repeats right now...
+ *
+ * TODO: the peak coverage of the contig should be used instead.
+ * The peak coverage of the distribution may not exist...
+ * Regardless, we need to do some filtering here to remove
+ * some.
+ */
+       if(m_receivedCoverage < __SCAFFOLDER_IGNORE_THRESHOLD*m_parameters->getPeakCoverage()){
            if(!m_initialisedFetcher){
+
+               #ifdef __BUG_5361
+               if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+                   cout<<"[DEBUG] starting fetcher with RAY_MPI_TAG_REQUEST_VERTEX_READS"<getPositionOnStrand();

    if(!m_hasPairRequested){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_HAS_PAIRED_READ source: 111";
+           cout<<" destination: "<getRank()==111){
+           cout<<"[DEBUG] has no pair ! "<pushMessage(m_workerId,&aMessage);
        m_pairRequested=true;
        m_pairReceived=false;
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_GET_READ_MATE source: 111";
+           cout<<" destination: "<isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] received response to RAY_MPI_TAG_GET_READ_MATE source: 111";
+           cout<<" destination: "< response;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&response);
        m_readLength=response[0];
@@ -867,8 +939,17 @@ void Scaffolder::processAnnotation(){
        m_pairReceived=true;
        m_markersRequested=false;
    }else if(!m_pairReceived){
+
        return;
    }else if(!m_markersRequested){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_GET_READ_MARKERS source: 111";
+           cout<<" destination: "<allocate(1*sizeof(Kmer));
        buffer[0]=m_pairedReadIndex;
        Message aMessage(buffer,1,
@@ -878,6 +959,14 @@ void Scaffolder::processAnnotation(){
        m_markersReceived=false;
    }else if(!m_markersReceived
    &&m_virtualCommunicator->isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] received response for  RAY_MPI_TAG_GET_READ_MARKERS source: 111";
+           cout<<" destination: "< response;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&response);
        int bufferPosition=0;
@@ -909,6 +998,13 @@ void Scaffolder::processAnnotation(){

        int elementsPerQuery=m_virtualCommunicator->getElementsPerQuery(RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION);

+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111";
+           cout<<" destination: "<_vertexRank(&m_pairedForwardMarker)<_vertexRank(&m_pairedForwardMarker),
            RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION,m_parameters->getRank());
@@ -917,6 +1013,14 @@ void Scaffolder::processAnnotation(){
        m_forwardDirectionsReceived=false;
    }else if(!m_forwardDirectionsReceived
    &&m_virtualCommunicator->isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] received response to RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111";
+           cout<<" destination: "<_vertexRank(&m_pairedForwardMarker)< response;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&response);
        m_pairedForwardMarkerCoverage=response[0];
@@ -956,6 +1060,13 @@ void Scaffolder::processAnnotation(){
    }else if(!m_forwardDirectionsReceived){
        return;
    }else if(!m_forwardDirectionLengthRequested){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] sending RAY_MPI_TAG_GET_PATH_LENGTH source: 111"<allocate(1*sizeof(Kmer));
        int rankId=getRankFromPathUniqueId(m_pairedForwardDirectionName);
        buffer[0]=m_pairedForwardDirectionName;
@@ -1126,6 +1237,13 @@ Case 13. (allowed)

        int elementsPerQuery=m_virtualCommunicator->getElementsPerQuery(RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION);

+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] reverse sending RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111";
+           cout<<" destination: "<_vertexRank(&m_pairedForwardMarker)<_vertexRank(&m_pairedReverseMarker),
            RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION,m_parameters->getRank());
@@ -1134,6 +1252,13 @@ Case 13. (allowed)
        m_reverseDirectionsReceived=false;
    }else if(!m_reverseDirectionsReceived
    &&m_virtualCommunicator->isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] received response to RAY_MPI_TAG_GET_COVERAGE_AND_DIRECTION source: 111"< response;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&response);
        m_pairedReverseMarkerCoverage=response[0];
@@ -1172,6 +1297,13 @@ Case 13. (allowed)
    }else if(!m_reverseDirectionsReceived){
        return;
    }else if(!m_reverseDirectionLengthRequested){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] reverse sending RAY_MPI_TAG_GET_PATH_LENGTH source: 111"<allocate(1*sizeof(Kmer));
        Rank rankId=getRankFromPathUniqueId(m_pairedReverseDirectionName);
        buffer[0]=m_pairedReverseDirectionName;
@@ -1182,6 +1314,13 @@ Case 13. (allowed)
        m_reverseDirectionLengthReceived=false;
    }else if(!m_reverseDirectionLengthReceived
    &&m_virtualCommunicator->isMessageProcessed(m_workerId)){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] reverse received response RAY_MPI_TAG_GET_PATH_LENGTH source: 111"< response;
        m_virtualCommunicator->getMessageResponseElements(m_workerId,&response);
        m_pairedReverseDirectionLength=response[0];
@@ -1322,6 +1461,13 @@ Case 16. (allowed)
        return;

    }else if(m_reverseDirectionLengthReceived){
+
+       #ifdef __BUG_5361
+       if((*m_contigs)[m_contigId].size()==5361 && m_parameters->getRank()==111){
+           cout<<"[DEBUG] completed annotation "<

sebhtml commented 11 years ago

This patch is insane.

The coverage values should simply be sampled, and a cutoff should be computed from those samples. The patch is insane because it assumes global truth, which has not held since 2.0.0.

But the patch works if global truth is there.
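
One way to read the sampling idea is sketched below: take the coverage values seen along the contig, use their most frequent value as a local peak, and skip any vertex whose coverage reaches a multiple of that peak. This is only an illustration under stated assumptions, not the code that went into Ray. The names sampledPeakCoverage and shouldSkipVertex are hypothetical, the CoverageDepth typedef is a stand-in for Ray's own type, and the multiplier is a free parameter (the patch above uses 1024 against the global peak).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Stand-in for Ray's CoverageDepth type (assumption).
typedef std::uint32_t CoverageDepth;

// Hypothetical helper: most frequent coverage value among the samples.
// Ties are resolved in favour of the smallest value; returns 0 when
// there are no samples.
CoverageDepth sampledPeakCoverage(const std::vector<CoverageDepth>&samples){
	std::map<CoverageDepth,std::uint64_t> frequencies;

	for(std::size_t i=0;i<samples.size();++i)
		frequencies[samples[i]]++;

	CoverageDepth best=0;
	std::uint64_t bestCount=0;

	for(std::map<CoverageDepth,std::uint64_t>::const_iterator it=frequencies.begin();
		it!=frequencies.end();++it){

		if(it->second>bestCount){
			best=it->first;
			bestCount=it->second;
		}
	}

	return best;
}

// Hypothetical filter: skip a vertex when its coverage reaches
// multiplier times the local peak, i.e. when it looks like a repeat.
bool shouldSkipVertex(CoverageDepth vertexCoverage,CoverageDepth localPeak,
	std::uint64_t multiplier){

	return static_cast<std::uint64_t>(vertexCoverage)>=multiplier*localPeak;
}
```

With samples {30,31,30,29,30,5000} the local peak is 30, so with a multiplier of 2 a vertex at coverage 5000 is skipped while one at coverage 40 is still processed. Because the peak comes from the contig's own samples, the filter does not depend on a global coverage distribution being available.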

sebhtml commented 11 years ago

68cccc1cc64ababfc225516a03940159e664e8c6

sebhtml commented 11 years ago

df95c6556594ae963b5b09fccb7cb8b1fdf98e18