sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net

Ray hangs while coloring graph #165

Closed zorino closed 11 years ago

zorino commented 11 years ago

All the files needed to reproduce the bug (submission script, configuration file, reads, assembly output, and Ray source code) are located here (via symlinks):

/rap/nne-790-ab/projects/deraspem-raytest/RayHangs/

As you told me, it might be a race condition.

It is the only sample out of 15 that hangs, but it hangs on every attempt (4 tries) and with different k-mer sizes (21, 31, 51).

sebhtml commented 11 years ago

Thank you for the detailed report and information to reproduce the problem.

fredericraymond commented 11 years ago

Here is another example of a similar problem: Ray stops after scaffolding. It happened for all samples.

/rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run2-Ray2.1.0

See all directories whose names end with RayMeta20130315.

They were run using the corresponding .conf files.

sebhtml commented 11 years ago

This may be related to Infiniband problems:

mpiexec: abort is already in progress...hit ctrl-c again to forcibly terminate

mpiexec: killing job...

[r107-n71:00645] [[33145,0],0]-[[33145,0],5] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        r106-n83
        r106-n91
        r106-n92
        r108-n10
        r108-n17
        r105-n4
        r106-n27

Next step is to reproduce the problem.

/rap/nne-790-ab/projects/Ray-issues/165

sebhtml commented 11 years ago

@fredericraymond I can't read your files. However, it seems that your error files are not empty.

$ ls -l /rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run2-Ray2.1.0|grep stderr|grep 2013
-rw------- 1 fraymond nne-790-01  162 Mar 16 21:19 Sample_P1J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 21:19 Sample_P1J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 1363 Mar 16 21:26 Sample_P2J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 1363 Mar 16 21:31 Sample_P2J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 21:34 Sample_P3J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 21:36 Sample_P3J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 21:59 Sample_P4J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 22:16 Sample_P4J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 22:18 Sample_P5J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 22:19 Sample_P5J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 22:59 Sample_P6J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 16 23:43 Sample_P6J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 17 00:39 Sample_P7J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01  162 Mar 17 01:01 Sample_P7J7.conf-Ray-2013-03-15.stderr

sebhtml commented 11 years ago

Job running in /rap/nne-790-ab/projects/Ray-issues/165 on colosse

Ray fbc3a9c859c72340df3d061de5f8e553326597c9
RayPlatform 7eece8a3cb2eb4e132f76f854f00d3f9640f12da

10252833           sboisver    Running    64    11:52:55  Wed Mar 27 14:32:56

sebhtml commented 11 years ago

Memory usage is ~ 250 MiB per MPI rank before coloring.

Ray ran out of memory because there are too many things to color:

# Search Datasets
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24
-search
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder

# Profiling Ontology + Taxonomy
-gene-ontology
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/OntologyTerms.txt
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/Annotations.txt
-with-taxonomy
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Genome-to-Taxon.tsv
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/TreeOfLife-Edges.tsv
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Taxon-Names.tsv

Counts:

  1328   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage
   128   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences
  2239   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria
  4717   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT
  1392   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses
  3947   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids
  4537   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral
    27   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human
   379   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB
     1   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe
  1489   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21
  1946   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24
  2407   /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder

sebhtml commented 11 years ago

The problem is caused by the fact that there are soooooooo many E. coli genomes sequenced.

$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria|grep Escherichia_coli | wc -l
54
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT/|grep Escherichia_coli | wc -l
608

Therefore, the datasets provided to Ray with -search should not be redundant. Otherwise, you need more memory.
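
The ls | grep | wc -l check above can be generalized to look for any over-represented species in a -search directory. What follows is a minimal Python sketch, not part of Ray. It assumes that entry names start with Genus_species_ (as the Escherichia_coli entries do); the function names and the threshold value are hypothetical and chosen only for illustration.

```python
import sys
from collections import Counter
from pathlib import Path


def species_key(filename):
    """Return 'Genus_species' from a name like 'Escherichia_coli_K12.fasta'.

    Assumption: the first two underscore-separated tokens identify the
    species. This naming scheme is not something Ray itself requires.
    """
    return "_".join(filename.split("_")[:2])


def count_per_species(directory):
    """Count the entries of a directory, grouped by species prefix."""
    return Counter(species_key(entry.name) for entry in Path(directory).iterdir())


def redundant_species(directory, threshold=10):
    """List (species, count) pairs with at least `threshold` entries."""
    counts = count_per_species(directory)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    # Usage: python count_species.py DIRECTORY [DIRECTORY ...]
    for directory in sys.argv[1:]:
        print(directory)
        for name, n in redundant_species(directory):
            print(f"  {n:6d}  {name}")
```

Running it on each directory passed to -search gives a quick list of candidates to thin out before coloring. The threshold of 10 is arbitrary and should be adapted to the memory available.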

sebhtml commented 11 years ago

Hi Maxime,

=> https://github.com/sebhtml/ray/issues/165

Your job for Sample_TC-909-2 failed because it's an E. coli strain and you are coloring with at least 662 E. coli strains. Therefore, there will be a ridiculously high number of physical colors.

$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria|grep Escherichia_coli | wc -l
54
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT/|grep Escherichia_coli | wc -l
608
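
As a rough cross-check, the two E. coli counts can be set against the per-dataset counts listed earlier in this thread. The short Python sketch below only does arithmetic on numbers copied from this thread; the variable names are made up for illustration. The counts are treated as a proxy for the amount of reference material to color, not as an exact number of colors.

```python
# Per-dataset counts, copied from the "Counts" listing in this thread.
dataset_counts = {
    "EMBL-Genomes/Phage": 1328,
    "Gene-Ontology/EMBL_CDS_Sequences": 128,
    "NCBI-taxonomy/NCBI-Genomes-Bacteria": 2239,
    "NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT": 4717,
    "NCBI-taxonomy/NCBI-Genomes-Viruses": 1392,
    "NCBI-taxonomy/NCBI-Genomes-Plasmids": 3947,
    "NCBI-taxonomy/NCBI-RefSeq-Viral": 4537,
    "NCBI-taxonomy/NCBI-Genomes-Human": 27,
    "ARDB": 379,
    "Amibe": 1,
    "4AR_13-02-21": 1489,
    "vfdb_13-01-24": 1946,
    "IS-finder": 2407,
}

# E. coli entries reported by the two ls | grep | wc -l commands.
ecoli_counts = {
    "NCBI-taxonomy/NCBI-Genomes-Bacteria": 54,
    "NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT": 608,
}

total_entries = sum(dataset_counts.values())
total_ecoli = sum(ecoli_counts.values())  # 54 + 608 = 662
bacterial_entries = sum(dataset_counts[name] for name in ecoli_counts)

draft = "NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT"
draft_share = dataset_counts[draft] / total_entries
ecoli_share = total_ecoli / bacterial_entries

print(f"entries in all datasets:          {total_entries}")
print(f"E. coli entries:                  {total_ecoli}")
print(f"Bacteria_DRAFT share of entries:  {draft_share:.1%}")
print(f"E. coli share of bacterial sets:  {ecoli_share:.1%}")
```

On these numbers, Bacteria_DRAFT is the largest single dataset and holds most of the E. coli entries.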

This hypothesis is also supported by these facts:

  1. Presumably, this is your only E. coli sample;
  2. The failure is reproducible;
  3. The following error message is caused by memory exhaustion: "mca_oob_tcp_msg_recv: readv failed: Connection reset by peer"

It's probably the same thing for Frédéric's jobs.

Courses of action:

  1. Increase the number of nodes

or

  2. Fix the datasets you used for searching by reducing their redundancy

or

  3. Patch the code in the plugin "Searcher".

sebhtml commented 11 years ago

Will close as soon as possible (when job with more nodes finishes).

zorino commented 11 years ago

Ok thank you very much.

As always, you're a champ!

The taxonomy wasn't really important in my case; it was just to check for possible contamination. Also, I'm not sure Bacteria_DRAFT adds much more information for the profiling. Maybe it's useful in a metagenomic study, but it's certainly useless for a single genome.

sebhtml commented 11 years ago

Commenting out Bacteria_DRAFT fixes the problem.

Configuration:

# Ray Configuration File

# Kmer Size

-k 31

# Reads Libraries

-p /home/deraspem/PaulHRoy/MiSeq-14Souches-Reads/Sample_TC-909/TC-909_Lane1_R1_1.fastq /home/deraspem/PaulHRoy/MiSeq-14Souches-Reads/Sample_TC-909/TC-909_Lane1_R2_1.fastq

-one-color-per-file

activated @6

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage

activated @8

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences

activated @9

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria

-search

    #/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses

activated @6

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids

activated @6

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral

activated @6

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24

activated @7

-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder

# Profiling Ontology + Taxonomy

-gene-ontology
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/OntologyTerms.txt
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/Annotations.txt
-with-taxonomy
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Genome-to-Taxon.tsv
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/TreeOfLife-Edges.tsv
        /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Taxon-Names.tsv

# Write Graph to file

-write-kmers

# Output Directory

-o Sample_TC-909-9