Closed zorino closed 11 years ago
Thank you for the detailed report and information to reproduce the problem.
Another example of a similar problem. Ray stops after scaffolding. Happened for all samples.
/rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run2-Ray2.1.0
all directories that end with RayMeta20130315
They were run using the corresponding .conf files.
This may be related to InfiniBand problems:
mpiexec: abort is already in progress...hit ctrl-c again to forcibly terminate
mpiexec: killing job...
[r107-n71:00645] [[33145,0],0]-[[33145,0],5] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
r106-n83
r106-n91
r106-n92
r108-n10
r108-n17
r105-n4
r106-n27
Next step is to reproduce the problem.
/rap/nne-790-ab/projects/Ray-issues/165
@fredericraymond I can't read your files. However, it seems that your error files are not empty.
$ ls -l /rap/nne-790-ab/projects/Project_CQDM2/CQDM_Run2-Ray2.1.0|grep stderr|grep 2013
-rw------- 1 fraymond nne-790-01 162 Mar 16 21:19 Sample_P1J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 21:19 Sample_P1J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 1363 Mar 16 21:26 Sample_P2J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 1363 Mar 16 21:31 Sample_P2J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 21:34 Sample_P3J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 21:36 Sample_P3J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 21:59 Sample_P4J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 22:16 Sample_P4J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 22:18 Sample_P5J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 22:19 Sample_P5J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 22:59 Sample_P6J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 16 23:43 Sample_P6J7.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 17 00:39 Sample_P7J0.conf-Ray-2013-03-15.stderr
-rw------- 1 fraymond nne-790-01 162 Mar 17 01:01 Sample_P7J7.conf-Ray-2013-03-15.stderr
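The same check (which stderr files are not empty) can be scripted so it is easy to repeat across run directories. This is only a minimal sketch: the function name `non_empty_stderr` and the `.stderr` suffix filter are assumptions made for illustration, not part of Ray. It only needs to stat the files, so it should work even when the files themselves are not readable.

```python
from pathlib import Path


def non_empty_stderr(directory, suffix=".stderr"):
    """Return (name, size) pairs for files ending in `suffix` with size > 0.

    Hypothetical helper: it only stats files, it never opens them.
    """
    results = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            size = path.stat().st_size
            if size > 0:
                results.append((path.name, size))
    return results


if __name__ == "__main__":
    import sys

    # Default to the current directory when no path is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in non_empty_stderr(target):
        print(f"{size:>8} {name}")
```

Files whose size differs from the others (here 1363 bytes instead of 162) are probably the ones worth reading first.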
Job running in /rap/nne-790-ab/projects/Ray-issues/165 on colosse
Ray fbc3a9c859c72340df3d061de5f8e553326597c9 RayPlatform 7eece8a3cb2eb4e132f76f854f00d3f9640f12da
10252833 sboisver Running 64 11:52:55 Wed Mar 27 14:32:56
Memory usage is ~ 250 MiB per MPI rank before coloring.
Ray ran out of memory because there are too many things to color:
# Search Datasets
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24
-search
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder
# Profiling Ontology + Taxonomy
-gene-ontology
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/OntologyTerms.txt
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/Annotations.txt
-with-taxonomy
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Genome-to-Taxon.tsv
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/TreeOfLife-Edges.tsv
/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Taxon-Names.tsv
Counts:
1328 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage
128 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences
2239 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria
4717 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT
1392 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses
3947 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids
4537 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral
27 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human
379 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB
1 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe
1489 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21
1946 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24
2407 /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder
The problem is caused by the fact that there are soooooooo many E. coli genomes sequenced.
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria|grep Escherichia_coli | wc -l
54
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT/|grep Escherichia_coli | wc -l
608
Therefore, the datasets provided to Ray with -search should not be redundant. Otherwise, you need more memory.
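To spot this kind of redundancy before launching a job, the entries of each -search directory can be grouped by species. The sketch below is an assumption-laden illustration: it supposes that entry names start with `Genus_species_` (as the `Escherichia_coli` grep above suggests), and the helper names `species_key`, `count_by_species` and `most_redundant` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def species_key(name):
    """Assumed naming convention: the first two '_'-separated tokens are genus and species."""
    return "_".join(name.split("_")[:2])


def count_by_species(directories):
    """Count the entries of all given directories per species key."""
    counts = Counter()
    for directory in directories:
        for entry in Path(directory).iterdir():
            counts[species_key(entry.name)] += 1
    return counts


def most_redundant(directories, top=5):
    """Return the `top` species with the most entries, most frequent first."""
    return count_by_species(directories).most_common(top)


if __name__ == "__main__":
    import sys

    # Usage: python script.py DIR [DIR ...]
    for species, n in most_redundant(sys.argv[1:]):
        print(f"{n:>6} {species}")
```

Run on the NCBI-Genomes-Bacteria and NCBI-Genomes-Bacteria_DRAFT directories together, a species with hundreds of entries is a likely source of the high memory usage during coloring.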
Hi Maxime,
=> https://github.com/sebhtml/ray/issues/165
Your job for Sample_TC-909-2 failed because it is an E. coli strain and you are coloring with at least 662 E. coli strains (54 + 608). Therefore, there will be a ridiculously high number of physical colors.
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria|grep Escherichia_coli | wc -l
54
$ ls /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT/|grep Escherichia_coli | wc -l
608
This hypothesis is also supported by these facts:
It's probably the same thing for Frédéric's jobs.
Courses of action:
or
or
Will close as soon as possible (when job with more nodes finishes).
Ok thank you very much.
As always you're a champ !
The taxonomy wasn't really important in my case; I only used it to check for possible contamination. Also, I'm not sure Bacteria_DRAFT adds much information for the profiling. It may be useful in a metagenomic study, but it is certainly useless for a single genome.
Commenting out the -search line for NCBI-Genomes-Bacteria_DRAFT fixes the problem.
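For several samples, the same workaround can be applied to each .conf file with a small script. This is a sketch under assumptions: it expects the one-line form `-search /path/to/dataset` used in the configuration in this thread, it relies on lines starting with `#` being treated as comments (as in the configuration files shown here), and the function name `comment_out_search` is hypothetical.

```python
def comment_out_search(config_text, dataset_name):
    """Prefix with '#' each '-search PATH' line whose last path component is `dataset_name`.

    Matching on the exact last component avoids also disabling
    NCBI-Genomes-Bacteria when targeting NCBI-Genomes-Bacteria_DRAFT.
    """
    out = []
    for line in config_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "-search":
            last_component = tokens[1].rstrip("/").rsplit("/", 1)[-1]
            if last_component == dataset_name:
                out.append("#" + line)
                continue
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python script.py Sample.conf NCBI-Genomes-Bacteria_DRAFT > Sample.fixed.conf
    with open(sys.argv[1]) as handle:
        sys.stdout.write(comment_out_search(handle.read(), sys.argv[2]))
```

The script writes the modified configuration to standard output rather than editing the file in place, so the original .conf stays available for comparison.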
Configuration:
-k 31
-p /home/deraspem/PaulHRoy/MiSeq-14Souches-Reads/Sample_TC-909/TC-909_Lane1_R1_1.fastq /home/deraspem/PaulHRoy/MiSeq-14Souches-Reads/Sample_TC-909/TC-909_Lane1_R2_1.fastq
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/EMBL-Genomes/Phage
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/EMBL_CDS_Sequences
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria
#/rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Bacteria_DRAFT
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Viruses
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Plasmids
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-RefSeq-Viral
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/NCBI-Genomes-Human
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/ARDB
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Amibe
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/4AR_13-02-21
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/vfdb_13-01-24
-search /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/IS-finder
-gene-ontology /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/OntologyTerms.txt /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/Gene-Ontology/Annotations.txt
-with-taxonomy /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Genome-to-Taxon.tsv /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/TreeOfLife-Edges.tsv /rap/nne-790-ab/genomes/RayKmerSearchStuff-dev/2013-01-30/NCBI-taxonomy/Taxon-Names.tsv
-write-kmers
-o Sample_TC-909-9
All the files needed to reproduce the bug (submission script, configuration file, reads, assembly output, and Ray source code) are located here (via symlinks):
/rap/nne-790-ab/projects/deraspem-raytest/RayHangs/
As you told me, it might be a race condition.
It is the only sample out of 15 that did this, but it did so on every attempt (4) and with different k-mer sizes (21, 31, 51).