madhuneal / ppss

Automatically exported from code.google.com/p/ppss

Running more than 10000 processes in a run gives a segmentation fault. #64

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run samtools (the program shouldn't matter) 500 000 times.

What is the expected output? What do you see instead?

The script crashes after launching 99999 processes.

/usr/bin/ppss -l sample1-ppss.log -p 25 -f ../sample1.pos -c 'samtools mpileup -r "$ITEM" -f /media/disk/Homo_sapiens_assembly19_sorted.fasta /media/disk/sample1.calibrated.bam'
Jan 16 17:34:59:
Jan 16 17:34:59:  =========================================================
Jan 16 17:34:59:                         |P|P|S|S|
Jan 16 17:34:59:  Distributed Parallel Processing Shell Script vers. 2.85
Jan 16 17:34:59:  =========================================================
Jan 16 17:34:59:  Hostname:             my-server
Jan 16 17:34:59:  ---------------------------------------------------------
Jan 16 17:34:59:  CPU: Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz
Jan 16 17:34:59:  Starting 25 parallel workers.
Jan 16 17:34:59:  ---------------------------------------------------------
Jan 17 00:36:14:  Currently 18 percent complete. Processed 99999 of 541958.
/usr/bin/ppss: line 2903:  8930 Segmentation fault      listen_for_job "$MAX_NO_OF_RUNNING_JOBS"

/usr/bin/ppss -l sample2-ppss.log -p 25 -f ../sample2.pos -c 'samtools mpileup -r "$ITEM" -f /media/disk/Homo_sapiens_assembly19_sorted.fasta /media/disk/sample2.calibrated.bam'
Jan 16 17:46:51:
Jan 16 17:46:51:  =========================================================
Jan 16 17:46:51:                         |P|P|S|S|
Jan 16 17:46:51:  Distributed Parallel Processing Shell Script vers. 2.85
Jan 16 17:46:51:  =========================================================
Jan 16 17:46:51:  Hostname:             my-server
Jan 16 17:46:51:  ---------------------------------------------------------
Jan 16 17:46:51:  CPU: Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz
Jan 16 17:46:52:  Starting 25 parallel workers.
Jan 16 17:46:52:  ---------------------------------------------------------
Jan 17 00:46:15:  Currently 18 percent complete. Processed 99999 of 541958.
/usr/bin/ppss: line 2903:  4882 Segmentation fault      listen_for_job "$MAX_NO_OF_RUNNING_JOBS"

What version of the product are you using? On what operating system?
I experienced the bug with v2.85 on Ubuntu 11.04 64-bit (server edition).

Please provide any additional information below.

I am trying the latest version of the script now, but I expect it will crash too.

Original issue reported on code.google.com by hulselma...@gmail.com on 17 Jan 2012 at 9:07

GoogleCodeExporter commented 9 years ago
I can't reproduce it on my iMac and Debian Lenny is also still going strong.

Jan 29 23:28:17:  
Jan 29 23:28:17:  =========================================================
Jan 29 23:28:17:                         |P|P|S|S|                         
Jan 29 23:28:17:  Distributed Parallel Processing Shell Script vers. 2.85
Jan 29 23:28:17:  =========================================================
Jan 29 23:28:17:  Hostname:     Louwrentius.local
Jan 29 23:28:17:  ---------------------------------------------------------
Jan 29 23:28:17:  CPU:  Intel Core i7  3,4 GHz
Jan 29 23:28:17:  Starting 50 parallel workers.
Jan 29 23:28:17:  ---------------------------------------------------------
Jan 29 23:46:42:  Currently 1 percent complete. Processed 10404 of 1000002.
Jan 29 23:46:41:  ETA: Sun Jan 29 23:28:17 CET 2012

Could this be an issue with Ubuntu and Bash? If so, it is a bug I can't do anything about. Can you test on another Linux flavour?

Original comment by Louwrentius on 29 Jan 2012 at 10:48

GoogleCodeExporter commented 9 years ago
I saw this on Linux:

Jan 30 00:41:37:  2% complete. Processed 24841 of 1000002. Failed 0/1000002.
Jan 30 00:41:37:  ETA: Sun Jan 29 23:30:55 CET 2012nterrupted system call

I don't know what this means.

Original comment by Louwrentius on 29 Jan 2012 at 11:42

GoogleCodeExporter commented 9 years ago
I can't edit the title, but it should say 100 000 instead of 10 000.

I think you missed this line:
Processed 99999 of 541958

Original comment by hulselma...@gmail.com on 30 Jan 2012 at 11:23

GoogleCodeExporter commented 9 years ago
Doing a second run!

Original comment by Louwrentius on 1 Feb 2012 at 9:20

GoogleCodeExporter commented 9 years ago
No problems on the iMac:

Louwrentius:tmp nan03$ ./ppss -f miljoen -c 'echo ' 
Feb 01 22:19:49:  
Feb 01 22:19:49:  =========================================================
Feb 01 22:19:49:                         |P|P|S|S|                         
Feb 01 22:19:49:  Distributed Parallel Processing Shell Script vers. 2.85
Feb 01 22:19:49:  =========================================================
Feb 01 22:19:49:  Hostname:     Louwrentius.local
Feb 01 22:19:49:  ---------------------------------------------------------
Feb 01 22:19:49:  CPU:  Intel Core i7  3,4 GHz
Feb 01 22:19:49:  Found 8 logic processors.
Feb 01 22:19:49:  Starting 8 parallel workers.
Feb 01 22:19:49:  ---------------------------------------------------------
Feb 01 22:46:27:  Currently 1 percent complete. Processed 14595 of 1000002.
Feb 02 08:56:41:  Currently 24 percent complete. Processed 241700 of 1000002.
Feb 02 08:56:41:  ETA: Wed Feb  1 22:19:49 CET 2012

Original comment by Louwrentius on 2 Feb 2012 at 7:57

GoogleCodeExporter commented 9 years ago
Running with -c 'echo ' as you did works for me (it goes past 100 000), but echo is a bash builtin.
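
To make the builtin/external distinction concrete, here is a minimal sketch of how to check which one a command name resolves to. The helper name `resolves_to_path` is made up for this example. It relies only on `command -v`, which prints a bare name for a builtin and a full path for an external binary. Whether this difference actually explains the crash is only a guess at this point in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of ppss): succeeds when the shell resolves
# the given name to a file path, i.e. an external binary.
resolves_to_path() {
    case "$(command -v "$1")" in
        /*) return 0 ;;   # absolute path: external program
        *)  return 1 ;;   # bare name (builtin, function, alias) or not found
    esac
}

for cmd in echo /bin/echo; do
    if resolves_to_path "$cmd"; then
        echo "$cmd: external binary, every call starts a new process"
    else
        echo "$cmd: builtin (or not found), no new process per call"
    fi
done
```

So a run with -c 'echo ' never launches an external program per item, while -c '/bin/echo ' launches one for every item.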

With the external echo command, it still stops at 99 999:
$ /home/ghuls/ppss -p 100 -f ./sample2.pos -c '/bin/echo '
Feb 04 22:52:20:
Feb 04 22:52:20:  =========================================================
Feb 04 22:52:20:                         |P|P|S|S|
Feb 04 22:52:20:  Distributed Parallel Processing Shell Script vers. 2.97
Feb 04 22:52:20:  =========================================================
Feb 04 22:52:20:  Hostname:             seq-srv-01
Feb 04 22:52:20:  ---------------------------------------------------------
Feb 04 22:52:20:  CPU: Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz
Feb 04 22:52:21:  Starting 100 parallel workers.
Feb 04 22:52:21:  ---------------------------------------------------------
Feb 06 07:54:48:  18% complete. Processed 99999 of 541958. Failed 0/541958.
Feb 06 07:54:39:  ETA: Sat Feb 11 05:24:59 CET 2012

Original comment by hulselma...@gmail.com on 6 Feb 2012 at 8:34

GoogleCodeExporter commented 9 years ago
Can you please provide an example that provokes the crash and that I can reproduce?

Original comment by Louwrentius on 9 Feb 2012 at 7:50

GoogleCodeExporter commented 9 years ago
Did you try your line:
./ppss -f miljoen -c 'echo ' 

with an external echo binary?
./ppss -f miljoen -c '/bin/echo ' 

This stopped at 99 999 for me:
./ppss -p 100 -f ./sample2.pos -c '/bin/echo '

This worked fine for me (builtin echo):
./ppss -p 100 -f ./sample2.pos -c 'echo '

$ bash --version
GNU bash, version 4.2.8(1)-release (x86_64-pc-linux-gnu)

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.04
DISTRIB_CODENAME=natty
DISTRIB_DESCRIPTION="Ubuntu 11.04"
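
For anyone without sample2.pos, a rough sketch of a stand-in input file follows. The helper `make_items`, the `item<N>` naming and the temporary file location are assumptions made for this example. The item count of 100001 is chosen only because it is just past the 99 999 mark where the run stopped. The ppss invocations are the ones from this comment and are left as comments, so the snippet runs without ppss installed.

```shell
#!/bin/sh
# Hypothetical helper: write N dummy items, one per line, to a file.
make_items() {
    awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print "item" i }' > "$2"
}

# Assumed location and size of the stand-in input file.
ITEMS_FILE="${TMPDIR:-/tmp}/ppss-items.$$"
make_items 100001 "$ITEMS_FILE"
echo "wrote $(wc -l < "$ITEMS_FILE" | tr -d ' ') items to $ITEMS_FILE"

# Then compare the two runs (not executed here):
#   ./ppss -p 100 -f "$ITEMS_FILE" -c 'echo '        # builtin echo
#   ./ppss -p 100 -f "$ITEMS_FILE" -c '/bin/echo '   # external echo
```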

Original comment by hulselma...@gmail.com on 10 Feb 2012 at 7:57

GoogleCodeExporter commented 9 years ago
This is on Mac OS X Lion:

{{{
Louwrentius:tmp nan03$ ./ppss -f miljoen -c '/bin/echo ' 
Feb 16 00:06:35:  
Feb 16 00:06:35:  =========================================================
Feb 16 00:06:35:                         |P|P|S|S|                         
Feb 16 00:06:35:  Distributed Parallel Processing Shell Script vers. 2.85
Feb 16 00:06:35:  =========================================================
Feb 16 00:06:35:  Hostname:     Louwrentius.local
Feb 16 00:06:35:  ---------------------------------------------------------
Feb 16 00:06:35:  CPU:  Intel Core i7  3,4 GHz
Feb 16 00:06:35:  Found 8 logic processors.
Feb 16 00:06:35:  Starting 8 parallel workers.
Feb 16 00:06:35:  ---------------------------------------------------------
Feb 16 09:07:51:  Currently 21 percent complete. Processed 216225 of 1000002.
Feb 16 09:07:51:  ETA: Thu Feb 16 00:06:35 CET 2012

}}}

Original comment by Louwrentius on 16 Feb 2012 at 8:08

GoogleCodeExporter commented 9 years ago
I will test it on Debian; I don't have Ubuntu around. I guess it's a bug in bash.

Original comment by Louwrentius on 16 Feb 2012 at 8:09

GoogleCodeExporter commented 9 years ago
Possibly.

Original comment by hulselma...@gmail.com on 17 Feb 2012 at 1:09

GoogleCodeExporter commented 9 years ago
Ok, I can reproduce this issue on Debian Squeeze. 

Mac: 

bash-3.2$ bash --version
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11)
Copyright (C) 2007 Free Software Foundation, Inc.

Debian 6.0.4:

dpkg -l | grep -i bash
ii  bash                                4.1-3                        The GNU Bourne Again SHell

Unfortunately, you may need to replace or update bash on your system, or find another program (parallel?) that does the job for you.

Original comment by Louwrentius on 17 Feb 2012 at 8:10

GoogleCodeExporter commented 9 years ago
I already solved my problem with parallel 2 weeks ago. But thanks for 
mentioning it.

Original comment by hulselma...@gmail.com on 18 Feb 2012 at 4:37

GoogleCodeExporter commented 9 years ago

Original comment by Louwrentius on 21 Feb 2012 at 9:36