madhuneal / ppss

Automatically exported from code.google.com/p/ppss

If multiple tasks are the same it runs only once #18

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a joblist file that repeats the same command many times.

[psarkar@teentaal ppssTrial]$ head -5 joblist.txt 
/tmp/ppssTrial/largeLoop.sh
/tmp/ppssTrial/largeLoop.sh
/tmp/ppssTrial/largeLoop.sh
/tmp/ppssTrial/largeLoop.sh
/tmp/ppssTrial/largeLoop.sh
[psarkar@teentaal ppssTrial]$ wc -l joblist.txt 
170 joblist.txt
[psarkar@teentaal ppssTrial]$

2. Here is what my largeLoop.sh looks like. The thing to observe is
that each run should produce different output, because every line
it prints is prefixed with the process ID.

[psarkar@teentaal ppssTrial]$ cat largeLoop.sh 
#!/bin/bash

N=0
PN=$BASHPID
while (( $N < 1000 )); do
   N=$(( $N + 1 ));
   echo ${PN}_${N};
   sleep 1;
done

[psarkar@teentaal ppssTrial]$ 

What is the expected output? What do you see instead?

I would have expected that all processes would run (i.e., the program would
run once for each time it is listed in the joblist file). Instead
it runs only once.

[psarkar@teentaal ppssTrial]$ \rm -r ppss/
[psarkar@teentaal ppssTrial]$ ppss.sh -f joblist.txt -c 'sh '
Dec 09 09:55:14:  =========================================================
Dec 09 09:55:14:                         |P|P|S|S|                         
Dec 09 09:55:14:  Distributed Parallel Processing Shell Script version 2.41
Dec 09 09:55:14:  =========================================================
Dec 09 09:55:14:  Hostname:     teentaal
Dec 09 09:55:14:  ---------------------------------------------------------
Dec 09 09:55:14:  CPU: Intel(R) Xeon(TM) CPU 3.20GHz
Dec 09 09:55:14:  Found 8 logic processors.
Dec 09 09:55:14:  Starting 8 parallel workers.
Dec 09 09:55:14:  ---------------------------------------------------------
Dec 09 09:55:21:  Currently 100 percent complete. Processed 170 of 170 items.
Dec 09 09:55:24:  1 job is remaining.       
^C[psarkar@teentaal ppssTrial]$ Dec 09 09:55:34:  Finished. Consult
./ppss/job_log for job output.

[psarkar@teentaal ppssTrial]$ more ppss/job_log/tmpppssTriallargeLoop.sh 
===== PPSS Item Log File =====
Host:       teentaal
Process:14109
Item:       /tmp/ppssTrial/largeLoop.sh
Start date: Dec 09 09:55:15

14225_1
14225_2
14225_3
14225_4
14225_5
14225_6
14225_7
14225_8
14225_9
14225_10
14225_11
14225_12
14225_13
14225_14
14225_15
14225_16
14225_17
14225_18
14225_19
14225_20
14225_21
14225_22
14225_23
14225_24
14225_25
14225_26
14225_27
14225_28
14225_29

What version of the product are you using? On what operating system?

|P|P|S|S| Distributed Parallel Processing Shell Script 2.41

Please provide any additional information below.

There are a number of programs that produce different output each
time they are run, e.g., optimizers with random initialization.
It would be good if these could be handled.

Perhaps the process ID or the command sequence number could serve as
a hash/descriptor, rather than the altered command line. This may also
alleviate some of the other issues I see (e.g., regarding ':' or special
chars in commands).
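
To illustrate the sequence-number idea, here is a minimal sketch that assigns each joblist line a log name based on its position in the file rather than on the command text. It is not PPSS code; the function name `seq_lognames` and the `job_NNNNN` naming scheme are assumptions made up for this example.

```bash
#!/bin/bash
# Hypothetical sketch: key each job by its line number in the joblist,
# so identical commands still get distinct log names.

# seq_lognames JOBLIST
# Prints "<log name><TAB><item>" for every line of the joblist.
seq_lognames() {
    n=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r item || [ -n "$item" ]; do
        n=$((n + 1))
        printf 'job_%05d\t%s\n' "$n" "$item"
    done < "$1"
}
```

With a scheme like this, five identical lines would map to job_00001 through job_00005, and characters such as ':' or '/' in the command would never end up in a file name.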

Original issue reported on code.google.com by sarkarpr...@gmail.com on 9 Dec 2009 at 6:09

GoogleCodeExporter commented 9 years ago
Thank you for your feedback. 

I understand the issue. However, this is expected behavior. If you do not want
this behavior, either PPSS is not the right tool for your problem in the first
place, or you should make the items unique.
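
One possible way to make the items unique, sketched here under assumptions: create one uniquely named symlink per run, all pointing at the same script, and list the symlinks in the joblist. The function name `make_unique_joblist` and the `run_NNN.sh` link names are hypothetical, and whether PPSS treats each symlink path as a separate item is an assumption based on the name-based detection described in this thread.

```bash
#!/bin/bash
# Hypothetical workaround: one uniquely named symlink per run.

# make_unique_joblist SCRIPT COUNT LINKDIR JOBLIST
# SCRIPT should be an absolute path so the symlinks resolve from anywhere.
make_unique_joblist() {
    script=$1 count=$2 linkdir=$3 joblist=$4
    mkdir -p "$linkdir"
    : > "$joblist"                     # start with an empty joblist
    i=1
    while [ "$i" -le "$count" ]; do
        link="$linkdir/run_$(printf '%03d' "$i").sh"
        ln -sf "$script" "$link"       # every link runs the same script
        printf '%s\n' "$link" >> "$joblist"
        i=$((i + 1))
    done
}

# Example call (paths taken from the report, link directory is made up):
# make_unique_joblist /tmp/ppssTrial/largeLoop.sh 170 /tmp/ppssTrial/links joblist.txt
```

The resulting joblist.txt has 170 distinct lines, so it can be passed to `ppss.sh -f joblist.txt -c 'sh '` exactly as in the report.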

PPSS is designed with the assumption that items are unique. PPSS detects whether
an item has already been processed based on the existence of a file whose name
is derived from the item, and will not process it again. That is why identical
items are processed only once.
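
The effect can be reproduced with a minimal sketch. This is not the actual PPSS source; the function names are hypothetical, and the rule of dropping every '/' from the item is an assumption inferred from the log file name in the report (tmpppssTriallargeLoop.sh).

```bash
#!/bin/bash
# Hypothetical sketch of name-based duplicate detection, not PPSS's real code.

# Derive a log file name from an item by deleting every '/' character.
item_to_logname() {
    printf '%s\n' "$1" | tr -d '/'
}

# process_item LOGDIR ITEM
# Prints "run" if no log file for the item exists yet, otherwise "skip".
process_item() {
    logfile="$1/$(item_to_logname "$2")"
    if [ -e "$logfile" ]; then
        echo "skip"
    else
        : > "$logfile"    # claim the item by creating its log file
        echo "run"
    fi
}

# Three identical items: only the first one is run.
demo_logdir=$(mktemp -d)
for _ in 1 2 3; do
    process_item "$demo_logdir" /tmp/ppssTrial/largeLoop.sh
done
rm -rf "$demo_logdir"
```

Because all three items map to the same file name, the second and third lookups find the file created by the first and are skipped.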

Original comment by Louwrentius on 13 Dec 2009 at 8:44

GoogleCodeExporter commented 9 years ago
This issue is expected behaviour.

Original comment by Louwrentius on 16 Dec 2009 at 2:11