madhuneal / ppss

Automatically exported from code.google.com/p/ppss

Improvement and suggestions for the documentation #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
In Manual1 (http://code.google.com/p/ppss/wiki/Manual1) :

(1) "Basic command line options"'s last paragraph

[quote]
In this example, a list of URLs is provided by the file list.txt. These
urls are fed to wget, which will retrieve the specified URLs. The -p option
specifies that 5 parallel downloads or threads should be started. Ofcourse,
this command can also be written like this:

$ ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"' 
[/quote]

Note that the command is missing "-p 5"

(2) In "Advanced command line options"
[quote]
 -j <disable hyper threading>

If a CPU is found that supports hyper threading, the additional cores are
used. For example, an Intel Core 7i quad-core processor supports HT, thus
has effectively 8 cores. When HT is enabled, not 4 but 8 parallel jobs are
started.

Please note that this mechanism depends on what /proc/cpu (linux) reports.
For exampe, an old dual CPU P3 doesn't report the 'physical id' section,
thus if HT is disabled (why whould you do that anyway) only one processor
is used. So test this option if you need it. 
[/quote]

(2.1) Is it possible to remove "<disable hyper threading>" from the
instruction line? When read with the other options in the list, it suggests
that users need to supply a parameter to the "-j" option, which it does not
need.

(2.2) -j in ppss-2.18.tgz says that issuing the command will "enable"
hyperthreading, not disable. However, when I run ppss.sh without this
option, hyperthreading _is_ enabled. Can you check this for consistency please?

=====

Can we add the following? (ppss-2.18.tgz used)

(3) ppss.sh must be run inside a file system that supports file locking. The
data to process, however, can be on a non-locking file system.

(4) PPSS controller/intermediate output such as ppss.sh_is_running, JOB_LOG,
PPSS_* directories, ppss-array-pointer etc. will be created inside the same
directory as ppss.sh (they are written to the current directory).

(4.1) This means one cannot share a copy of ppss.sh. Each ppss.sh run must
use its own copy of the ppss.sh file.

(4.2)[Very low priority TODO] Is it possible to modify the program to write
all these files to a user-specified directory instead? 

(4.3) I also noticed that I cannot run two copies of ppss.sh using the same
login ID on the same computer. What happens is that ppss.sh cannot
terminate. The processing is complete, but the script cannot return to the
command line. On the display, I never get the line "Terminated ..."

HTH and thanks for the script. It breathes life back into a set of year 2003
computers, as it bumps up the overall throughput of the computers
significantly. Previously I had to manually set up and run two or more
separate processes, which was inefficient and prone to trouble, and it was
difficult to capture the output from individual jobs. With ppss, it's really
easy!

Original issue reported on code.google.com by cinly....@gmail.com on 7 Jul 2009 at 5:58

GoogleCodeExporter commented 9 years ago
Update on 

"(4.3) I also noticed that I cannot run two copies of ppss.sh using the same
login ID on the same computer. What happens is the ppss.sh cannot
terminate. The processing is complete, but the script cannot return to
command line. On the display, I never get the line "Terminated ..."
"

I worked out that this happens because the following while loop (near the
end, version 2.18) cannot terminate:

[quote]
# Either start new jobs or exit, sleep in the meantime
while true
do
   sleep 5
   JOBS=`ps ax | grep -v grep | grep -v -i screen | grep ppss.sh | wc -l` ##(A)
   ...
   if [ "$JOBS" -gt "$MIN_JOBS" ]
       ...
       sleep $INTERVAL
   fi
   ...
done
[/quote]

This is because line (A) searches for ppss.sh jobs. If you run more than one
copy of ppss.sh, it will report more jobs than "$MIN_JOBS" permits.

In fact, if you have any job whose invoking command has "ppss.sh" in it, such
as "sh controller_for_all_ppss.sh", ppss.sh cannot terminate for the same
reason.

So far, the problem can be avoided by remembering not to run more than one
copy of ppss.sh, and by being careful in naming files.

Unfortunately, line (A) does "ps ax", and "ps" will catch all instances of
"ppss.sh" for all users. If another user is running "ppss.sh", neither your
ppss.sh nor the other user's can terminate, since line (A) in each copy will
detect the other script and loop forever.

This will be more difficult to resolve, as it requires the cooperation of
another user, which might not be forthcoming, or quite simply, one party may
forget to check.
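
To make the over-counting concrete, here is a minimal sketch that applies the
same filter chain as line (A) to a canned listing instead of live "ps ax"
output. The function names and the sample listing are made up for
illustration; only the grep chain is taken from the script.

```shell
#!/bin/sh
# Count process lines the way line (A) does: every line that contains
# "ppss.sh", except grep itself and anything mentioning "screen".
# Reads a ps-style listing on stdin so it can be tried on canned input.
count_ppss_jobs() {
    grep -v grep | grep -v -i screen | grep ppss.sh | wc -l | tr -d ' '
}

# Hypothetical listing: one ppss.sh run, one unrelated controller script
# whose name merely contains "ppss.sh", a second ppss.sh run, and an editor.
sample_listing() {
    cat <<'EOF'
 1001 pts/0  S  0:00 /bin/bash ./ppss.sh standalone -f list.txt -c 'wget -q "$ITEM"'
 1002 pts/1  S  0:00 sh controller_for_all_ppss.sh
 1003 pts/2  S  0:00 /bin/bash ./ppss.sh standalone -d /data -c 'gzip "$ITEM"'
 1004 pts/3  S  0:00 vi notes.txt
EOF
}

sample_listing | count_ppss_jobs   # prints 3
```

From the point of view of the first ppss.sh, only process 1001 is its own, yet
the count includes 1002 and 1003 as well, so a comparison against "$MIN_JOBS"
can stay true for as long as those other processes are alive.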

Best regards.

Original comment by cinly....@gmail.com on 20 Jul 2009 at 6:35

GoogleCodeExporter commented 9 years ago
I finally picked up the courage to modify the ppss.sh script.

The modifications I made are as follows:

  - ppss.sh will now distinguish between ppss.sh instances run by different
users. Previously, if two users were running ppss.sh, neither ppss.sh would
terminate, as the "JOBS=`ps ...|wc -l`" in the last while loop would pick up
processes under both instances of ppss.sh and think that it had not
completed. Now, it will only pick up ppss.sh processes for the current user,
more precisely $USER.

  - ppss.sh will not run a second instance of ppss.sh under the same user id.
Otherwise, the same "JOBS=..." will think that it had not completed. I cannot
think of any way to avoid this, so I think the best way forward is to stop
the second instance.

  - created an interactive setting, invoked using --interactive or -i. In this
setting, if the script detects that there is another user of ppss.sh, it will
alert the user and ask whether he wants to continue. I created this because
my users will want to consider running ppss.sh on another computer if this
computer is already running a ppss.sh job. In the non-interactive setting,
ppss.sh defaults to continuing to run on the computer.

The major modification is to change

   JOBS=`ps ax | grep ...`

to

   JOBS=`ps axu | grep ... | grep $USER | ...`

Some code is added just before the script calls main to detect other running
instances of the ppss.sh script.
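
The per-user filter can be tried on canned input as well. The sketch below is
not the code in the attached script, which uses "grep $USER"; it is a possible
variant that compares the first column of a "ps axu"-style listing exactly, on
the assumption that the user name is printed there in full. The function
names, user names and listing are hypothetical.

```shell
#!/bin/sh
# Count ppss.sh lines owned by one user in a `ps axu`-style listing on stdin.
# $1 is the user name. Comparing column 1 exactly means that a user name
# appearing elsewhere on the line (for example in a path) is not counted.
count_user_ppss_jobs() {
    awk -v user="$1" '
        $1 == user && /ppss\.sh/ && !/grep/ && tolower($0) !~ /screen/ { n++ }
        END { print n + 0 }
    '
}

# Hypothetical listing: alice and bob each run ppss.sh, and bob's command
# line happens to mention a directory under /home/alice.
sample_listing() {
    cat <<'EOF'
USER    PID %CPU %MEM   VSZ  RSS TTY   STAT START TIME COMMAND
alice  1001  0.0  0.1  4000 1200 pts/0 S    10:00 0:00 /bin/bash ./ppss.sh standalone -f list.txt -c 'wget -q "$ITEM"'
bob    1003  0.0  0.1  4000 1200 pts/2 S    10:01 0:00 /bin/bash ./ppss.sh standalone -d /home/alice/shared -c 'gzip "$ITEM"'
alice  1004  0.0  0.1  3000  900 pts/3 S    10:02 0:00 vi notes.txt
EOF
}

sample_listing | count_user_ppss_jobs alice   # prints 1
```

On the same listing, a plain "grep ppss.sh | grep alice" also matches bob's
line because of the path, so the exact column comparison is the more
conservative choice when user names can appear inside command lines.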

Unfortunately, as I am not set up for distributed PPSS, I only tested using
standalone PPSS. Also attached are two scripts for testing: ppss-test.sh and
ppss-friends-tests.sh

ppss-test.sh : Checks that ppss.sh is working correctly and then checks that
a second instance of PPSS under the same user name will be blocked. It runs
two tests. The first test runs only one instance of ppss.sh, and the second
runs two instances of ppss.sh under the same userid. Therefore, the first
test should pass, while in the second test, only the first instance will run.

ppss-friends-tests.sh : The purpose is to test how ppss.sh behaves when two
different users are using it on the same computer. To use it, you need to run
the same shell script as two separate users. The first time you must run it
as "Friend", the second time as "Tester". The tester has two choices: running
under the interactive setting or the non-interactive setting. In the
interactive setting, the friend's ppss instance should be detected and
ppss.sh should ask whether the tester wants to run his ppss instance. In the
non-interactive setting, the tester's ppss will be run without any
intervention.

I normally call anything I modify without getting prior agreement from the
original author "bastardized", hence the offensive version name I used.
Thank you to my colleague Dr Roger Tait for allowing me to use his account to
test ppss.sh's handling of multiple PPSS instances from different users.

I am submitting this in the hope that part of it might be useful. I tested
the script on iMac Panther and an old Fedora distribution.

HTH

Original comment by cinly....@gmail.com on 21 Jul 2009 at 4:16

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

I just saw your stuff; many thanks for the great response. I'm currently
fixing the documentation and will look into your changes.

Original comment by Louwrentius on 5 Aug 2009 at 8:38

GoogleCodeExporter commented 9 years ago
Without looking into the code, I can state that based on what you've written
this is an excellent improvement.

I will also look into an option that allows the user to specify the working
directory, thus preventing the clutter of files in the current directory.

Also, the -i interactive option should be default. By default it should warn
before running a 2nd instance with the same user account. An additional
switch should suppress this message if the user persists.

Original comment by Louwrentius on 5 Aug 2009 at 8:53

GoogleCodeExporter commented 9 years ago
Dear Louwrentius,

Do you want me to make the changes to the -i switch? If so, do you have any
other particular switch name that you prefer? I am thinking of using the
switch -N for non-interactive.

Also, while looking at your "usage" message, I see that both --command and
--config have the short form -c. The actual script requires --config to be -C.

Knowing that you are busy, I will make both changes if I haven't received a
reply within 24 hours (today is 6th August).

Original comment by cinly....@gmail.com on 6 Aug 2009 at 10:02

GoogleCodeExporter commented 9 years ago
Dear Cinly, 

I am currently reviewing your work and incorporating it into ppss. I see that
you originally implemented a check to warn users if other users were running
PPSS, but since PPSS now checks the user for each process, this should not be
necessary anymore.

Thanks for the -C 'issue' btw, I corrected this.

As a side effect I fixed some bugs and improved error reporting.

Your code seems to run fine. I just made some small changes and put it into a
function.

bash-3.2$ vi ~/branches/distributed-ppss/ppss.sh
bash-3.2$ ./ppss.sh standalone -d ~/branches/distributed-ppss/tmp/eee -c 'sleep 10'
aug 09 18:13:25: ERROR Cannot run PPSS because there is another running instances of PPSS detected. See log for more details.
bash-3.2$ 

Original comment by Louwrentius on 9 Aug 2009 at 4:15

GoogleCodeExporter commented 9 years ago
I've been thinking about the 'interactive' option. 

If you run two instances of PPSS under the same user account, both instances
will keep on running. They will never exit because they are waiting on each
other to finish.

The only way to fix this is to create a mechanism that records all PIDs that
are created by a single instance. The only way to achieve this is by creating
another file, as a means of inter-process communication, that holds all the
PIDs of running processes. I'm not 100% sure if it is possible to accomplish
this. It would be a more elegant solution than the current while-loop at the
end.
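
As a rough sketch of what such a mechanism could look like (not how PPSS
implements it), each instance could append the PID of every job it starts to
its own file and then poll only those PIDs with "kill -0". The file name and
the function names below are assumptions made up for this example, and the
"sleep" commands stand in for real jobs.

```shell
#!/bin/sh
# Hypothetical PID file that belongs to this instance only ($$ is our PID).
PID_FILE="${TMPDIR:-/tmp}/ppss-pids.$$"
: > "$PID_FILE"

# Start a command in the background and record its PID in the file.
start_job() {
    "$@" &
    echo $! >> "$PID_FILE"
}

# Print how many of the recorded PIDs still exist.
# `kill -0` sends no signal; it only tests whether the process is there.
count_running_jobs() {
    running=0
    while read -r pid
    do
        if kill -0 "$pid" 2>/dev/null
        then
            running=$((running + 1))
        fi
    done < "$PID_FILE"
    echo "$running"
}

start_job sleep 2
start_job sleep 2

# Sleep in the meantime and leave the loop once none of our own jobs is left.
# Processes started by other instances or other users are never looked at.
while [ "$(count_running_jobs)" -gt 0 ]
do
    sleep 1
done
rm -f "$PID_FILE"
```

Because the loop only consults PIDs that this instance wrote down, a second
instance or an unrelated script with "ppss.sh" in its name would not keep it
alive. One caveat of the sketch: a PID could in principle be reused by an
unrelated process after the job has exited.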

For now, I think I want to add the option -F (force) to run PPSS even if
another instance is already running.

Original comment by Louwrentius on 9 Aug 2009 at 4:29

GoogleCodeExporter commented 9 years ago
I released PPSS version 2.20. 

A language typo is already fixed in svn. 

The other major thing that is left is an option to specify a working
directory to keep things clean. I will work on that but I cannot say when
that will be finished.

Original comment by Louwrentius on 9 Aug 2009 at 5:19

GoogleCodeExporter commented 9 years ago
PPSS is now able to start multiple times as the same user.

Original comment by Louwrentius on 21 Oct 2009 at 9:33