spinalcordtoolbox / spinalcordtoolbox

Comprehensive and open-source library of analysis tools for MRI of the spinal cord.
https://spinalcordtoolbox.com
GNU Lesser General Public License v3.0

launcher VERY slow when run from remote disk #838

Closed jcohenadad closed 6 years ago

jcohenadad commented 8 years ago

Implementing a solution in branch poq_issue_838_slow_exe

see video here: https://drive.google.com/open?id=0Bx3A13n3Q_EAdVJMR0laRUZtZkE

poquirion commented 8 years ago

How is the disk mounted, and have you tried running other executables from the mounted disk?

That said, it is good practice to install executables on a local disk. Even so, I am surprised by how slow it is.

jcohenadad commented 8 years ago

the disk was mounted with the default Apple smb3 protocol, via VPN. I did not try other executables. I am at Poly now, and the disk is mounted within the same VLAN, using the same protocol (smb3). It is MUCH faster, though still slightly slower than the local call.

poquirion commented 8 years ago

I don't know Samba and smb3 very well, but I am not surprised. If the disk was mounted over a home internet connection, then through the RISQ, then the neuropoly VPN... the disk latency must be quite high.

From the traceback you get at Ctrl-C, it is the Python imports that are slow. There is a lot of file I/O in that phase, and since all the Python libraries are on the mounted filesystem... I am not surprised that it is that slow. The only obvious way around the problem is to make sure that the SCT python/ directory is on a local disk, or at least on one with low latency.

To check the latency on my Linux machine I just run iostat -x and look at the "await" column, but I am not sure how to do that on a Mac (http://apple.stackexchange.com/questions/52751/mac-os-x-disk-performance-monitoring).
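
For a rough cross-platform check, here is a minimal Python sketch (not SCT code; the path is just an example taken from later in this thread) that times repeated stat() calls on the mount:

import os
import time

def stat_latency_ms(path, n=100):
    """Average duration of an os.stat() call on `path`, in milliseconds."""
    t0 = time.time()
    for _ in range(n):
        os.stat(path)
    # note: OS-level attribute caching can make repeated stats optimistic,
    # so treat this as a lower bound on the real per-access latency
    return (time.time() - t0) / n * 1000.0

print("%.2f ms/stat" % stat_latency_ms("/Volumes/sct_testing"))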

jcohenadad commented 8 years ago

ok, so if there is no easy fix, I guess we can remove the priority:high label and ask users to install SCT on a local disk

jcohenadad commented 8 years ago

might be causing issue #967

jcohenadad commented 7 years ago

I increased the priority of this issue because with the new server we are now running processes on it, and SCT functions are slower to run. Also, this issue affects SCT users in other research labs who use network home folders.

poquirion commented 7 years ago

Make tests with slow I/O on the disk:

https://unix.stackexchange.com/questions/48138/how-to-throttle-per-process-i-o-to-a-max-limit

and https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/

But it might not be just disk speed; it might also be "ping time" (round-trip latency)

peristeri commented 7 years ago

Here's a crude profiling of sct_propseg using time, run from the duke mount at /Volumes/sct_testing/small

time ls

real    0m0.038s
user    0m0.002s
sys     0m0.004s

time sct_propseg # no changes

real    0m15.497s
user    0m1.499s
sys     0m0.430s

time sct_propseg #  without the sct version printout

real    0m13.985s
user    0m1.499s
sys     0m0.400s

time sct_propseg # import all modules and functions and that's it.

real    0m14.370s
user    0m1.512s
sys     0m0.399s

time sct_propseg # no imports, only load the functions

real    0m3.432s
user    0m0.883s
sys     0m0.211s

time sct_propseg # import msct_parser only and functions

real    0m8.843s
user    0m1.111s
sys     0m0.307s

time sct_propseg # from spinalcordtoolbox.centerline import optic only and functions

real    0m13.443s
user    0m1.485s
sys     0m0.379s

time sct_propseg # import spinalcordtoolbox.centerline only and functions

real    0m3.248s
user    0m0.898s
sys     0m0.215s

The preliminary tests show that most of the latency comes from the module imports (compare the ~14 s runs with full imports against the ~3 s runs without them).

Note also that these numbers depend on the connection being open. On most OSes, if the connection is not used for a while, it is closed until the next request; for example, the first call to ls -al is slower than the second if the second call comes soon after.

bash-3.2$ time ls -al
...
real    0m6.543s
user    0m0.002s
sys     0m0.005s

bash-3.2$ time ls -al
total 656
...
real    0m0.812s
user    0m0.002s
sys     0m0.004s

Immediate suggestion

Use your home directory as the working directory, and use the full remote path to the files.

Short-term suggestion

Profile the scripts' modules and identify I/O bottlenecks.

Long-term suggestion

Build a file-system module to manage all of SCT's I/O operations.

jcohenadad commented 7 years ago

Thank you for these investigations @peristeri. Regarding the imports, we had a discussion over a year ago with @poquirion and he suggested we go for "import xx as xx" instead of "from xx import xx". @poquirion any comments?
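
For reference, a small sketch of the two styles under discussion (module names are only illustrative; numpy and nibabel stand in for any heavy dependency). Both forms load the full module at import time, so on a slow disk neither is cheaper by itself; only deferring the import into a function postpones the I/O cost:

import numpy as np             # "import xx as xx": loads all of numpy now
from numpy import ndarray      # "from xx import xx": also loads all of numpy now

def load_image(path):
    # deferred import: the cost is paid at first call, not at script startup
    import nibabel
    return nibabel.load(path)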

Regarding the short-term solution: how do you suggest we implement it? Should we check in the bash caller whether the user is on a mounted disk and print a warning? Or should we automatically cd to the home dir and go back at the end of the process?

peristeri commented 7 years ago

Profiling should be done first on the most common modules, such as msct_image and msct_parser, and then follow up with profiling the most common commands (sct_label_utils, sct_propseg, ...).

Unfortunately, PyCharm's profiler is only available in the professional version.

line_profiler (https://github.com/rkern/line_profiler) reports the time spent on each line and how often it is hit. I will now start profiling msct_parser as a proof of concept, since it is a heavy import.
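
For context, typical line_profiler usage looks roughly like this (a sketch, assuming the package is installed; `profile` is injected into builtins by kernprof at runtime, so no import is needed):

@profile
def parse(arguments):
    pass  # body of the function being profiled

# then, from the shell:
#   kernprof -l -v msct_parser.py
# which prints per-line hit counts and timings for each decorated function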

peristeri commented 7 years ago

I would add the remote-drive instructions to the getting-started documentation.

I tried checking the pwd in sct_launcher, but I was not able to find a reliable way to identify whether it is a remote drive. I will look into doing the check from Python instead, but the solution is not a clean one.
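
One possible heuristic, sketched below for Linux only (macOS would need a different mechanism entirely, which is part of why this check is hard to do reliably):

import os

# filesystem types that usually indicate a network mount (not exhaustive)
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "sshfs", "fuse.sshfs"}

def is_on_network_fs(path):
    """Find the longest mount point in /proc/mounts that contains `path`
    and check whether its filesystem type is a network one."""
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mount_point, fs_type = line.split()[:3]
            if path.startswith(mount_point) and len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type in NETWORK_FS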

jcohenadad commented 7 years ago

thanks George. @poquirion @perone any ideas?

poquirion commented 7 years ago

I've been looking into this a bit this morning, and it seems the problem is not specific to SCT.

I mounted a slow smb share on my laptop, and when it is my working directory, vim ~/.bashrc takes forever to start.

It's probably slow every time a system call touches the working directory.

poquirion commented 7 years ago

@peristeri concerning the profiling: is SCT installed on the remote disk, or is it just your working directory?

If SCT is installed on the remote disk, the slowness is expected...

The problem here is that calling SCT from a remote directory ($PWD is on the remote) is slow even when no files are created there. As I just posted, even opening a file in my home directory with vim is slow when my working directory is on a slow remote.

poquirion commented 7 years ago

As for what we should do: moving right into a /tmp directory when SCT starts might kill two birds with one stone.

We would need to do that at the beginning of the tool, just after the logging instantiation; this way we minimize the number of /tmp folders created to just one per system/bash call.

poquirion commented 7 years ago
sct_propseg -h 

PWD=$HOME ==> fast

real    0m0.935s
user    0m0.828s
sys 0m0.056s

PWD=$SMB_SHARE ==> slow

real    0m7.399s
user    0m0.900s
sys 0m0.120s

===========================================

sct_launcher

PWD=$HOME ==> fast

real    0m0.764s
user    0m0.672s
sys 0m0.056s

PWD=$SMB_SHARE ==> slow

real    0m3.184s
user    0m0.616s
sys 0m0.072s

The launcher accounts for about 40% of the time on the slow share, but for about 80% on the fast one...

poquirion commented 7 years ago

There is the "SCT_LOADED_ONCE=1" option that accelerates the sct_launcher call (the slow part of the launcher is only run once, and all the SCT calls going through the launcher are much faster). Sourcing sct_launcher has the effect of loading all the slow stuff and setting SCT_LOADED_ONCE.

From the fast mount:

$  source  ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real    0m0.005s
user    0m0.000s
sys 0m0.004s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m0.249s
user    0m0.224s
sys 0m0.016s

================= From the slow mount

$  source  ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real    0m0.087s
user    0m0.000s
sys 0m0.000s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m4.585s
user    0m0.256s
sys 0m0.064s

The sct_propseg -h call is still pretty slow, and it's not coming from the launcher.

poquirion commented 7 years ago

More data from the slow mount: starting Python itself is slow:

$ time python -c ''

real    0m0.864s
user    0m0.016s
sys 0m0.004s

Then

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h

        228704 function calls (227107 primitive calls) in 4.756 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.027    0.027    4.757    4.757 sct_propseg.py:14(<module>)
        1    0.001    0.001    3.157    3.157 msct_parser.py:80(<module>)
       10    0.218    0.022    2.066    0.207 __init__.py:1(<module>)
        1    0.221    0.221    1.933    1.933 msct_types.py:17(<module>)
        2    0.057    0.028    1.301    0.651 __init__.py:10(<module>)
        1    0.057    0.057    1.298    1.298 optic.py:1(<module>)
        1    0.330    0.330    1.223    1.223 sct_utils.py:16(<module>)
        1    0.053    0.053    1.181    1.181 __init__.py:106(<module>)
        1    0.000    0.000    1.004    1.004 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.943    0.943 type_check.py:3(<module>)
        1    0.055    0.055    0.662    0.662 __init__.py:7(<module>)
        2    0.136    0.068    0.532    0.266 __init__.py:88(<module>)

It seems the conclusions are the same as George's... I will do two things:

poquirion commented 7 years ago

https://wiki.python.org/moin/PythonSpeed/PerformanceTips

poquirion commented 7 years ago

Added this to sct_launcher under the "loaded once" block:

  INIT_DIR="$PWD"                                               # remember where the user launched from
  SCT_WORK_DIR=$(mktemp -d -t sct_tmp.$(date +%d%m%y).XXXXXX)   # fresh scratch dir on fast local storage
  cd "$SCT_WORK_DIR"

then, from the slow directory:

$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real    0m1.182s
user    0m0.808s
sys 0m0.072s

That's the magic! http://www.youtube.com/watch?v=8JYktr14NBk&t=3m5s

poquirion commented 7 years ago

However, I see a couple of problems...

poquirion commented 7 years ago

... In the end I am a bit uncomfortable doing the cd to /tmp in the launcher, especially given what it means for the resolution of relative paths. Are there unseen side effects?

poquirion commented 7 years ago

I can create a method in sct_utils that resolves all relative paths (and also ~ when present) using SCT_INIT_DIR if it is set. I can also add the condition to use SCT_INIT_DIR if and only if PWD is SCT_TMP_DIR.

We would then need to change all the os.path.abspath calls to use this method.
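
Something along these lines, as a sketch (SCT_INIT_DIR and SCT_TMP_DIR are the environment variables proposed above, not existing ones):

import os

def resolve_path(path):
    """Resolve `path` against the directory SCT was launched from
    (SCT_INIT_DIR) when the process has already moved into the temporary
    working directory (SCT_TMP_DIR); otherwise behave like abspath."""
    path = os.path.expanduser(path)  # handle '~'
    if os.path.isabs(path):
        return os.path.normpath(path)
    init_dir = os.environ.get("SCT_INIT_DIR")
    if init_dir and os.getcwd() == os.environ.get("SCT_TMP_DIR"):
        return os.path.normpath(os.path.join(init_dir, path))
    return os.path.abspath(path)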

poquirion commented 7 years ago

After the discussion we just had about Python looking in the current dir (and this is an erratum for a deleted post!):

I still have not found an option to prevent Python from prepending $PWD to sys.path. However, I ran a test with the following as the first step of sct_propseg:

import os
import sys

# remove the (slow) current working directory from the module search path
cwd = os.getcwd()
if cwd in sys.path:
    sys.path.remove(cwd)

With $PWD in sys.path

time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null 

real    0m3.840s
user    0m0.292s
sys 0m0.056s

Without $PWD in sys.path

time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null

real    0m1.448s
user    0m0.212s
sys 0m0.048s

The difference is not negligible. However, the position of $PWD in sys.path is not always the same, depending on how SCT is set up... from the doc: https://docs.python.org/3/library/sys.html#sys.path

poquirion commented 7 years ago

A not-too-disruptive way of removing $PWD is to do it at the top of sct_utils:

import os
import sys

# drop the current working directory from the module search path
cwd = os.getcwd()
if cwd in sys.path:
    sys.path.remove(cwd)
print("\n".join(sys.path))  # debug: show the resulting search path

jcohenadad commented 7 years ago

very interesting! How does the last timing you obtained compare to the local call?

poquirion commented 7 years ago

OK, I think I have a reasonable solution: I just move $PWD to the end of the search path. The speed is still pretty good when doing that.
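
A minimal sketch of that idea, assuming it runs early, before any SCT module is imported:

import os
import sys

# demote the (possibly slow) current working directory to the end of the
# module search path instead of removing it, so imports from it still
# work as a last resort
cwd = os.getcwd()
if cwd in sys.path:
    sys.path.remove(cwd)
    sys.path.append(cwd)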

poquirion commented 7 years ago

Here, $PWD is the slow path

With all paths in sys.path:

real    0m4.120s
user    0m0.280s
sys 0m0.040s

Removing the slow path:

real    0m1.438s
user    0m0.228s
sys 0m0.036s

Putting the slow path at the end of sys.path:

real    0m1.947s
user    0m0.228s
sys 0m0.048s

But I also got real times of up to 3 s from time to time.

cd'ing into /tmp, on the other hand, gives:

real    0m1.376s
user    0m1.020s
sys 0m0.060s

Putting $PWD at the end of the search path, and adding a note about it in the SCT documentation, is probably the least disruptive way of doing this... We should also log a warning when doing so.

poquirion commented 7 years ago

Removing the slow path

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
         228713 function calls (227116 primitive calls) in 3.321 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.777    0.777    3.322    3.322 sct_propseg.py:13(<module>)
        1    0.007    0.007    1.639    1.639 msct_parser.py:80(<module>)
       10    0.120    0.012    1.044    0.104 __init__.py:1(<module>)
        1    0.104    0.104    0.956    0.956 msct_types.py:17(<module>)
        2    0.027    0.013    0.751    0.376 __init__.py:10(<module>)
        1    0.025    0.025    0.751    0.751 optic.py:1(<module>)
        1    0.164    0.164    0.675    0.675 sct_utils.py:15(<module>)
        1    0.028    0.028    0.598    0.598 __init__.py:106(<module>)
        1    0.000    0.000    0.509    0.509 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.477    0.477 type_check.py:3(<module>)
        1    0.001    0.001    0.360    0.360 analyze.py:83(<module>)
        1    0.026    0.026    0.323    0.323 __init__.py:7(<module>)
[...]

Putting it at the end

$ python -m cProfile -s cumulative  ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
         228714 function calls (227117 primitive calls) in 2.922 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.031    0.031    2.923    2.923 sct_propseg.py:13(<module>)
        1    0.007    0.007    1.676    1.676 msct_parser.py:80(<module>)
       10    0.148    0.015    1.166    0.117 __init__.py:1(<module>)
        1    0.106    0.106    1.040    1.040 msct_types.py:17(<module>)
        2    0.059    0.030    1.026    0.513 __init__.py:10(<module>)
        1    0.028    0.028    0.995    0.995 optic.py:1(<module>)
        1    0.026    0.026    0.675    0.675 __init__.py:106(<module>)
        1    0.165    0.165    0.630    0.630 sct_utils.py:15(<module>)
        1    0.000    0.000    0.557    0.557 add_newdocs.py:10(<module>)
        1    0.000    0.000    0.523    0.523 type_check.py:3(<module>)
        1    0.001    0.001    0.411    0.411 analyze.py:83(<module>)
        1    0.026    0.026    0.340    0.340 __init__.py:7(<module>)

[...]

jcohenadad commented 7 years ago

cool! Can you also give comparative times with the call from /tmp?

zougloub commented 6 years ago

First, the "unwanted" behaviour is the same for any Python program stored on a slow/high-latency filesystem; this is the way Python works: sys.path contains the directory of the executed script, and imports are tried from there first (https://docs.python.org/3/library/sys.html#sys.path).

IMHO the cleanest solution is code-less, and is to (perhaps educate users to) install SCT on local storage; what the PR does is at least contained within the scripts, so it's not too bad, but it's still ad hoc, and partial, since SCT-internal imports will still have to be performed over the slow network.

jcohenadad commented 6 years ago

@zougloub: in this specific issue, SCT is installed on the user's local storage. The user then cds into a remote disk, runs SCT, and experiences this latency.

zougloub commented 6 years ago

Ah.

Well, here's some analysis. After checking that invoking Python directly didn't cause issues, the following experiment showed that the extra accesses (attempts to import from the working directory) are caused by the launcher, which adds the current working directory to sys.path:

# create a recognizable working directory
mkdir ~/my_sct_workdir && cd ~/my_sct_workdir
# create a useless script that does nothing
echo 'import sys; print("my path: %s" % sys.path)' > $SCT_DIR/scripts/sct_fake.py
# create a wrapper for it
ln -sf sct_propseg $SCT_DIR/bin/sct_fake
# trace direct script invocation
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 0_fake_just_python.txt
# trace script invocation with bundled interpreter
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone $SCT_DIR/dev/python/bin/python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 1_fake_conda_python.txt
# trace script invocation with launcher, altered to do python -u -v
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone -f sct_fake 2>&1 ) > 2_fake_with_launcher

# check when the workdir is accessed in the 3 tests
for x in 0 1 2; do grep my_sct_workdir  ${x}_fake_* | awk '{ FS=":"; print $1 }' | wc -l; done
0
0
80

# check sys.path
grep "my path" ?_fake_*
0_fake_just_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/home/cJ/.local/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/geos', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib64/python2.7/site-packages/wx-3.0-gtk2']
1_fake_conda_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']
2_fake_with_launcher:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/scripts', '/home/cJ/my_sct_workdir', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']

Will track down where this comes from...

zougloub commented 6 years ago

Got it. This is caused by the launcher's add_to_path function, which appends a colon unconditionally, going from (nothing) to ${SCT_DIR}:, which is actually 2 entries: ${SCT_DIR} and . (the trailing colon creates an empty entry, and the empty entry resolves to the current directory).
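
A small illustration of the mechanics (the path is made up): a trailing colon splits into an extra empty entry, and Python interprets an empty PYTHONPATH/sys.path entry as the current working directory.

import os

pythonpath = "/home/user/sct/scripts" + ":"  # unconditional trailing colon
print(pythonpath.split(os.pathsep))
# -> ['/home/user/sct/scripts', '']  -- the '' is resolved as $PWD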

zougloub commented 6 years ago

Since the scripts are actually located in $SCT_DIR/scripts, adding this folder to PYTHONPATH is useless; the creation of PYTHONPATH could thus be forgone altogether.

poquirion commented 6 years ago

I fixed the bug in add_to_path and got rid of adding scripts/ to PYTHONPATH.

jcohenadad commented 6 years ago

Tried running sct_propseg on django using cf64e93d55cadd6ad3710dfbe7d847784ed57cbc, from a directory within duke, and there was no abnormal delay. So it seems the issue has now been fixed. We need more confirmation from other lab members.