How is the disk mounted, and have you tried to run other executables from the mounted disk?
In any case, it is good practice to install executables on a local disk. However, I am surprised by how slow it is.
The disk was mounted with the default Apple smb3 protocol, via VPN. I did not try other executables. I am at Poly now, and the disk is mounted within the same VLAN, using the same protocol (smb3). It is MUCH faster, though still slightly slower than the local call.
I don't know Samba and smb3 very well, but I am not surprised. If the disk was mounted over a home internet connection, then through the RISQ, then the NeuroPoly VPN... the disk latency must be quite high.
From the error you get at Ctrl-C, it is the Python imports that are slow. There is a lot of file I/O in that phase, and since all the Python libraries are on the mounted filesystem... I am not surprised that it is that slow. The only obvious way to get around the problem is to make sure that the SCT python/ directory is on a local disk, or at least on one with low latency.
To check the latency on my Linux machine I just run iostat -x
and look at the "await" column, but I am not sure how to do that on a Mac (http://apple.stackexchange.com/questions/52751/mac-os-x-disk-performance-monitoring).
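As a rough, OS-agnostic alternative (not something from this thread), one can time repeated stat() calls on the mounted path from Python; the mount point below is a placeholder:

import os
import time

MOUNT = "/Volumes/sct_testing"   # placeholder: the network mount to test

samples = []
for _ in range(50):
    t0 = time.time()
    os.stat(MOUNT)               # one metadata round-trip (may be cached by the SMB client)
    samples.append(time.time() - t0)

print("mean stat latency: %.2f ms" % (1000.0 * sum(samples) / len(samples)))
print("max  stat latency: %.2f ms" % (1000.0 * max(samples)))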
OK, so if there is no easy fix I guess we can remove the priority:high label, and ask users to install SCT on a local disk.
might be causing issue #967
I increased the priority of this issue because with the new server we are now running processes on it, and SCT functions are slower to run. Also, this issue affects SCT users in other research labs who use network home folders.
Make a test with slow disk I/O:
https://unix.stackexchange.com/questions/48138/how-to-throttle-per-process-i-o-to-a-max-limit
and https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/
But it might not be just disk throughput; it might also be latency ("ping time").
Here's a crude profiling of sct_propseg using time, remotely logged in to duke, at the path /Volumes/sct_testing/small:
time ls
real 0m0.038s
user 0m0.002s
sys 0m0.004s
time sct_propseg # no changes
real 0m15.497s
user 0m1.499s
sys 0m0.430s
time sct_propseg # without the sct version printout
real 0m13.985s
user 0m1.499s
sys 0m0.400s
time sct_propseg # import all modules and functions and that's it.
real 0m14.370s
user 0m1.512s
sys 0m0.399s
time sct_propseg # no imports, only load the functions
real 0m3.432s
user 0m0.883s
sys 0m0.211s
time sct_propseg # import msct_parser only and functions
real 0m8.843s
user 0m1.111s
sys 0m0.307s
time sct_propseg # from spinalcordtoolbox.centerline import optic only and functions
real 0m13.443s
user 0m1.485s
sys 0m0.379s
time sct_propseg # import spinalcordtoolbox.centerline only and functions
real 0m3.248s
user 0m0.898s
sys 0m0.215s
The preliminary tests show the latency is caused by:
- import foo versus from foo import bar
- git rev-parse is slow in general.
- import msct_parser is slow to load.
Also note that these numbers depend on the connection being open. In most OSes, if the connection is not used for a long period of time, it is closed until the next request. Example: the first call to ls -al is slower than the second, if the second call is done soon after.
bash-3.2$ time ls -al
...
real 0m6.543s
user 0m0.002s
sys 0m0.005s
bash-3.2$ time ls -al
total 656
...
real 0m0.812s
user 0m0.002s
sys 0m0.004s
Immediate suggestion
Use your home directory as the working directory, and use the full remote path to the files.
Short term suggestion
Profile the script modules and identify I/O bottlenecks.
Long term suggestion
Build a file system module to manage all of SCT's I/O operations (a rough sketch is given below).
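Purely as an illustration of what such a module could look like (all names here are hypothetical, not existing SCT code): a thin helper that stages remote inputs into a local scratch directory and copies results back once at the end, so per-call network latency is paid only twice per file.

import os
import shutil
import tempfile

class LocalScratch(object):
    """Hypothetical I/O manager: work on local copies of remote files."""

    def __init__(self):
        self.workdir = tempfile.mkdtemp(prefix="sct_io_")

    def fetch(self, remote_path):
        # Copy a (possibly remote) input to local scratch; return the local path.
        local_path = os.path.join(self.workdir, os.path.basename(remote_path))
        shutil.copy2(remote_path, local_path)
        return local_path

    def publish(self, local_path, remote_path):
        # Copy a result back to its (possibly remote) destination.
        shutil.copy2(local_path, remote_path)

    def cleanup(self):
        shutil.rmtree(self.workdir, ignore_errors=True)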
Thank you for these investigations @peristeri. Regarding the imports, we had a discussion over a year ago with @poquirion and he suggested we go with "import xx as xx" instead of "from xx import xx". @poquirion any comments?
Regarding the short-term solution: how do you suggest we implement it? Should we check in the bash caller whether the user is on a mounted disk and spit out a warning? Or should we automatically cd to the home dir and go back at the end of the process?
Profiling should be done first on the most common modules, such as msct_image and msct_parser, and then follow up with profiling the most common commands (sct_label_utils, sct_propseg, ...).
Unfortunately, PyCharm's profiler is only available in the professional version.
line_profiler (https://github.com/rkern/line_profiler) identifies the time taken by each line and how often it is called; a minimal usage sketch is shown below. I will now begin profiling msct_parser as a proof of concept, since it is a heavy import.
I would add the remote-drive instructions to the Getting Started documentation.
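For reference (not from this thread), a minimal line_profiler run looks something like this; the target function below is just a toy example, since kernprof injects the @profile decorator at run time:

# profile_me.py -- toy stand-in for the module being profiled (e.g. msct_parser)
@profile                 # made available by kernprof; do not import it
def parse(arguments):
    result = []
    for arg in arguments:        # each line gets its own hit count and timing
        result.append(arg.strip())
    return result

if __name__ == "__main__":
    parse(["-i ", " t2.nii.gz"])

# Run with:
#   kernprof -l -v profile_me.py
# which prints per-line timings for every @profile-decorated function.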
I tried checking the pwd in the sct_launcher, but I was not able to find a reliable way to identify whether it is a remote drive. I will look into maybe using Python's is_remote function, but the solution is not a clean one.
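One possible way to do that check from Python (not what was implemented here) is to look up the filesystem type of the mount containing the current directory; this sketch assumes psutil is available and that the listed filesystem type names cover the relevant cases:

import os
import psutil

NETWORK_FS = {"smbfs", "cifs", "nfs", "nfs4", "afpfs", "fuse.sshfs"}

def cwd_is_on_network_mount():
    cwd = os.getcwd() + os.sep
    best_fstype, best_len = None, -1
    # Find the deepest mount point that is a prefix of the cwd (rough prefix check).
    for part in psutil.disk_partitions(all=True):
        mp = part.mountpoint.rstrip(os.sep) + os.sep
        if cwd.startswith(mp) and len(mp) > best_len:
            best_fstype, best_len = part.fstype.lower(), len(mp)
    return best_fstype in NETWORK_FS

if cwd_is_on_network_mount():
    print("WARNING: the working directory is on a network mount; SCT may be slow.")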
thanks George. @poquirion @perone any ideas?
I've been looking into this a bit this morning and it seems that the problem is not specific to SCT.
I mounted a slow SMB share on my laptop, and when it is my working directory, vim ~/.bashrc takes forever to start.
It's probably slow every time a system call is made.
@peristeri, concerning the profiling, is SCT installed on the remote disk or is it just your working directory?
If SCT is installed on the remote, the slowness is expected...
The problem is just that calling SCT from a remote location ($PWD is on the remote) is slow, even when no files are created on that remote. Like I just posted, even opening a file in my home directory with vim is slow when I am on a slow remote.
As for the question of what we should do, moving right into a /tmp directory when SCT is started might kill two birds with one stone.
We would need to do that at the beginning of the tool, just after the logging instantiation; this way we minimise the number of '/tmp' folders created to just one per system/bash call.
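A minimal Python-side sketch of that idea (placement, environment-variable names and cleanup policy are my assumptions, not what was actually merged):

import atexit
import os
import shutil
import tempfile

def move_to_scratch():
    init_dir = os.getcwd()                      # where the user launched SCT from
    work_dir = tempfile.mkdtemp(prefix="sct_")  # one scratch dir per process
    os.environ["SCT_INIT_DIR"] = init_dir
    os.environ["SCT_WORK_DIR"] = work_dir
    os.chdir(work_dir)
    # atexit runs in reverse order: first chdir back, then remove the scratch dir.
    atexit.register(shutil.rmtree, work_dir, True)
    atexit.register(os.chdir, init_dir)
    return init_dir, work_dir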
sct_propseg -h
PWD=$HOME ==> fast
real 0m0.935s
user 0m0.828s
sys 0m0.056s
PWD=$SMB_SHARE ==> slow
real 0m7.399s
user 0m0.900s
sys 0m0.120s
===========================================
sct_launcher
PWD=$HOME ==> fast
real 0m0.764s
user 0m0.672s
sys 0m0.056s
PWD=$SMB_SHARE ==> slow
real 0m3.184s
user 0m0.616s
sys 0m0.072s
The launcher accounts for about 40% of the time on the slow share but for about 80% on the fast one...
There is the "SCT_LOADED_ONCE=1" option that accelerates the sct_launcher call (the slow part of the launcher is only run once, and all the SCT calls using the launcher are much faster). Sourcing sct_launcher has the effect of loading all the slow stuff and setting SCT_LOADED_ONCE.
From the fast mount:
$ source ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real 0m0.005s
user 0m0.000s
sys 0m0.004s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real 0m0.249s
user 0m0.224s
sys 0m0.016s
================= From the slow mount
$ source ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_launcher
real 0m0.087s
user 0m0.000s
sys 0m0.000s
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real 0m4.585s
user 0m0.256s
sys 0m0.064s
The sct_propseg -h call is still pretty slow, and it's not coming from the launcher.
Other stuff from the slow mount: starting Python is slow:
$ time python -c ''
real 0m0.864s
user 0m0.016s
sys 0m0.004s
Then
$ python -m cProfile -s cumulative ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
228704 function calls (227107 primitive calls) in 4.756 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.027 0.027 4.757 4.757 sct_propseg.py:14(<module>)
1 0.001 0.001 3.157 3.157 msct_parser.py:80(<module>)
10 0.218 0.022 2.066 0.207 __init__.py:1(<module>)
1 0.221 0.221 1.933 1.933 msct_types.py:17(<module>)
2 0.057 0.028 1.301 0.651 __init__.py:10(<module>)
1 0.057 0.057 1.298 1.298 optic.py:1(<module>)
1 0.330 0.330 1.223 1.223 sct_utils.py:16(<module>)
1 0.053 0.053 1.181 1.181 __init__.py:106(<module>)
1 0.000 0.000 1.004 1.004 add_newdocs.py:10(<module>)
1 0.000 0.000 0.943 0.943 type_check.py:3(<module>)
1 0.055 0.055 0.662 0.662 __init__.py:7(<module>)
2 0.136 0.068 0.532 0.266 __init__.py:88(<module>)
It seems the conclusions are the same as Georges'... I will do two things.
First, I added this in sct_launcher under the "loaded once" block:
INIT_DIR=$PWD
SCT_WORK_DIR=$(mktemp -d -t sct_tmp.$(date +%d%m%y).XXXXXX)
cd $SCT_WORK_DIR
Then, in the slow directory:
$ time ~/neuropoly/spinalcordtoolbox/bin/sct_propseg -h
real 0m1.182s
user 0m0.808s
sys 0m0.072s
That's the magic! http://www.youtube.com/watch?v=8JYktr14NBk&t=3m5s
However, I see a couple of problems...
... In the end I am a bit uncomfortable doing the cd to /tmp in the launcher, especially for what it means for the resolution of relative paths. Are there unseen side effects?
I can create a method in sct_utils that resolves all relative paths (and also ~ if present) using SCT_INIT_DIR when it is set. I can also add the condition to use SCT_INIT_DIR if and only if PWD is SCT_TMP_DIR.
We would then need to change all the os.path.abspath calls to use that method.
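A rough sketch of what such a helper could look like (the function name and the reliance on SCT_INIT_DIR / SCT_WORK_DIR environment variables are assumptions):

import os

def resolve_path(path):
    # Hypothetical sct_utils helper: expand ~ and resolve relative paths
    # against the directory the user launched SCT from, not the scratch dir.
    path = os.path.expanduser(path)
    if os.path.isabs(path):
        return path
    init_dir = os.environ.get("SCT_INIT_DIR")
    work_dir = os.environ.get("SCT_WORK_DIR")
    # Only fall back to SCT_INIT_DIR when we know we were moved into the scratch dir.
    if init_dir and work_dir and os.getcwd() == work_dir:
        return os.path.abspath(os.path.join(init_dir, path))
    return os.path.abspath(path)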
After the discussion we just had about Python looking in the current dir (and this is an erratum from a deleted post!):
I still have not found an option to prevent Python from prepending $PWD to sys.path. However, I ran a test doing
import os
import sys
sys.path.pop(sys.path.index(os.getcwd()))  # drop the $PWD entry
as the first step of propseg.
With $PWD in sys.path:
time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null
real 0m3.840s
user 0m0.292s
sys 0m0.056s
Without $PWD in sys.path:
time ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h > /dev/null
real 0m1.448s
user 0m0.212s
sys 0m0.048s
The difference is not negligible. However, the position of $PWD in sys.path is not always the same, depending on how SCT is set up... from the doc: https://docs.python.org/3/library/sys.html#sys.path
A not-too-disruptive way of removing $PWD is to do it at the top of sct_utils:
import os
import sys
cwd = os.getcwd()
if cwd in sys.path:
    sys.path.remove(cwd)
print("\n".join(sys.path))
Very interesting. How does the last timing you obtained compare to the local call?
OK, I think I have a reasonable solution. I am just moving $PWD to the end of the search path. The speed is still pretty good when doing that.
Here, $PWD is the slow path.
All paths in sys.path:
real 0m4.120s
user 0m0.280s
sys 0m0.040s
removing slow path
real 0m1.438s
user 0m0.228s
sys 0m0.036s
putting slow path at the end of sys.path
real 0m1.947s
user 0m0.228s
sys 0m0.048s
But I also got a real time of up to 3 s from time to time.
cd'ing into /tmp, on the other hand, gives:
real 0m1.376s
user 0m1.020s
sys 0m0.060s
Putting $PWD at the end of the search path, and adding a note about it in SCT, is probably the least disruptive way of doing this... We should also log a warning when doing so.
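A minimal sketch of that idea (the exact placement in sct_utils and the logging call are assumptions):

import logging
import os
import sys

cwd = os.getcwd()
if cwd in sys.path:
    # Demote the working directory instead of removing it, so imports of
    # modules that really do live in the cwd still work, just last.
    sys.path.remove(cwd)
    sys.path.append(cwd)
    logging.warning("Moved current directory %s to the end of sys.path "
                    "to avoid slow imports from a network mount.", cwd)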
Removing the slow path
$ python -m cProfile -s cumulative ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
228713 function calls (227116 primitive calls) in 3.321 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.777 0.777 3.322 3.322 sct_propseg.py:13(<module>)
1 0.007 0.007 1.639 1.639 msct_parser.py:80(<module>)
10 0.120 0.012 1.044 0.104 __init__.py:1(<module>)
1 0.104 0.104 0.956 0.956 msct_types.py:17(<module>)
2 0.027 0.013 0.751 0.376 __init__.py:10(<module>)
1 0.025 0.025 0.751 0.751 optic.py:1(<module>)
1 0.164 0.164 0.675 0.675 sct_utils.py:15(<module>)
1 0.028 0.028 0.598 0.598 __init__.py:106(<module>)
1 0.000 0.000 0.509 0.509 add_newdocs.py:10(<module>)
1 0.000 0.000 0.477 0.477 type_check.py:3(<module>)
1 0.001 0.001 0.360 0.360 analyze.py:83(<module>)
1 0.026 0.026 0.323 0.323 __init__.py:7(<module>)
[...]
Putting it at the end
$ python -m cProfile -s cumulative ~/neuropoly/spinalcordtoolbox/scripts/sct_propseg.py -h
228714 function calls (227117 primitive calls) in 2.922 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.031 0.031 2.923 2.923 sct_propseg.py:13(<module>)
1 0.007 0.007 1.676 1.676 msct_parser.py:80(<module>)
10 0.148 0.015 1.166 0.117 __init__.py:1(<module>)
1 0.106 0.106 1.040 1.040 msct_types.py:17(<module>)
2 0.059 0.030 1.026 0.513 __init__.py:10(<module>)
1 0.028 0.028 0.995 0.995 optic.py:1(<module>)
1 0.026 0.026 0.675 0.675 __init__.py:106(<module>)
1 0.165 0.165 0.630 0.630 sct_utils.py:15(<module>)
1 0.000 0.000 0.557 0.557 add_newdocs.py:10(<module>)
1 0.000 0.000 0.523 0.523 type_check.py:3(<module>)
1 0.001 0.001 0.411 0.411 analyze.py:83(<module>)
1 0.026 0.026 0.340 0.340 __init__.py:7(<module>)
[...]
Cool! Can you also give comparative times with the call from /tmp?
First, the "unwanted" behaviour is the same on all Python programs that would be stored on a slow/high-latency filesystem, this is the way Python works: sys.path contains the path of the executed scripts, and imports will be tried from there first (https://docs.python.org/3/library/sys.html#sys.path).
IMHO the cleanest solution is code-less, and is to (perhaps educate to) install SCT on one's local storage; what the PR does is at least contained within the scripts, so it's not too bad, but it's still ad-hoc, and partial, since SCT internal imports will still have to be performed on the slow network.
@zougloub: in this specific issue, SCT is installed on one's local storage. Then the user goes into a remote disk, runs SCT and experiences this latency.
Ah.
Well, here's some analysis. After checking that invoking Python directly didn't cause issues, the following experiment was done; the extra accesses that try to import from the working directory are caused by the launcher, which adds the current working directory to sys.path:
# create a recognizable working directory
mkdir ~/my_sct_workdir && cd ~/my_sct_workdir
# create a useless script that does nothing
echo 'import sys; print("my path: %s" % sys.path)' > $SCT_DIR/scripts/sct_fake.py
# create a wrapper for it
ln -sf sct_propseg $SCT_DIR/bin/sct_fake
# trace direct script invocation
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 0_fake_just_python.txt
# trace script invocation with bundled interpreter
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone $SCT_DIR/dev/python/bin/python -u -v $SCT_DIR/scripts/sct_fake.py 2>&1 ) > 1_fake_conda_python.txt
# trace script invocation with launcher, altered to do python -u -v
( strace -e trace=open,stat,readdir,fork,vfork,execve,clone -f sct_fake 2>&1 ) > 2_fake_with_launcher
# check when the workdir is accessed in the 3 tests
for x in 0 1 2; do grep my_sct_workdir ${x}_fake_* | awk '{ FS=":"; print $1 }' | wc -l; done
0
0
80
# check sys.path
grep "my path" ?_fake_*
0_fake_just_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/home/cJ/.local/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/geos', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib64/python2.7/site-packages/wx-3.0-gtk2']
1_fake_conda_python.txt:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']
2_fake_with_launcher:my path: ['/home/cJ/sct_dev/scripts', '/home/cJ/sct_dev/scripts', '/home/cJ/my_sct_workdir', '/home/cJ/sct_dev/python/lib/python27.zip', '/home/cJ/sct_dev/python/lib/python2.7', '/home/cJ/sct_dev/python/lib/python2.7/plat-linux2', '/home/cJ/sct_dev/python/lib/python2.7/lib-tk', '/home/cJ/sct_dev/python/lib/python2.7/lib-old', '/home/cJ/sct_dev/python/lib/python2.7/lib-dynload', '/home/cJ/sct_dev/python/lib/python2.7/site-packages', '/home/cJ/sct_dev/python/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg']
Will track down where this comes from...
Got it. This is caused by the launcher's add_to_path function, which appends a colon without checking, going from (nothing) to "${SCT_DIR}:", which is actually 2 entries: ${SCT_DIR} and an empty one, interpreted as "." (the current directory).
Since the scripts are actually located in $SCT_DIR/scripts, adding this folder to PYTHONPATH is useless. Thus the creation of PYTHONPATH could be forgone altogether.
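Not from the original thread, but a small sketch showing the effect being described: an empty entry in PYTHONPATH ends up as the current working directory in sys.path (exact behaviour may vary slightly across Python versions):

import os
import subprocess
import sys
import tempfile

# Write a throwaway script somewhere else, so its own directory (which
# Python always puts first in sys.path) is not the cwd we are testing.
script = os.path.join(tempfile.mkdtemp(), "show_path.py")
with open(script, "w") as f:
    f.write("import sys; print(sys.path[:4])\n")

env = dict(os.environ)
env["PYTHONPATH"] = "/opt/some_dir:"   # trailing colon -> a real entry plus an empty one

print(subprocess.check_output([sys.executable, script], env=env).decode())
# On a typical CPython the empty entry shows up as the current working
# directory (or as ''): this is how the launcher's stray colon made
# Python probe the network-mounted cwd on every import.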
I fixed the bug in add_to_path, and got rid of adding the scripts directory to PYTHONPATH.
Tried running sct_propseg on django using cf64e93d55cadd6ad3710dfbe7d847784ed57cbc, from a directory within duke, and there was no abnormal delay. So it seems the issue has now been fixed.
We need more confirmations from other lab members.
Implementing a solution in branch poq_issue_838_slow_exe; see video here: https://drive.google.com/open?id=0Bx3A13n3Q_EAdVJMR0laRUZtZkE