ioam / topographica

A general-purpose neural simulator focusing on topographic maps.
topographica.org
BSD 3-Clause "New" or "Revised" License

Question: Inline-optimized components are currently disabled ? #624

Open dancehours opened 9 years ago

dancehours commented 9 years ago

OS: openSUSE 13.2
Version of topographica: git, installed on May 3rd

Question:

I am trying to test Topographica on a cluster machine named frontend01. I created a virtual environment called [env_cluster] using Python 2.7.9 on the cluster; numpy, PIL, scipy, ipython, and matplotlib are available there, but gmpy is not. On my local computer I cloned topographica with git and ran git submodule update --init inside the "topographica" directory, then copied it into [env_cluster]. Then I ran the GCAL model script from this path:

[env_cluster] frontend01 env_cluster/topographica> ./topographica models/stevens.jn13/goodgcal.ty

then it shows:

WARNING:root:main: gmpy.mpq not available; using slower fixedpoint.FixedPoint for simulation time.
cc1plus: error: unrecognized command line option "-Wno-cpp"
cc1plus: error: unrecognized command line option "-Wno-cpp"
Caution: Unable to use Weave to compile: "error: Command "g++ -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/nld/python-2.7.9-cluster/lib/python2.7/site-packages/scipy/weave -I/usr/nld/python-2.7.9-cluster/lib/python2.7/site-packages/scipy/weave/scxx -I/usr/nld/python-2.7.9-cluster/lib/python2.7/site-packages/numpy/core/include -I/usr/nld/python-2.7.9-cluster/include/python2.7 -c /home/wenqi/.cache/scipy/python27_compiled/sc_2cce12889b0767e709c4dd97c81c50a00.cpp -o /tmp/scipy-wenqi-VLO_yn/python27_intermediate/compiler_90cea13fa51630c2fec2b5f7dd9bc81a/home/wenqi/.cache/scipy/python27_compiled/sc_2cce12889b0767e709c4dd97c81c50a00.o -O2 -Wno-unused-variable -fomit-frame-pointer -funroll-loops -Wno-cpp -fopenmp" failed with exit status 1". Will use non-optimized versions of most components.
Note: Inline-optimized components are currently disabled; see topo.misc.inlinec

Consequently, the simulation runs very slowly.

dancehours commented 9 years ago

I can import scipy and weave in topographica, so I am confused about why compilation fails in this run.

jbednar commented 9 years ago

Missing gmpy should only slow things down by around 15%, so you can get by without it. Weave is much more important, so it's worth getting that to work.

It looks like the -Wno-cpp option was added in GCC 4.6.4, released in 2010. If you are using a version of GCC older than that, you'll need to comment out this line of topo/misc/inlinec.py:

   inline_named_params['extra_compile_args'].append('-Wno-cpp')

All that line does is disable some warnings that Weave makes NumPy report, so it can be safely disabled, but we put that line in there because the warnings were scaring some users. Unfortunately both Weave and NumPy are out of our control, so we can't fix the underlying issue that causes the warnings. Hopefully this will fix your problem!
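If hand-editing the file is inconvenient, one alternative is to add the flag only when the compiler is new enough. The following is a minimal sketch, not Topographica's actual code: the helper names `parse_gcc_version`, `supports_wno_cpp`, and `gcc_version` are hypothetical, and the 4.6.4 threshold is simply the version mentioned above, treated here as an assumption.

```python
import re
import subprocess


def parse_gcc_version(text):
    """Extract a (major, minor, patch) tuple from a string such as '4.3.4'."""
    match = re.search(r'(\d+)(?:\.(\d+))?(?:\.(\d+))?', text)
    if match is None:
        raise ValueError('no version number in %r' % text)
    # Missing components (e.g. a bare '9') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def supports_wno_cpp(version, minimum=(4, 6, 4)):
    """Return True if `version` is at least `minimum` (an assumed threshold)."""
    return version >= minimum


def gcc_version(compiler='g++'):
    """Ask the compiler for its version via -dumpversion."""
    output = subprocess.check_output([compiler, '-dumpversion'])
    return parse_gcc_version(output.decode('ascii', 'replace'))


# Hypothetical use inside topo/misc/inlinec.py:
# if supports_wno_cpp(gcc_version()):
#     inline_named_params['extra_compile_args'].append('-Wno-cpp')
```

With a check like this, older compilers simply skip the flag and show the NumPy warnings, while newer ones keep them suppressed.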

Also note that on cluster systems, you should be sure to run your code on one of the nodes once, before launching a bunch of concurrent jobs, because by default weave generates only a single copy of the compiled code for everything sharing the same filesystem. If you try to launch a large set of concurrent jobs, weave will often try to compile that many copies of the code at the same time, overwriting each of them with partially compiled versions from another node, which gets everything very confused. As long as you run a single copy of your simulation alone first, later concurrent jobs should use the compiled code as-is and not run into such problems.
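The "run one copy first" advice can be sketched with a scheduler-level job dependency: submit a single warm-up job, then hold every other job until it has finished. This is a sketch under assumptions: it presumes a Grid Engine scheduler, where `qsub -hold_jid` accepts a job name or ID, and the function name `build_submissions`, the job names, the script name, and the `-seed1` argument are all hypothetical placeholders.

```python
def build_submissions(script, seeds, queue=None, warmup_name='weave_warmup'):
    """Return qsub argument lists: one warm-up job, then jobs that wait for it."""
    seeds = list(seeds)
    if not seeds:
        return []
    base = ['qsub', '-V', '-b', 'y']
    if queue is not None:
        base += ['-q', queue]

    def job(name, seed, extra):
        return base + extra + ['-N', name, 'python', script, '-seed1', str(seed)]

    # The first job runs alone and populates the shared Weave cache.
    commands = [job(warmup_name, seeds[0], [])]
    # Remaining jobs are held until the warm-up job has finished.
    for index, seed in enumerate(seeds[1:], 1):
        commands.append(job('mapjob%d' % index, seed, ['-hold_jid', warmup_name]))
    return commands


if __name__ == '__main__':
    for command in build_submissions('gcal1.py', [10, 60, 110], queue='frigg.q'):
        # Printed for inspection; pass each list to subprocess.call to submit.
        print(' '.join(command))
```

The dependency only delays the later jobs; it does not check that the warm-up job actually compiled successfully, so its output is still worth inspecting before trusting the rest.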

dancehours commented 9 years ago

Dear Bednar,

Since you told me last time how to run many jobs on the cluster, I have been trying to do that but have not been very successful. What I want to do is run the gcal script with different seeds for X, Y, and orientation in the Gaussian inputs. The previous problem with weave, which I reported above, was solved by your suggestion. I first submitted a single job on the cluster using this command:

qsub -q frigg.q  -o  /scratch02/wenqi/output  -e  /scratch02/wenqi/error  -N  myjob  -V -b  y  python /home/wenqi/env_cluster/topographica/gcal1.py  -seed1 10 -seed2 10 -seed3 110

It works well, with no weave problem. But when I run this Python file to submit many jobs:

import os

# Report which host the submission script is running on.
# A default avoids a TypeError when HOSTNAME is not exported.
print('You are currently on ' + os.getenv('HOSTNAME', 'unknown host'))
output_path = '/scratch02/wenqi/output'
error_path = '/scratch02/wenqi/error'
path_to_executable = '/home/wenqi/env_cluster/topographica/gcal1.py'

# Submit one job per (seed1, seed2, seed3) combination: 5 x 5 x 5 = 125 jobs.
for i in range(10, 260, 50):
    for j in range(10, 260, 50):
        for k in range(10, 260, 50):
            name = 'mapjob'
            # -b y : handle command as binary? -> yes
            # -V : export all environment variables.
            command = ('qsub -q frigg.q -o ' + output_path + ' -e ' + error_path +
                       ' -N ' + name + ' -V -b y python ' + path_to_executable +
                       ' -seed1 ' + str(i) + ' -seed2 ' + str(j) + ' -seed3 ' + str(k))
            print(command)
            os.system(command)

Then some jobs stop working, and others run very slowly, reporting:

Caution: Unable to use Weave to compile: "invalid syntax (, line 1)". Will use non-optimized versions of most components. Note: Inline-optimized components are currently disabled; see topo.misc.inlinec

Another peculiar thing: afterwards, when I submit a single job, I get the same error, although it worked well before. I have spent a lot of time on these problems but have not figured them out.

P.S. In the gcal script, I put these lines at the beginning:

#!/usr/bin/env python
# Activate the virtual environment
activate_this = '/home/wenqi/env_cluster/bin/activate_this.py'
execfile(activate_this, dict(__file__=activate_this))

# Make Topographica importable
try:
    import external
except:
    pass
import sys, topo
if not (sys.version_info[0] == 2 and sys.version_info[1] == 7):
    print "Warning: Topographica requires Python 2.7, and will not currently work with any other version."
    sys.exit()
# Process the command-line arguments
from topo.misc.commandline import process_argv

# argparse the parameters, seed1, seed2, seed3, with python
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-seed1')
parser.add_argument('-seed2')
parser.add_argument('-seed3')
args = parser.parse_args()

jbednar commented 9 years ago

It sounds to me like you're having race issues because of weave not being safe for parallel execution. I think that @jlstevens put in delays before launching jobs when he did it with Lancet; maybe he can comment.

@philippjfr: Presumably the problem would be solved by moving to Cython instead; not sure whether you are planning that imminently. I do know you have a lot of other things on your plate at the moment!

jlstevens commented 9 years ago

> I think that @jlstevens put in delays before launching jobs when he did it with Lancet

That is correct, and it felt dirty doing that! From the look of it, @wenqi2015 isn't using Lancet to launch jobs, so I will just say that for concurrent execution on the same machine, I did have to add sleeps before launching each topographica process.

On a cluster, I have never noticed jobs failing because two jobs start on the same node at exactly the same time (the combination of the job scheduling system and other uses adds jitter), but it is certainly possible for topographica jobs to fail there for this reason!
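For launching several processes on one machine, the "add sleeps" workaround might look roughly like the sketch below. It is an illustration only, not the Lancet implementation: the function name `launch_staggered` and the 30-second default are hypothetical, and a suitable delay depends on how long the Weave compilation takes on the machine in question.

```python
import subprocess
import time


def launch_staggered(commands, delay_seconds=30.0,
                     start=subprocess.Popen, sleep=time.sleep):
    """Start each command in the background, pausing between launches.

    The pause gives the previous process time to finish compiling before
    the next one starts, so that compilations do not overlap.
    """
    processes = []
    for index, command in enumerate(commands):
        if index > 0:
            sleep(delay_seconds)
        processes.append(start(command))
    return processes


# Hypothetical usage: each entry is an argument list for one simulation.
# handles = launch_staggered([['./topographica', 'a.ty'],
#                             ['./topographica', 'b.ty']])
# for handle in handles:
#     handle.wait()
```

A fixed delay only reduces the chance of overlapping compilations; it does not rule them out if one compilation takes longer than the delay.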

dancehours commented 9 years ago

Thank you so much! Sorry for the late response. I stopped launching cluster jobs for a while and haven't yet tried the approaches you suggested, but I would like to do so soon.