hashdist / hashstack-old

Obsolete repository, use "hashstack" instead

[profile:matplotlib/jbmc ERROR] hit command failed #112

Closed: certik closed this issue 10 years ago

certik commented 10 years ago

I just hit this error at cloud.sagemath.com:

$ ./update
Up to date: launcher
Up to date: m4
Up to date: autoconf
Up to date: automake
Up to date: libtool
Up to date: pkgconf
Up to date: patchelf
Up to date: bzip2
Up to date: cmake
Up to date: ncurses
Up to date: zlib
Up to date: openssl
Up to date: readline
Up to date: sqlite
Up to date: python
Up to date: cython
Up to date: distribute
Up to date: freetype
Up to date: szip
Up to date: hdf5
Up to date: jinja2
Up to date: python-readline
Up to date: pyzmq
Up to date: tornado
Up to date: ipython
Up to date: lapack
Up to date: numpy
Up to date: png
Up to date: matplotlib
Up to date: matplotlib-basemap
Up to date: netcdf4
Up to date: nose
Up to date: python-netcdf4
Up to date: scipy
Building profile
[profile] Building zk5n.., follow log with:
[profile]   tail -f /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-1/build.log
[profile:matplotlib/jbmc ERROR] hit command failed
[profile ERROR] hit command failed

It used to work. I set the priority to high because this bug prevents me from using hashstack.

Part of the bug is that the error message is unhelpful; there should be an obvious way to debug this.

ahmadia commented 10 years ago

What sort of filesystem is /mnt/home? Have you tried running the script from an IPython debugger session?

certik commented 10 years ago

I think I know what the issue is: the /mnt/home/NBgQrbd5 part of the path changes every time I log in. I have no idea what filesystem /mnt/home is.

certik commented 10 years ago

An IPython debugging session gives:

~/repos/hashstack(packages)$ ipython
Python 2.7.5 (default, Aug 15 2013, 09:07:40)
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: run ./update
Up to date: launcher
Up to date: m4
Up to date: autoconf
Up to date: automake
Up to date: libtool
Up to date: pkgconf
Up to date: patchelf
Up to date: bzip2
Up to date: cmake
Up to date: ncurses
Up to date: zlib
Up to date: openssl
Up to date: readline
Up to date: sqlite
Up to date: python
Up to date: cython
Up to date: distribute
Up to date: freetype
Up to date: szip
Up to date: hdf5
Up to date: jinja2
Up to date: python-readline
Up to date: pyzmq
Up to date: tornado
Up to date: ipython
Up to date: lapack
Up to date: numpy
Up to date: png
Up to date: matplotlib
Up to date: matplotlib-basemap
Up to date: netcdf4
Up to date: nose
Up to date: python-netcdf4
Up to date: scipy
Building profile
[profile] Building zk5n.., follow log with:
[profile]   tail -f /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log
[profile:matplotlib/jbmc ERROR] hit command failed
[profile ERROR] hit command failed
An exception has occurred, use %tb to see the full traceback.

SystemExit: 127

In [2]: %tb
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
/usr/local/sage/sage-5.11/local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    202             else:
    203                 filename = fname
--> 204             __builtin__.execfile(filename, *where)

/mnt/home/NBgQrbd5/repos/hashstack/update in <module>()
     18 # Rest of builder assume the python-hpcmp dir is the cwd
     19 os.chdir(root_dir)
---> 20 sys.exit(help_on_exceptions(logger, main, logger, get_hdist_config_filename()))

SystemExit: 127
ahmadia commented 10 years ago

And what's in /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log?

certik commented 10 years ago

It ends like this:

~/repos/hashstack(packages)$ tail /mnt/home/NBgQrbd5/repos/hashstack/bld/profile-n-zk5n-2/build.log
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/lib/pkgconfig/libpng16.pc', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/lib/pkgconfig/libpng16.pc')
silent_makedirs(u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3',)
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man3/libpng.3', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3/libpng.3')
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man3/libpngpf.3', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man3/libpngpf.3')
silent_makedirs(u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man5',)
silent_relative_symlink('/mnt/home/NBgQrbd5/repos/hashstack/opt/png/65eh/share/man/man5/png.5', u'/mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n/share/man/man5/png.5')
Linking matplotlib/jbmc into /mnt/home/NBgQrbd5/repos/hashstack/opt/profile/zk5n
running ['hit', u'create-links', u'/tmp/hashdist-run-job-Vc2DXI/1_in0.json']
hit command failed
hit command failed

There don't seem to be any other errors before this.

ahmadia commented 10 years ago

That's a filesystem error. You might try disabling symbolic links in the builder. Let me know if you want me to point out how I'm doing that in Cygwin.
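Roughly, disabling symbolic links means switching a rule's action from `relative_symlink` to `copy`. The sketch below shows what each action amounts to for a single file, assuming plain `os.symlink` with a target relative to the link's own directory versus `shutil.copyfile`. `install_file` is a hypothetical helper for illustration, not hashdist's implementation, and it is written for Python 3 (the code elsewhere in this thread is Python 2).

```python
import os
import shutil
import tempfile


def install_file(src, dst, action):
    """Place src at dst inside a profile, as a relative symlink or as a copy.

    Hypothetical helper; it only illustrates the two rule actions.
    """
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if action == "relative_symlink":
        # The link target is expressed relative to the directory holding the link.
        rel = os.path.relpath(src, os.path.dirname(dst))
        os.symlink(rel, dst)
    elif action == "copy":
        shutil.copyfile(src, dst)
    else:
        raise ValueError("unknown action: %r" % action)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    src = os.path.join(root, "opt", "png", "65eh", "share", "man", "man5", "png.5")
    os.makedirs(os.path.dirname(src))
    with open(src, "w") as f:
        f.write("png man page\n")
    link = os.path.join(root, "opt", "profile", "zk5n", "share", "man", "man5", "png.5")
    install_file(src, link, "relative_symlink")
    print(os.readlink(link))  # ../../../../../png/65eh/share/man/man5/png.5
```

A relative link keeps the profile valid if the whole `opt` tree is moved, but it depends on the filesystem supporting symlinks; `copy` avoids that dependency at the cost of disk space.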

certik commented 10 years ago

If you can point me to that, that would be awesome. What sort of filesystem error is it?

ahmadia commented 10 years ago

Almost certainly:

running ['hit', u'create-links', u'/tmp/hashdist-run-job-Vc2DXI/1_in0.json']

is raising an IOError. We should probably catch that exception and report it (or provide an option to) instead of swallowing it; that would at least give you a hint about what went wrong.
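A minimal sketch of "report instead of swallow", assuming a hypothetical `run_command` wrapper (not hit's actual entry point): rather than printing only "hit command failed", it includes the exception type and message, which alone would often be enough to point at the failing file.

```python
import io
import sys


def run_command(func, *args, err=sys.stderr):
    """Run func(*args) and return a process-style exit status.

    Hypothetical wrapper: on failure it writes the exception type and
    message to `err` instead of a bare "hit command failed".
    """
    try:
        func(*args)
    except Exception as e:
        err.write("hit command failed: %s: %s\n" % (type(e).__name__, e))
        return 1
    return 0


if __name__ == "__main__":
    buf = io.StringIO()
    status = run_command(open, "/nonexistent/1_in0.json", err=buf)
    print(status, buf.getvalue().strip())
```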

You can try disabling the links with the following modification:

builder/recipes.py at commit 13ba05f (https://github.com/hashdist/hashstack/blob/13ba05fc8db02adad0e56dcdef4a513830399ca3/builder/recipes.py):

@@ -25,14 +25,7 @@ def add_profile_install(ctx, pkg_attrs, build_spec):
     ]
     rules += [
-        {"action": "relative_symlink",

ahmadia commented 10 years ago

Bleh, that didn't format well. It should look like this:

@@ -25,14 +25,7 @@ def add_profile_install(ctx, pkg_attrs, build_spec):

     rules += [
-        {"action": "relative_symlink",
-         "select": "$ARTIFACT/lib/python*/site-packages/*",
-         "prefix": "$ARTIFACT",
-         "target": "$PROFILE",
-         "dirs": True},
-        {"action": "exclude",
-         "select": "$ARTIFACT/lib/python*/site-packages/**/*"},
-        {"action": "relative_symlink",
+        {"action": "copy",
          "select": "$ARTIFACT/*/**/*",
          "prefix": "$ARTIFACT",
          "target": "$PROFILE"}
ahmadia commented 10 years ago

Weird, I think I may be seeing something similar when the destination already exists. I don't think this is high priority as long as you can work around it, but it's definitely something I'll look at this month.
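One reason a pre-existing destination is fatal for the symlink action: `os.symlink` never overwrites, it raises `FileExistsError` (an `OSError` with errno `EEXIST`). A minimal Python 3 demonstration with throwaway temp files; the file names are made up and only stand in for two packages that ship the same file.

```python
import errno
import os
import tempfile

root = tempfile.mkdtemp()
first = os.path.join(root, "pkg-a.pth")   # stands in for one package's file
second = os.path.join(root, "pkg-b.pth")  # stands in for another package's file
for path in (first, second):
    with open(path, "w") as f:
        f.write("./some.egg\n")

dst = os.path.join(root, "easy-install.pth")
os.symlink(first, dst)  # first package links its file in: fine

failed_with = None
try:
    os.symlink(second, dst)  # second package targets the same destination
except FileExistsError as e:
    failed_with = errno.errorcode[e.errno]

print("second link failed with", failed_with)  # second link failed with EEXIST
```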

certik commented 10 years ago

Ok, on my work computer I am now getting the same error:

Up to date: png
Up to date: matplotlib
Building profile
[profile] Building kpmj.., follow log with:
[profile]   tail -f /auto/netscratch/ondrej/bld/profile-n-kpmj-1/build.log
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed

and

ondrej@kittiwake:~/repos/python-hpcmp2(packages)$ tail -f /auto/netscratch/ondrej/bld/profile-n-kpmj-1/build.log
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/lib/pkgconfig/libpng16.pc', u'/auto/netscratch/ondrej/opt/profile/kpmj/lib/pkgconfig/libpng16.pc')
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3',)
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man3/libpng.3', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3/libpng.3')
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man3/libpngpf.3', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man3/libpngpf.3')
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man5',)
silent_relative_symlink('/auto/netscratch/ondrej/opt/png/65eh/share/man/man5/png.5', u'/auto/netscratch/ondrej/opt/profile/kpmj/share/man/man5/png.5')
Linking matplotlib/om73 into /auto/netscratch/ondrej/opt/profile/kpmj
running ['hit', u'create-links', u'/tmp/hashdist-run-job-5r20OA/1_in0.json']
hit command failed
hit command failed

So it has nothing to do with the changing /mnt/home path.

certik commented 10 years ago

Ok, it's now rebuilding everything again with your patch. We'll see if it fixes it.

It only happens with matplotlib, not with other things...

certik commented 10 years ago

Ok, so the patch does not fix it:

[png] Unpacking sources files:tzkhasbvvydlpjkjd6plccbfv6pkcqoy
[png] Unpacking sources tar.gz:dj4va2fjpzsuvcl3usxe76jiywh6phjz
[png] Building cthf.., follow log with:
[png]   tail -f /auto/netscratch/ondrej/bld/png-n-cthf/build.log
Downloading sources for matplotlib
Building matplotlib
[matplotlib] Unpacking sources files:ywt35gj3h7ucyjgzisnqnzht64fjgx5m
[matplotlib] Unpacking sources tar.gz:klqys4vo3bptbmc455axpdwho2c56yas
[matplotlib] Building 46q2.., follow log with:
[matplotlib]   tail -f /auto/netscratch/ondrej/bld/matplotlib-n-46q2/build.log
Building profile
[profile] Building 67ki.., follow log with:
[profile]   tail -f /auto/netscratch/ondrej/bld/profile-n-67ki/build.log
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed

with:

ondrej@kittiwake:~/repos/python-hpcmp2(packages)$ tail -f /auto/netscratch/ondrej/bld/profile-n-67ki/build.log
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/testsuite/utils.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/testsuite/utils.pyc')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/utils.py', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/utils.py')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/utils.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/utils.pyc')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/visitor.py', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/visitor.py')
silent_copy('/auto/netscratch/ondrej/opt/jinja2/tvml/lib/python2.7/site-packages/jinja2/visitor.pyc', u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages/jinja2/visitor.pyc')
Linking python-readline/2b76 into /auto/netscratch/ondrej/opt/profile/67ki
running ['hit', u'create-links', u'/tmp/hashdist-run-job-9ZMhbL/1_in0.json']
silent_makedirs(u'/auto/netscratch/ondrej/opt/profile/67ki/lib/python2.7/site-packages',)
hit command failed
hit command failed

So we still need to figure out a proper patch. Until then it's a high-priority issue, since I can't work with hashdist until I find a workaround. Something must have changed in the recent patches, since things were working perfectly for me before.

ahmadia commented 10 years ago

I will take a look.

ahmadia commented 10 years ago

Just to clarify, is this breaking when 'create-links' gets called for either png or matplotlib? How reliably can you reproduce this? Which commits of hashdist and hashstack are you using?

I'll try to reproduce this on my local OS X box.

certik commented 10 years ago

This is 100% reproducible on my machine. I know it fails for matplotlib. It seems to work for some other packages.

Do you have any ideas how to debug it? I can do the debugging.

ahmadia commented 10 years ago

Have you tried running the following?

hit create-links /tmp/hashdist-run-job-5r20OA/1_in0.json

You could even run that in gdb or IPython for a better trace.

certik commented 10 years ago

I think I know. This patch:

diff --git a/hashdist/core/links.py b/hashdist/core/links.py
index c7cc77b..6e751a5 100644
--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -254,19 +254,26 @@ def execute_links_dsl(rules, env={}, launcher_program=None
     logger : Logger

     """
+    print "I am here"
     actions = dry_run_links_dsl(rules, env)
     for action in actions:
         action_desc = "%s%r" % (action[0].__name__, action[1:])
         try:
+            print "1"
             if action[0] is make_launcher:
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
                 action[0](*action[1:])
+            print "2"
             logger.debug(action_desc)
+            print "3"
         except OSError, e:
             # improve error message to include operation attempted
+            print "exception 1"
             msg = str(e) + " in " + action_desc
             logger.error(msg)
             exc_type, exc_val, exc_tb = sys.exc_info()
+            print "exception 2"
             raise OSError, OSError(e.errno, msg), exc_tb
+    print "OK"

produces:

I am here
1
2
3
1
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed

So the problem is in the lines:

             if action[0] is make_launcher:
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
                 action[0](*action[1:])

I'll keep digging.

dagss commented 10 years ago

If you print action, you'll have nailed it.
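The same idea as a reusable pattern: log a description of each action before running it, so the last log line always names the operation that failed. This is a Python 3 sketch modeled on the `execute_links_dsl` loop quoted above (actions as `(func, arg, ...)` tuples), not a patch against hashdist. It catches broadly on purpose: on Python 2, `IOError` (raised by file operations such as copying) is not a subclass of `OSError`, so an `except OSError` clause would not intercept it.

```python
import logging


def execute_actions(actions, logger):
    """Run (func, *args) tuples, logging each one before it executes.

    Sketch modeled on the execute_links_dsl loop above; not hashdist code.
    """
    for action in actions:
        func, args = action[0], action[1:]
        desc = "%s%r" % (func.__name__, args)
        # Logged before the call, so a crash still leaves the culprit in the log.
        logger.debug("running %s", desc)
        try:
            func(*args)
        except Exception as e:
            # Catch broadly: on Python 2, IOError is not a subclass of OSError.
            logger.error("%s: %s in %s", type(e).__name__, e, desc)
            raise


if __name__ == "__main__":
    import shutil

    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
    log = logging.getLogger("links")
    actions = [
        (print, "first action ok"),
        (shutil.copyfile, "/nonexistent/easy-install.pth", "/tmp/easy-install.pth"),
    ]
    try:
        execute_actions(actions, log)
    except OSError:
        pass  # the ERROR line already names the failing copy
```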

certik commented 10 years ago

The /tmp/hashdist-run-job-5r20OA/1_in0.json file does not exist, so I can't easily run it. But my print-statement debugging will get me there.

certik commented 10 years ago

I think I've nailed it. This patch:

diff --git a/hashdist/core/links.py b/hashdist/core/links.py
index c7cc77b..0b970ce 100644
--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -254,19 +254,29 @@ def execute_links_dsl(rules, env={}, launcher_program=None
     logger : Logger

     """
+    print "I am here"
     actions = dry_run_links_dsl(rules, env)
     for action in actions:
         action_desc = "%s%r" % (action[0].__name__, action[1:])
         try:
+            print "1"
+            print action
             if action[0] is make_launcher:
+                print "1a"
                 make_launcher(*action[1:], launcher_program=launcher_program)
             else:
+                print "1b"
                 action[0](*action[1:])
+            print "2"
             logger.debug(action_desc)
+            print "3"
         except OSError, e:
             # improve error message to include operation attempted
+            print "exception 1"
             msg = str(e) + " in " + action_desc
             logger.error(msg)
             exc_type, exc_val, exc_tb = sys.exc_info()
+            print "exception 2"
             raise OSError, OSError(e.errno, msg), exc_tb
+    print "OK"

produces

I am here
1
(<function silent_makedirs at 0x1dc48c0>, u'/auto/netscratch/ondrej/opt/profile/67ki3/lib/python2.7/site-packages')
1b
2
3
1
(<function silent_copy at 0x1dc4758>, '/auto/netscratch/ondrej/opt/python-readline/2b76/lib/python2.7/site-packages/easy-install.pth', u'/auto/netscratch/ondrej/opt/profile/67ki3/lib/python2.7/site-packages/easy-install.pth')
1b
[profile:python-readline/2b76 ERROR] hit command failed
[profile ERROR] hit command failed
ahmadia commented 10 years ago

Because easy-install.pth already exists and hit is trying to link it in?

ahmadia commented 10 years ago

That would be a bug :)

Also, why is /tmp getting wiped out? We should definitely have a way to stop hashdist from deleting anything it creates. Perhaps we should open separate issues now that we've identified the problems?
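A minimal sketch of such a switch, assuming a hypothetical `HASHDIST_KEEP_TMP` environment variable (not an existing hashdist option): the job directory is created with `tempfile.mkdtemp`, left in place whenever the job fails, and removed after success only if keeping was not requested. That way a failing `1_in0.json` would still be there to re-run by hand.

```python
import os
import shutil
import tempfile


def run_job(job, keep=None):
    """Run job(tmpdir) in a fresh temp dir; keep the dir on failure or when asked.

    HASHDIST_KEEP_TMP is a hypothetical switch used only for this sketch.
    """
    if keep is None:
        keep = os.environ.get("HASHDIST_KEEP_TMP") == "1"
    tmpdir = tempfile.mkdtemp(prefix="hashdist-run-job-")
    try:
        result = job(tmpdir)
    except Exception:
        # Leave the inputs in place so the failing command can be re-run by hand.
        print("job failed; inputs kept in", tmpdir)
        raise
    if not keep:
        shutil.rmtree(tmpdir)
    return result


if __name__ == "__main__":
    def write_inputs(tmpdir):
        path = os.path.join(tmpdir, "1_in0.json")
        with open(path, "w") as f:
            f.write("{}\n")
        return path

    print(run_job(write_inputs, keep=True))  # the file survives because keep=True
```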

certik commented 10 years ago

I think the egg for python-readline needs to be unpacked:

diff --git a/packages.yml.linux b/packages.yml.linux
index 5672fd0..fe859ca 100644
--- a/packages.yml.linux
+++ b/packages.yml.linux
@@ -20,6 +20,7 @@
   recipe: distutils
   url: https://pypi.python.org/packages/source/r/readline/readline-6.2.4.1.tar.
   key: tar.gz:4ahynyb57zjopukqftwfyzahbmzgehef
+  unpack_egg: true
   deps: [python, distribute]

 - package: pyzmq

Then it works!!!

certik commented 10 years ago

Ok. So the python-readline package was broken, because it provided duplicated files (easy-install.pth, site.py, ...):

$ ll /auto/netscratch/ondrej/opt/python-readline/2b76/lib/python2.7/site-packages/
total 24
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:22 ./
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:21 ../
-r--r--r-- 1 ondrej cnls  227 Oct  2 17:22 easy-install.pth
dr-xr-xr-x 3 ondrej cnls 4096 Oct  2 17:22 readline-6.2.4.1-py2.7-linux-x86_64.egg/
-r--r--r-- 1 ondrej cnls 2418 Oct  2 17:22 site.py
-r--r--r-- 1 ondrej cnls 1815 Oct  2 17:22 site.pyc

After the fix:

$ ll /netscratch/ondrej/opt/python-readline/tyko/lib/python2.7/site-packages/
total 728
dr-xr-xr-x 3 ondrej cnls   4096 Oct  3 12:44 ./
dr-xr-xr-x 3 ondrej cnls   4096 Oct  3 12:43 ../
dr-xr-xr-x 2 ondrej cnls   4096 Oct  3 12:44 readline-6.2.4.1-py2.7.egg-info/
-r-xr-xr-x 1 ondrej cnls 729511 Oct  3 12:44 readline.so*
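A small sketch of a pre-flight check that would have flagged this before `hit` ran: walk each artifact directory and report every relative path that more than one package provides. `find_conflicts` is a hypothetical helper, not part of hashdist, and the demo uses two made-up artifact directories.

```python
import os
from collections import defaultdict


def find_conflicts(artifacts):
    """Return {relative_path: [labels]} for files provided by more than one artifact.

    `artifacts` maps a package label to its install directory. Hypothetical helper.
    """
    providers = defaultdict(list)
    for label, root in artifacts.items():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                providers[rel].append(label)
    return {rel: labels for rel, labels in providers.items() if len(labels) > 1}


if __name__ == "__main__":
    import tempfile

    base = tempfile.mkdtemp()
    for pkg in ("pkg-a", "pkg-b"):
        d = os.path.join(base, pkg, "lib", "python2.7", "site-packages")
        os.makedirs(d)
        open(os.path.join(d, "easy-install.pth"), "w").close()
    print(find_conflicts({p: os.path.join(base, p) for p in ("pkg-a", "pkg-b")}))
    # {'lib/python2.7/site-packages/easy-install.pth': ['pkg-a', 'pkg-b']}
```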
ahmadia commented 10 years ago

Why would the readline installer try to hijack easy-install.pth or site.py? What are the contents of those files? I can only assume that it creates both of them if they don't exist (probably unintentionally).

Anyway, thanks for chasing this one down @certik and sorry I wasn't more help.

certik commented 10 years ago

Why would the readline installer try to hijack easy-install.pth or site.py?

That's done by setuptools (or rather distribute), so that Python can import the egg automagically.

What are the contents of those files?

Just import hooks for this specific package. Each setuptools package ships its own hook, so this only works if you install things into an existing profile using setup.py, because then the contents get properly merged into the single site.py file. But if you install packages the way we do, each into its own directory, the only sane approach is to unpack the egg and use old-style imports.
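What such a hook boils down to can be seen with the standard library's `site.addsitedir`, which reads every `*.pth` file in a directory and appends each listed path that exists to `sys.path` (lines starting with `import` are executed instead). The Python 3 sketch below builds a throwaway site-packages whose easy-install.pth points at an egg directory; the directory names are made up for the demo and the egg is empty.

```python
import os
import site
import sys
import tempfile

site_packages = tempfile.mkdtemp()
egg = os.path.join(site_packages, "readline-6.2.4.1-py2.7-linux-x86_64.egg")
os.makedirs(egg)

# One line per egg: a path relative to the directory holding the .pth file.
with open(os.path.join(site_packages, "easy-install.pth"), "w") as f:
    f.write("./readline-6.2.4.1-py2.7-linux-x86_64.egg\n")

site.addsitedir(site_packages)
print(egg in sys.path)  # True: the egg is reachable only because of the .pth entry
```

Because the .pth file is a single shared, per-directory index, two artifacts that each ship their own copy cannot both be placed into one profile, which is consistent with the clash reported above.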

I can only assume that it creates both of them if they don't exist (probably unintentionally).

No, it is very intentional.

But when I remove the "copy" hack (i.e., go back to symlinks), it still fails with:

I am here
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed
certik commented 10 years ago

So, to track down the last problem, let's apply this debugging patch:

--- a/hashdist/core/links.py
+++ b/hashdist/core/links.py
@@ -219,16 +219,24 @@ def dry_run_links_dsl(rules, env={}):
         where `func` is one of `os.symlink`, :func:`silent_makedirs`,
         `shutil.copyfile`.
     """
+    print "X1"
     assert os.path.sep == '/'
+    print "X2"
     actions = []
     excluded = set()
     makedirs_cache = set()
+    print "X3"
     for rule in rules:
+        print "X4"
         if 'select' in rule:
+            print "X5a"
             _glob_actions(rule, excluded, makedirs_cache, env, actions)
         else:
+            print "X5b"
             _single_action(rule, excluded, makedirs_cache, env, actions)
+        print "X6"

+    print "X7"
     return actions

and we get

I am here
X1
X2
X3
X4
X5a
[profile:matplotlib/om73 ERROR] hit command failed
[profile ERROR] hit command failed

So this line fails:

            _glob_actions(rule, excluded, makedirs_cache, env, actions)
ahmadia commented 10 years ago

Python Eggs were such a terrible idea...

I assume you are doing a similar egg-install with matplotlib?

certik commented 10 years ago

We unpack all eggs. That way one can install things like mayavi and so on. Matplotlib does not use eggs.

ahmadia commented 10 years ago

I tend to only install from source. I don't understand why Mayavi would be an exception.

certik commented 10 years ago

Mayavi and related packages use eggs. You can't install eggs with hashdist. That is, unless you use the --copy flag to ./update ;)

certik commented 10 years ago

Ok, how do I enable exception printing in "hit"? It's a pain to debug it...

Now it fails on this line:

        selected.update(ant_iglob(pattern, '', include_dirs=rule.get('dirs', False)))

and the pattern is /auto/netscratch/ondrej/opt/matplotlib/om73/lib/python*/site-packages/mpl_toolkits/**. But I don't know what exception it raises...

certik commented 10 years ago

It raises:

*** ValueError: ValueError('does not make sense with ** at end of pattern with glob_files',)
ahmadia commented 10 years ago

We can add egg support, but I don't consider it essential right now. I'm not aware of packages that are distributed as eggs where you can't get the source.

dagss commented 10 years ago

You want .../mpl_toolkits/**/*.
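A simplified model of the Ant-style patterns used in these rules may help show why `**/*` selects the files under `mpl_toolkits`: `*` and `?` stay within one path component, while `**/` matches zero or more whole directories, so the final `*` is what actually names a file. `ant_pattern_to_regex` and `select_files` are hypothetical Python 3 helpers, not hashdist's `ant_iglob`; the `ValueError` text merely mirrors the message reported above.

```python
import re


def ant_pattern_to_regex(pattern):
    """Translate a simplified Ant-style glob into a compiled regex (hypothetical)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")  # any run of characters within one component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one character within a component
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_files(pattern, paths):
    """Return the file paths matched by pattern; a trailing ** is rejected."""
    if pattern.endswith("**"):
        # Mirrors the check reported in the thread: file selection needs a
        # final file component such as "*".
        raise ValueError("does not make sense with ** at end of pattern with glob_files")
    regex = ant_pattern_to_regex(pattern)
    return [p for p in paths if regex.fullmatch(p)]


if __name__ == "__main__":
    paths = [
        "lib/python2.7/site-packages/mpl_toolkits/__init__.py",
        "lib/python2.7/site-packages/mpl_toolkits/axes_grid1/__init__.py",
        "lib/python2.7/site-packages/matplotlib/__init__.py",
    ]
    print(select_files("lib/python*/site-packages/mpl_toolkits/**/*", paths))
```

With these inputs, the first two paths are selected and the matplotlib path is not.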

certik commented 10 years ago

Yeah, I just realized:

diff --git a/builder/recipes.py b/builder/recipes.py
index 3140720..121abdf 100644
--- a/builder/recipes.py
+++ b/builder/recipes.py
@@ -26,11 +26,11 @@ def add_profile_install(ctx, pkg_attrs, build_spec):

     rules += [
         {"action": "relative_symlink",
-         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**",
+         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**/*",
          "prefix": "$ARTIFACT",
          "target": "$PROFILE"},
         {"action": "exclude",
-         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**"},
+         "select": "$ARTIFACT/lib/python*/site-packages/mpl_toolkits/**/*"},
         {"action": "relative_symlink",
          "select": "$ARTIFACT/lib/python*/site-packages/*",
          "prefix": "$ARTIFACT",

@dagss, how do we enable proper exception printing, at least to a log file? This debugging is madness.

dagss commented 10 years ago

In some places there's more printing with DEBUG=1.

In general, adding patches that do more printing is fair game. There are many places where such printing could go, and I don't know if it makes sense for me to try to anticipate them all; it's much easier if you add printing where you need it.

certik commented 10 years ago

When an exception occurs, it should be printed to a log file, and that log file should stay around. Currently the exception gets swallowed.
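A minimal Python 3 sketch of that behaviour using only the standard `logging` module: attach a `FileHandler` to a log file that nothing deletes, and record the full traceback with `logger.exception` before re-raising. The function names and the log location are assumptions for illustration, not existing hashdist settings.

```python
import logging
import os
import tempfile


def make_persistent_logger(log_path, name="hashdist-debug"):
    """Logger that appends to log_path; this code never deletes the file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)  # append mode by default
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def run_with_traceback_log(func, logger, *args):
    """Call func(*args); log the full traceback before re-raising any exception."""
    try:
        return func(*args)
    except Exception:
        logger.exception("unhandled exception in %s", getattr(func, "__name__", func))
        raise


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "hashdist-debug.log")  # assumed location
    log = make_persistent_logger(path)
    try:
        run_with_traceback_log(int, log, "not a number")
    except ValueError:
        print("traceback written to", path)
```

Re-raising keeps the existing exit-status behaviour; the only change is that the traceback is no longer lost.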

ahmadia commented 10 years ago

Agreed on logging swallowed exceptions.

certik commented 10 years ago

See #113 for the exceptions logging.

This issue has been fixed by ec04a41f24d749275f9908a2c0ba94839e542791 and aebffc552a8f61ec2f16d1ddea501a87bc5a80ec.