ratt-ru / montblanc

GPU-accelerated RIME implementations. An offshoot of the BIRO projects, and one of the foothills of Mt Exaflop.

build system fails on python 3 #270

Closed bennahugo closed 8 months ago

bennahugo commented 5 years ago

Starting from a clean slate, the build system does not work on Python 3:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-aoc6r8pe/setup.py", line 224, in <module>
        zip_safe=False)
      File "/home/bhugo/workspace/montblanc/venv3/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/home/bhugo/workspace/montblanc/venv3/lib/python3.6/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python3.6/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.6/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-req-build-aoc6r8pe/install/tensorflow_ops_ext.py", line 181, in run
        setattr(e, n, v)
    AttributeError: attribute '__weakref__' of 'Extension' objects is not writable

bennahugo commented 5 years ago

So this is specifically a distutils and setuptools bug that shows up on Python 3 with the latest setuptools.

bennahugo commented 5 years ago

Looks like there is no easy way around the installation issue. For now, users will just have to install tensorflow themselves.

bennahugo commented 5 years ago

With that fixed, the build system now fails deep within distutils:

    temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/constants.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/e_beam_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/gauss_shape_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/parallactic_angle_sin_cos_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/sum_coherencies_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/feed_rotation_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/b_sqrt_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/sersic_shape_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/post_process_visibilities_op_cpu.o build/temp.linux-x86_64-3.6/montblanc/impl/rime/tensorflow/rime_ops/phase_op_cpu.o -L/home/bhugo/workspace/venv3/lib/python3.6/site-packages/tensorflow -ltensorflow_framework -o build/lib.linux-x86_64-3.6/montblanc/ext/rime.cpython-36m-x86_64-linux-gnu.so -fPIC -fopenmp -g0
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-21smlp0c/setup.py", line 228, in <module>
        zip_safe=False)
      File "/home/bhugo/workspace/venv3/lib/python3.6/site-packages/setuptools/__init__.py", line 145, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/home/bhugo/workspace/venv3/lib/python3.6/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python3.6/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.6/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-req-build-21smlp0c/install/tensorflow_ops_ext.py", line 186, in run
        build_ext.run(self)
      File "/home/bhugo/workspace/venv3/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 78, in run
        _build_ext.run(self)
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
        self.build_extensions()
      File "/tmp/pip-req-build-21smlp0c/install/tensorflow_ops_ext.py", line 191, in build_extensions
        build_ext.build_extensions(self)
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
        self.build_extension(ext)
      File "/home/bhugo/workspace/venv3/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 200, in build_extension
        if ext._needs_stub:
    AttributeError: 'Extension' object has no attribute '_needs_stub'

This happens even after first ensuring tensorflow is installed.
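
One plausible reading of this traceback, sketched with stand-in classes: setuptools' build_ext sets private attributes such as `_needs_stub` on each extension while finalizing its options, so an extension object swapped in afterwards never receives them. `FakeExtension` and `FakeBuildExt` below are hypothetical and only mimic that behaviour; they are not the real setuptools classes, and whether this is what happened in montblanc's build is an assumption.

```python
class FakeExtension:
    """Hypothetical stand-in for setuptools.Extension."""

    def __init__(self, name):
        self.name = name


class FakeBuildExt:
    """Hypothetical stand-in mimicking part of setuptools' build_ext."""

    def __init__(self, extensions):
        self.extensions = extensions

    def finalize_options(self):
        # Private attributes are attached to the extensions known at this point.
        for ext in self.extensions:
            ext._needs_stub = False

    def build_extension(self, ext):
        # Reads the attribute that finalize_options was expected to set.
        return "stub" if ext._needs_stub else "no stub"


def swap_after_finalize():
    """Replace the dummy extension after finalize_options, then build."""
    cmd = FakeBuildExt([FakeExtension("dummy")])
    cmd.finalize_options()
    # The freshly created extension was never seen by finalize_options.
    cmd.extensions[0] = FakeExtension("montblanc.ext.rime")
    try:
        return cmd.build_extension(cmd.extensions[0])
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(swap_after_finalize())
```

If this reading is right, any extension object created after option finalization would need those attributes carried over or, more simply, would need to exist before `setup` is called.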

sjperkins commented 5 years ago

I'm 99% sure it's the delayed tensorflow extension build that causes this. If the custom build_ext were removed and create_tensorflow_ext were called in setup.py before the setup call, this would not be an issue.

bennahugo commented 5 years ago

I've ported over the install scripts from master to check, and applied the same minor fixes. They also don't work, so it looks like a bona fide bug with distutils and extension modules.

sjperkins commented 5 years ago

I disagree. I'm pretty sure the issue stems from here:

https://github.com/ska-sa/montblanc/blob/aaf23809c890de13e76e18bf69a174236b4eb5bb/install/tensorflow_ops_ext.py#L168-L187

The extension build is delayed until the run method of the custom build_ext. To make this work, attributes are copied over from a dummy extension into the just-built extension.
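
A minimal sketch of why such a copy can fail on Python 3, assuming the copy loop walks every member of the object (for example via `inspect.getmembers`) rather than only its instance attributes. `FakeExtension` and both helper functions are hypothetical illustrations, not montblanc's or setuptools' code.

```python
import inspect


class FakeExtension:
    """Hypothetical stand-in for setuptools.Extension (not the real class)."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = sources


def copy_all_members(src, dst):
    # Copies everything dir() reports, including the read-only
    # __weakref__ descriptor that ordinary Python 3 classes carry.
    for name, value in inspect.getmembers(src):
        setattr(dst, name, value)


def copy_instance_attributes(src, dst):
    # Copies only the attributes stored on the instance itself.
    for name, value in vars(src).items():
        setattr(dst, name, value)


def try_copy_all():
    """Return the error message raised by the copy-everything loop."""
    dummy = FakeExtension("dummy", [])
    built = FakeExtension("montblanc.ext.rime", ["phase_op_cpu.cpp"])
    try:
        copy_all_members(dummy, built)
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_copy_all())
```

Restricting the copy to `vars(src)` sidesteps the read-only attribute, but it still leaves the build depending on two extension objects being kept in sync.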

I think the proper solution to this is:

  1. Remove the custom build_ext and the delayed Extension build from setup.py
  2. Use the new build system specified in PEP 517 and 518 to specify tensorflow as a build dependency in pyproject.toml
  3. Create the tensorflow extension as normal in setup.py

But I haven't had time to test this out yet.
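
A rough, untested sketch of step 3, assuming tensorflow is already importable at build time because pyproject.toml lists it under the `requires` key of `[build-system]`. The helper name `tensorflow_extension_kwargs` and the `.cpp` source path are hypothetical; the extension name, library, and flags are taken from the link line in the log above.

```python
def tensorflow_extension_kwargs(tf_include, tf_lib, sources):
    """Build keyword arguments for setuptools.Extension eagerly.

    tf_include and tf_lib are the tensorflow header and library
    directories; sources is the list of files to compile.
    """
    return {
        "name": "montblanc.ext.rime",
        "sources": list(sources),
        "include_dirs": [tf_include],
        "library_dirs": [tf_lib],
        "libraries": ["tensorflow_framework"],
        "extra_link_args": ["-fPIC", "-fopenmp", "-g0"],
    }


# Possible use in setup.py (not executed here, needs tensorflow installed):
#
#   import tensorflow as tf
#   from setuptools import Extension, setup
#
#   ext = Extension(**tensorflow_extension_kwargs(
#       tf.sysconfig.get_include(),
#       tf.sysconfig.get_lib(),
#       ["montblanc/impl/rime/tensorflow/rime_ops/phase_op_cpu.cpp"],
#   ))
#   setup(..., ext_modules=[ext], zip_safe=False)
```

Because the extension exists before `setup` runs, build_ext sees a single object from start to finish and no attribute copying is needed.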

Working on ansible today, I could probably take a closer look next week.

bennahugo commented 5 years ago

Have it working