mattjj / pyhsmm

MIT License
546 stars 173 forks source link

Fixup for pypi distribution of pyhsmm #35

Closed yarden closed 9 years ago

yarden commented 9 years ago

I missed one wrinkle with pypi distribution. The previous version didn't include the *.pyx files in the distribution, and so the package from pypi would install correctly but would not compile the Cython files.

For now, I included the Cython files and associated headers through MANIFEST.in. This will make it so when you install the package from pypi, the installation procedure automatically compiles the Cython files. A more ideal solution would be to write a conditional in the setup.py that checks if Cython is present, and if so, compiles the Cython files. For developers who want to make a distribution with python setup.py sdist, only the Cython-generated *.c files would go in the distribution, and then users could install with just Python and a C compiler, without needing Cython. (This more ideal setup is described here: http://stackoverflow.com/questions/4505747/how-should-i-structure-a-python-package-that-contains-cython-code). Since your users are mostly expert developers who probably have Cython, I figured it's not pressing.

One caveat to watch out for: if you compile your pyx files to make .c files, and have something that includes all .c files in MANIFEST.in, you could accidentally package a Cython generated, e.g. foo.c along with a foo.pyx file. That could might create a situation where only the foo.c gets used even if it's out of date with the foo.pyx file. So to be on safe side, best to wipe out all Cython generated files before making a source distribution, i.e. before running python setup.py sdist.

I fixed this in pybasicbayes too, see other pull request.

It'd be great if you could make a fresh source distribution and upload it to pypi. You could wipe out the previous one through the pypi admin settings, or upgrade the VERSION variable in setup.py and make a new version release. I'm sure that if you wipe out the previous one and replace it without changing the version, you won't upset anyone, it has only been out for a few hours :)

mattjj commented 9 years ago

Can you rebase this onto the current master?

mattjj commented 9 years ago

Actually, wiping and replacing it without changing the version doesn't work anymore!

I'm somewhat confused here. When I run sdist it looks like the .cpp files are being included in pyhsmm, i.e. it prints stuff like

hard linking pyhsmm/internals/hmm_messages_interface.cpp -> pyhsmm-0.1.1/pyhsmm/internals

I also made some updates to setup.py so that it should default to building the .cpp sources included in the pypi distribution, and made a fix to the cythonize(./pyhsmm... line like you did.

So can you check out the current master and/or the current pypi distribution and see what's actually still broken? In the meantime, I'll try to read and understand your comments better.

mattjj commented 9 years ago

Here's the new setup.py bit I'm talking about. I think the only thing left to ensure is that the .cpp files are in the pypi distribution.

mattjj commented 9 years ago

I downloaded the tarball from the pypi page and it definitely includes the .cpp files (but not the pyx files). I guess that's a default? In that case, maybe we only have to add the .pyx bit to Manifest.in?

yarden commented 9 years ago

It depends which cpp files you're referring to. The ones from Eigen are taken care of because I already had recursive-include deps * in MANIFEST.in -- so that includes everything in deps which includes all the Eigen3 files. Any other cpp files would not be included.

The *.pyx files would not get included unless you force them in through setup.py or MANIFEST.in, which is what my latest pull request fixes.

I'm not totally sure how to rebase - I might just rebuilt my changes into your current version, I'm not good with git.

mattjj commented 9 years ago

The other cpp files are being included though; you can see it by downloading the tarball from the pypi page yourself. I've also been testing directly with the tarball generated in dist by the sdist command and I'm pretty confident that adding *.pyx to Manifest.in is all that's necessary.

I'm going to push that change, update the pypi package, and close this PR for now. If I'm mistaken about something, i.e. if the pypi tarball doesn't include both the pyhsmm (non-Eigen) cpp files and the pyx files, then we can reopen it.

yarden commented 9 years ago

I am looking at the tarball - unless you uploaded a new one, I still think it's missing stuff. For example, there's no *.h files in internals:

$ ls pyhsmm-0.1.1/pyhsmm/internals/*.h
ls: pyhsmm-0.1.1/pyhsmm/internals/*.h: No such file or directory

but clearly they should be there:

$ ls pyhsmm/internals/*.h
pyhsmm/internals/hmm_messages.h  pyhsmm/internals/nptypes.h
pyhsmm/internals/hsmm_messages.h pyhsmm/internals/util.h

my changes to MANIFEST.in fix that.

The way I tested mine is I ran:

python setup.py sdist

then untarred it on a different machine and tried to do:

cd pyhsmm-0.1; pip install .; python -m unittest discover pyhsmm

I believe that the tarball that's up now won't compile.

mattjj commented 9 years ago

It does include the cpp files automatically though, right? That tarball you downloaded was generated with a Manifest.in that only referenced the Eigen stuff.

You're right that the .h files are necessary to build from the pyx files (though I'm not sure if they're necessary to build from the cpp files, since they're never #included and the compile line doesn't reference them in any way).

It's too bad I had just pushed that. Incrementing the version number again I guess...

mattjj commented 9 years ago

Okay, in b65ab07 the Manifest.in file has these lines

recursive-include pyhsmm/deps *
recursive-include pyhsmm *.pyx
recursive-include pyhsmm *.h

That definitely includes the pyx files, the cpp files, and the h files in pyhsmm/internals in the generated tarball.

However I will learn from my mistakes and not push it to pypi immediately! I'm pretty sure this is right this time, but let me know if I'm missing anything...

mattjj commented 9 years ago

Ah okay, here's why the .cpp files were included:

If you don’t supply an explicit list of files (or instructions on how to generate one), the sdist command puts a minimal default set into the source distribution:

  • all Python source files implied by the py_modules and packages options
  • all C source files mentioned in the ext_modules or libraries options
  • [...]

So it got slurped up via the ext_modules. If I had included the header files in the sources of each Extension instance, they would have gotten included too.

yarden commented 9 years ago

Ah I see. Your cythonize did seem much simpler than what I've had in past. This is a setup.py file I wrote a long time ago that conditionally compiles Cython files, and adds a "clean" option to wipe out Cython files for developers: https://github.com/yarden/MISO/blob/newmiso/setup.py

I defined each Cython file as an Extension so I guess that's why I never had to include the *.h files explicitly.

PS this is the relevant bit from that setup.py, where I overload the sdist class and do other crimes against humanity:

##
## Handle creation of source distribution. Here we definitely
## need to use Cython. This creates the *.c files from *.pyx
## files.
##
cmdclass = {}
from distutils.command.sdist import sdist as _sdist
class sdist(_sdist):
    """
    Override sdist command to use cython
    """
    def run(self):
        try:
            from Cython.Build import cythonize
        except ImportError:
            raise Exception, "Cannot create source distribution without Cython."
        print "Cythonizing"
        extensions = cythonize(miso_extensions)
        _sdist.run(self)
cmdclass['sdist'] = sdist

##
## Handle Cython sources. Determine whether or not to use Cython
##
extensions = []
def no_cythonize(extensions, **_ignore):
    new_extensions = []
    for extension in extensions:
        # Make copy of extension so we do't modify
        # it destructively
        ext_copy = \
          Extension(extension.name,
                    extension.sources,
                    include_dirs=extension.include_dirs,
                    library_dirs=extension.library_dirs)
        sources = []
        for sfile in ext_copy.sources:
            path, ext = os.path.splitext(sfile)
            new_sfile = sfile
            if ext in ('.pyx', '.py'):
                if extension.language == 'c++':
                    ext = '.cpp'
                else:
                    ext = '.c'
                new_sfile = path + ext
            sources.append(new_sfile)
        ext_copy.sources[:] = sources
        new_extensions.append(ext_copy)
    return new_extensions

# Whether or not to use Cython
USE_CYTHON = False
mattjj commented 9 years ago

Aaaaaand here's the docstring for the add_defaults method for the sdist class in distutils/command/sdist.py:

    def add_defaults(self):
        """Add all the default files to self.filelist:
          - README or README.txt
          - setup.py
          - test/test*.py
          - all pure Python modules mentioned in setup script
          - all files pointed by package_data (build_py)
          - all files defined in data_files.
          - all files defined as scripts.
          - all C sources listed as part of extensions or C libraries
            in the setup script (doesn't catch C headers!)
        Warns if (README or README.txt) or setup.py are missing; everything
        else is optional.
        """
mattjj commented 9 years ago

Thanks for the reference on your cython setup.py file.

I think it is more typical to explicitly build the extension modules as Extension instances in setup.py. But the cythonize (which is fairly recent) has that great globbing functionality, plus it reads fields like extra_compile_args from the top of the pyx file ("distutils processing" or something). I preferred that style because it meant the information needed to compile a pyx file would stay with the pyx file instead of being buried in a setup.py. It also makes including projects in other projects was easy, since a recursive cythonize command can find all the pyx files and have everything it needs to build them.

mattjj commented 9 years ago

At this rate we'll be at version 9999 in no time!

I find this distutils stuff shockingly under-documented and frustrating, but maybe I just wasn't patient enough with it to learn its ways.

mattjj commented 9 years ago

Thanks again for your help with this! You did all the work here and it would have taken me a while to figure this stuff out (if I didn't give up in frustration). This is really going to help people with installation and compilation (including future me).

yarden commented 9 years ago

My pleasure! pyhsmm is awesome and I think it'll help people (like me) build libraries that depend on it since it can be installed automatically. I completely agree the distutils stuff can be awful, especially when Cython's involved. But once there's a stable release on pypi it's really nice as a reference point and for making pipelines that are guaranteed to work for a specific version (without depending on git).

alexbw commented 9 years ago

Late to the party, but this is excellent!