Fast software for high-dimensional cluster analysis using the masked EM algorithm for Gaussian mixtures
BSD 3-Clause "New" or "Revised" License

klustakwik2

.. image:: https://travis-ci.org/kwikteam/klustakwik2.svg?branch=master
    :target: https://travis-ci.org/kwikteam/klustakwik2

NOTE: please follow the instructions for the `klusta package <https://github.com/kwikteam/klusta>`_ instead. The instructions below should not be used.

Installation instructions

Install Python using the `Anaconda distribution <http://continuum.io/downloads>`_. You will need to install the packages ``numpy``, ``scipy``, ``cython`` and ``nose``. For Windows, Python 2.7 might be a better option than 3.x.
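As a quick sanity check before installing, the following minimal sketch reports which of the packages listed above cannot be found. It is not part of KlustaKwik: the helper name ``missing_packages`` is hypothetical, and the sketch assumes Python 3.4 or later (for ``importlib.util.find_spec``) and that Cython is importable under the name ``Cython``.

```python
import importlib.util

# Import names of the packages listed in the installation instructions.
# Assumption: the "cython" package is imported as "Cython".
REQUIRED = ["numpy", "scipy", "Cython", "nose"]


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any package reported as missing can then be installed with ``conda install`` or ``pip install`` before continuing.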

On all platforms, KlustaKwik can be installed using ``pip install klustakwik2`` (from source) or ``conda install -c kwikteam klustakwik2`` (precompiled binary). The default installation options are as follows:

To override these options, install from source (see below).

Multithreading with Anaconda on Windows


The Anaconda distribution installs its own compiler, which doesn't support OpenMP for multithreading, and uses it by
default instead of the MSVC compiler, which does support OpenMP. To disable the Anaconda compiler, run the
following command before installing KlustaKwik::

    conda remove libpython

Installing from source

Download the source, either from one of the source distributions on `PyPI <https://pypi.python.org/pypi/klustakwik2>`_ or get the latest version from `GitHub <https://github.com/kwikteam/klustakwik2>`_. Run one of the following commands from a command prompt in this directory. For default options::

    python setup.py install

To force multithreading to be on::

    python setup.py install --with-openmp

To force multithreading to be off::

    python setup.py install --no-openmp
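If the source install is scripted, the three variants above can be selected from one place. The following is a minimal sketch, not part of KlustaKwik: the helper ``setup_install_command`` and its ``openmp`` parameter are hypothetical, and only the two flags documented above are used.

```python
import sys


def setup_install_command(openmp=None):
    """Build the install command for the source directory.

    openmp=None  -> default options
    openmp=True  -> force multithreading on
    openmp=False -> force multithreading off
    """
    command = [sys.executable, "setup.py", "install"]
    if openmp is True:
        command.append("--with-openmp")
    elif openmp is False:
        command.append("--no-openmp")
    return command


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is installed by accident.
    print(" ".join(setup_install_command(openmp=True)))
```

The returned list can be passed to ``subprocess.check_call`` from the directory containing ``setup.py``.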

Windows


To install a precompiled Windows binary as an Anaconda package, run::

    conda install -c kwikteam klustakwik2

If you wish to compile from source, the instructions are a bit more complicated:

With Python 2.7, you will need a copy of MS Visual Studio Express 2008 for Python, available as a free
download `here <http://www.microsoft.com/en-us/download/details.aspx?id=44266>`_. Python 3.x might require a different
version of Visual Studio; we haven't tested this.

Download the source as above. Open a command prompt in the
directory where you downloaded and extracted the files. If you installed Python for all users, then you will need
admin rights on this command prompt. To get this in Windows, press the Windows key, type "cmd", right click on
"cmd.exe" and click "Run as administrator".

Now run the commands as in the section on installing from source above.

Mac

It is possible to install a version of gcc that allows for multithreading. TODO: details.

Usage

To cluster a pair of files ``name.fet.n`` and ``name.fmask.n``, run the command::

    kk2_legacy name n

This will generate the files ``name.klg.n`` and ``name.clu.n``. Note that the first time you run it, it will generate a whole lot of warnings and compiler output: ignore this, it is normal.
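When clustering is driven from a Python script, the command above can be wrapped as in the following minimal sketch. The helper names ``kk2_legacy_files`` and ``run_kk2_legacy`` are hypothetical, not part of KlustaKwik. The sketch assumes that ``kk2_legacy`` is on the ``PATH`` and that the input files are in the current working directory.

```python
import os
import subprocess


def kk2_legacy_files(name, n):
    """Return (inputs, outputs): the file names read and written for `name` and `n`."""
    inputs = ["{}.fet.{}".format(name, n), "{}.fmask.{}".format(name, n)]
    outputs = ["{}.klg.{}".format(name, n), "{}.clu.{}".format(name, n)]
    return inputs, outputs


def run_kk2_legacy(name, n):
    """Check that the input pair exists, run kk2_legacy, and return the output names."""
    inputs, outputs = kk2_legacy_files(name, n)
    missing = [path for path in inputs if not os.path.exists(path)]
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    # Assumption: the kk2_legacy script is installed and available on the PATH.
    subprocess.check_call(["kk2_legacy", name, str(n)])
    return outputs
```

For example, ``run_kk2_legacy("name", 1)`` would look for ``name.fet.1`` and ``name.fmask.1`` and, if the run succeeds, return the names ``name.klg.1`` and ``name.clu.1``.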

You can specify additional options to this script. The major ones are explained below: