sriramlab / GENIE

Gene Environment Interaction Estimator
MIT License
2 stars 0 forks source link

Plan for Python bindings #1

Open fcooper8472 opened 1 month ago

fcooper8472 commented 1 month ago

I propose adding minimal infrastructure to allow installation and use of this tool via Python.

Assuming that it ends up being called "rhe" on PyPi (which has some restrictions on naming), I would envisage an end user being able to:

pip install rhe

and then, from a Python script:

from rhe import genie
from rhe import genie_mem
from rhe import genie_multi_pheno

genie(<args>)

This functionality would, of course, be in addition to the existing executables that a user can still compile and run exactly as described currently in the README.

I would propose using scikit-build-core and pybind11 to achieve this.

In addition, I would propose adding a GitHub Actions workflow to automate the building of Python wheels for all current Python versions, and, ideally, on Linux, macOS and Windows (although there may be platform-specific code that I'm currently unaware of that may prevent this).

If this sounds broadly acceptable as a path forward, let me know, and I will begin working on it.

sriramsr commented 1 month ago

That sounds like a good plan. Please go ahead.

Sriram

On Thu, May 30, 2024 at 9:02 AM Fergus Cooper @.***> wrote:

I propose adding minimal infrastructure to allow installation and use of this tool via Python.

Assuming that it ends up being called "rhe" on PyPi (which has some restrictions on naming), I would envisage an end user being able to:

pip install rhe

and then, from a Python script:

from rhe import geniefrom rhe import genie_memfrom rhe import genie_multi_pheno

genie()

This functionality would, of course, be in addition to the existing executables that a user can still compile and run exactly as described currently in the README.

I would propose using scikit-build-core https://scikit-build-core.readthedocs.io/en/latest/ and pybind11 https://pybind11.readthedocs.io/en/latest/ to achieve this.

In addition, I would propose adding a GitHub Actions workflow to automate the building of Python wheels for all current Python versions, and, ideally, on Linux, macOS and Windows (although there may be platform-specific code that I'm currently unaware of that may prevent this).

If this sounds broadly acceptable as a path forward, let me know, and I will begin working on it.

— Reply to this email directly, view it on GitHub https://github.com/sriramlab/GENIE/issues/1, or unsubscribe https://github.com/notifications/unsubscribe-auth/AENU7PYCFDBSXWOSDXMZBFLZE5ERZAVCNFSM6AAAAABIRGVIICVHI2DSMVQWIX3LMV43ASLTON2WKOZSGMZDMMBTGU2TQOA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

fcooper8472 commented 1 month ago

@sriramsr a couple of points that have come up so far:

  1. Should the SSE support be used where possible? It seems to work with -DSSE_SUPPORT=1, but by default it is off, and there's no info in the README about turning it on. Would it be recommended to turn it on?
  2. Currently the code can't (easily) be compiled on Windows because of some posix specific headers.
  3. Currently the code can't (easily) be compiled on arm64 architecture because of the emmintrin header, so currently it won't compile (and therefore we can't build wheels for) arm64 macOS machines.
fcooper8472 commented 1 month ago

Also, in std.h there is a function:

unsigned long long rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc": "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

which doesn't appear to be used anywhere. Do you know whether it's OK to get rid of this? Because it uses inline amd64 assembly, it would be ideal to get rid of it when working towards arm64 support.

fcooper8472 commented 1 month ago

In fact: the current examples do not run when SSE is enabled. See here for an example that fails:

https://github.com/sriramlab/GENIE/actions/runs/9387244040/job/25849680141

GENIE: /home/runner/work/GENIE/GENIE/include/mailman.h:85: void mailman::fastmultiply_sse(int, int, int, std::vector<int> &, Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> &, double *, double *, double **): Assertion `k%10 == 0 && "k should be a multiple of 10"' failed.
fcooper8472 commented 1 month ago
  1. Currently the code can't (easily) be compiled on arm64 architecture because of the emmintrin header, so currently it won't compile (and therefore we can't build wheels for) arm64 macOS machines.

Removing the inline assembly and conditionally only including the emmintrin if the architecture is appropriate allows GENIE to compile and run on arm64 macOS.

sriramsr commented 1 month ago

Yes let's remove this function.

Sriram

On Wed, Jun 5, 2024 at 8:18 AM Fergus Cooper @.***> wrote:

Also, in std.h there is a function:

unsigned long long rdtsc(){ unsigned int lo,hi; asm volatile ("rdtsc": "=a" (lo), "=d" (hi)); return ((unsigned long long)hi << 32) | lo; }

which doesn't appear to be used anywhere. Do you know whether it's OK to get rid of this? Because it uses inline amd64 assembly, it would be ideal to get rid of it when working towards arm64 support.

— Reply to this email directly, view it on GitHub https://github.com/sriramlab/GENIE/issues/1#issuecomment-2150336269, or unsubscribe https://github.com/notifications/unsubscribe-auth/AENU7PY6NTE3LWVBTGTIMTLZF4T4XAVCNFSM6AAAAABIRGVIICVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJQGMZTMMRWHE . You are receiving this because you were mentioned.Message ID: @.***>

fcooper8472 commented 3 weeks ago

@sriramsr this work is mostly done now: I have built Python wheels for Linux and both arm64 and amd64 macOS. Just a few questions:

  1. The project did not appear to depend on GSL. Is this expected?
  2. Boost only appeared to be used to generate random numbers from N(0,1). I replaced this with using the C++ standard library random number generator from C++11, which works in an almost identical way to boost. Is this OK, or was there a specific reason to use Boost? One possible reason is that using Boost will guarantee the same sequence of random numbers for a given seed across all distributions, whereas in the standard library, different vendors are permitted to choose their own algorithm. My strong preference would be to not use boost unless this is a requirement.
  3. When the time comes, are you happy for me to upload to PyPI using my personal PyPI credentials? This would list me as the "maintainer" on PyPI (e.g. as here: https://pypi.org/project/arg-needle-lib/), but the homepage links would point to your repository, of course. The alternative would be for you to create a PyPI token and add it to the GENIE repository as a GitHub Actions Secret.
sriramsr commented 3 weeks ago
  1. I think this was older code. No longer needed.
  2. Agreed
  3. That sounds good to me.

Sriram

On Wed, Jun 12, 2024 at 1:17 PM Fergus Cooper @.***> wrote:

@sriramsr https://github.com/sriramsr this work is mostly done now: I have built Python wheels for Linux and both arm64 and amd64 macOS. Just a few questions:

  1. The project did not appear to depend on GSL. Is this expected?
  2. Boost only appeared to be used to generate random numbers from N(0,1). I replaced this with using the C++ standard library random number generator from C++11, which works in an almost identical way to boost. Is this OK, or was there a specific reason to use Boost? One possible reason is that using Boost will guarantee the same sequence of random numbers for a given seed across all distributions, whereas in the standard library, different vendors are permitted to choose their own algorithm. My strong preference would be to not use boost unless this is a requirement.
  3. When the time comes, are you happy for me to upload to PyPI using my personal PyPI credentials? This would list me as the "maintainer" on PyPI (e.g. as here: https://pypi.org/project/arg-needle-lib/), but the homepage links would point to your repository, of course. The alternative would be for you to create a PyPI token and add it to the GENIE repository as a GitHub Actions Secret.

— Reply to this email directly, view it on GitHub https://github.com/sriramlab/GENIE/issues/1#issuecomment-2163825954, or unsubscribe https://github.com/notifications/unsubscribe-auth/AENU7P7L62MPYVEDWUKDYDLZHCUFLAVCNFSM6AAAAABIRGVIICVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRTHAZDKOJVGQ . You are receiving this because you were mentioned.Message ID: @.***>

fcooper8472 commented 3 weeks ago

A first draft of this is now complete. See here for a summary of changes: https://github.com/sriramlab/GENIE/pull/2#issuecomment-2167741280

Once you're happy with the changes, I'll upload the wheels to PyPI.