lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License
201 stars 38 forks source link

Python bindings for sage-core library #100

Open theGreatHerrLebert opened 7 months ago

theGreatHerrLebert commented 7 months ago

Hi Michel,

Thank you for developing the sage-core library; it's a fantastic resource. I'm David Teschner, a Ph.D. candidate at the University of Mainz, working on data processing for the Bruker timsTOF platform. We're interested in extending sage-core's capabilities to Python, a favored language among researchers, to enhance data analysis and model integration.

I've developed a wrapper using pyO3 and maturin to connect sage-core with Python, making it more accessible for scientific coding and prototyping. If this aligns with your vision, we'd like to contribute this to your crates collection, ensuring maintainability and community access.

Appreciate your consideration.

Best, David

lazear commented 7 months ago

Hi David,

This absolutely aligns with my vision - I developed a very basic pyO3 wrapper at some point, but it has fallen into disrepair and is very out of sync with the current version of Sage.

Two notes:

Thank you for the contribution! Michael

theGreatHerrLebert commented 7 months ago

Hi Michael,

Great to hear back from you. I also thought about the integration process and the future upkeep of Sage’s Python bindings:

Rust Glue Code Integration:

Merging the Rust 'glue' code from pysage-connector directly into the Sage crates seems ideal for ease of maintenance and to ensure alignment with the core library. However, I recognize this adds a maintenance layer to your workload, as any changes in Sage would necessitate updates to the bindings. If we deem this manageable, I'm ready to submit a pull request from my SAGE fork for review. Alternatively, we could keep pysage-connector separate and fetch the Sage core from the Rust crates. This modular approach is likely the most flexible.

Release Automation:

Automating Python wheel builds and publications through GitHub Actions would be nice for releases. We'd need a linked PyPI account and access token for distribution. I'm new to this part as well (shamefully doing my publications manually right now) but am willing to delve in and set up this process.

Python Package Maintenance:

Contributing the Python-centric pysage library to a new SAGE organization would be nice. This would not only help manage the project components as they expand but also ensure clarity in the project’s structure.

I suggest we start with the Rust glue code integration into Sage if you’re comfortable with the additional overhead, or moving this part into a separate project in a new SAGE organization otherwise. Concurrently, we can establish the PyPI distribution pipeline.

Eager to move forward with this :) !

Best regards, David

lazear commented 7 months ago

What about putting pysage-connector & the pysage bindings in the same repo? I think that might make sense from a CI/automated release/maintenance perspective - and it also allows for pysage-connector to pin a different version of Sage (only needs to be updated on a full new release of Sage, and not every commit/PR). Changes to the bindings also only impact 1 repository then. I don't have any experience with the PyPi release automation, but I know there are other software packages using maturin/pyO3 and doing this, so it should be possible to start by copying what their doing.

On the organization front, I am going to hold off for a minute on moving this repository at least... The Sage paper contains a link to this repository, and I don't want to move it and leave a 404. I can still make one for auxiliary repositories (documentation, pysage, etc) if you're interested.

radusuciu commented 7 months ago

I don't have any experience with the PyPi release automation, but I know there are other software packages using maturin/pyO3 and doing this, so it should be possible to start by copying what their doing.

I have some experience with this (maturin + pypi release automation) if you're looking for a contribution. It's straightforward but a little annoying to debug.

theGreatHerrLebert commented 7 months ago

Hey guys,

I agree with you about the organization of things, here are poins to be adressed:

Combining Repositories

I am with you on merging pysage-connector and the bindings into one repo. It simplifies CI, releases, and overall maintenance. Plus, it’s more efficient with everything under one roof for python.

PyPI Release Automation

I’m also for adopting strategies from projects using maturin/pyO3 for our PyPI release automation. I’ll explore this and figure out how we can integrate it into a github workflow. Would of course also appreciate help from @radusuciu.

Name Change and PyPI Upload

Just a heads-up: 'pysage' was already claimed on PyPI, so I've switched to 'sagepy' and 'sagepy-connector'. They’re already up on PyPI, and you can install the current draft easily with 'pip install sagepy', which handles all the necessary dependencies. There will also be a new repository holding both sagepy and sagepy-connector, pysage will be deleted.

Sage Organization & Repo Moving

Housing everything in a Sage org makes sense I think :). I can move ownership of the sagepy repo once the org its there. It’ll keep the projects cohesive. And moving the sage repo might also not be a big problem – GitHub redirects will prevent any 404 issues. Still, I totally get why waiting to move the repo (if ever), considering the paper reference.

Pull Request to Sage-Core & Crates.io Release

I’m planning to submit a pull request with my changes to the sage-core. For the python bindings to be independent of my fork, these updates need to be in the main repository. Also, I think releasing the sage core library on crates.io could be a good idea. It would allow us to link against different versions more efficiently than pulling directly from a GitHub repo.

Looking forward to your feedback!

Best,

David

lazear commented 7 months ago

Sounds good - I will get an org set up. In the mean time, go ahead and submit a PR whenever you are ready!

theGreatHerrLebert commented 7 months ago

Hey Michel,

I am almost finished with the initial version of the exposed library. Currently, it only supports raw scoring, which essentially means identification, without any FDR. For know, I will include how to perform FDR via mokkapot. Also, the quantification logic appears to be deeply integrated into the overall sage pipeline and also be available for LC-MS/MS only. Exposing it would be ideal, but this would require significantly more work and some discussion. I will add some example code to the sagepy repository as soon as possible and do my best to include some testing as well. 😉

As always, let me know what you think.

Best,

David