nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Request bioconda release of newest version #422

Closed incoherentian closed 1 year ago

incoherentian commented 1 year ago

Can I pester the maintainer on bioconda to release the newest Medaka there? (Is that pending resolution of the current handful of issue requests?)

Is your feature request related to a problem? Please describe.

DragonFlye is an awesome and easy one-stop solution to include Medaka and many other tools, but will only incorporate conda releases and currently relies on a quite dated Medaka. I'm hoping an updated bioconda release of Medaka enables usage of the newest LR polishing as part of the DragonFlye (and hopefully newer Bactopia) pipelines. I tried quickly pinning v1.7.2 but couldn't get it to work, and would like the newest version pinned before troubling the DragonFlye author for further help.

Describe the solution you'd like

Anything to improve chances of no longer needing hybrid bacterial assemblies :'D

More specifically I'd like this link to take everyone to a page that says "bioconda / packages / medaka 1.7.3" or newer up top.

Additional context

Related to: https://github.com/rpetit3/dragonflye/issues/17

ONT bioinformatics people might be interested to see that even Wick/Holt's freshest paper seems to draw conclusions based on chemistry 14 ligation kit accuracy, seemingly without having the opportunity to try incorporating the brand new chemistry 14 transposase kit. Would be amazing to see if we can get those small plasmids and improved R10.4.1 post-assembly accuracy using chemistry 14 transposase indexing and no SR polishing. Medaka would have to be a major part of accomplishing that. The ability to incorporate the newest medaka into dragonFlye by default would expedite the ability of non-bioinformaticians like me to have a go :)

cjw85 commented 1 year ago

There is an open PR here: https://github.com/bioconda/bioconda-recipes/pull/39340. You are welcome to make the necessary changes and I can review.

Note, we are moving away from using bioconda due to the lack of control over the package update and release process: principally anyone being able to come along and change the package. See https://github.com/bioconda/bioconda-actions/issues/19#issuecomment-1279827142

gwl2 commented 1 year ago

Conda/bioconda is more or less the de facto standard for a lot of biologists doing bioinformatics. So although I understand ONT's reasons for switching, I am not really convinced that it's a good thing for ONT customers.

Could you at least use a consistent method for installing ONT tools?

duplex tools -> venv
medaka -> virtualenv

Another thing: Conda manages dependencies very well. Using your suggested virtualenv method, installation crashed a couple of times due to missing dependencies (libbz2-dev etc.), which are only listed under the build-from-source instructions.

cjw85 commented 1 year ago

I stated that we are moving away from bioconda, not conda. Within the current bioconda build and review system, there are no guarantees that the software is that which Oxford Nanopore Technologies provides. Whilst we embrace the open source ethos, we must also make efforts to ensure that our software remains supportable.

I'm not sure what you mean by your second point; the most noteworthy parts of virtualenv were incorporated into the Python standard library with Python 3.3 as the venv module. They are to most intents and purposes functionally identical.

The binary packages (wheels) available on PyPI do not depend on external libraries being present. The fact that the installation was looking for the bzip2 development package suggests to me that you have triggered installation from the source distribution. This would lead to code compilation, hence needing the development package. I would have to know more about your system to understand why this might have occurred; ordinarily it should not since we have a wide selection of binary packages available.

incoherentian commented 1 year ago

> I stated that we are moving away from bioconda, not conda.

Ah, I didn't realize different conda channels had different rules about things like, e.g., this (from the linked bioconda-actions comment):

> The problem you have currently is that there's a handful of users, possibly well intentioned but also I suspect in some part just trying to gain kudos, approving PRs for software for which they have absolutely no knowledge. This undermines the review process and you may as well simply have the bot auto merging the PRs it's making.

If you are not moving away from conda, is the medaka team using, or planning to use, a different channel? Something that can be pinned during conda or mamba create?

gwl2 commented 1 year ago

> The binary packages (wheels) available on PyPI do not depend on external libraries being present. The fact that the installation was looking for the bzip2 development package suggests to me that you have triggered installation from the source distribution. This would lead to code compilation, hence needing the development package. I would have to know more about your system to understand why this might have occurred; ordinarily it should not since we have a wide selection of binary packages available.

I did exactly what was described on the medaka github page:

virtualenv medaka --python=python3 --prompt "(medaka) "
. medaka/bin/activate
pip install medaka

Lots of errors, some as described above. I never got medaka running. Deactivating conda didn't help either; the pip installation always crashed, so I gave up at a certain point.
I ended up creating a conda environment with the prerequisites for medaka and doing a pip install there, which somehow worked. Medaka always complains about a GPU not being present, although I have a 3080 installed. To me it seems like the GPU version was installed, but the program can't find the GPU. From a consumer perspective, I have to say that the conda installation was much easier and more stable than the pip procedure stated on GitHub. I'm running Ubuntu 20.04, if that helps.

cjw85 commented 1 year ago

@gwl2

As stated, I would need to understand more about your system to know why the wheels on PyPI were not being used during installation. On a vanilla Ubuntu 20.04 system (the official Docker image from Docker Hub) the following is sufficient:

apt update
apt install python3-pip python3.8-venv
python3 -m venv venv --prompt test
. venv/bin/activate
python -m pip install -U pip
pip install medaka

Note that the first two lines shouldn't typically be required; they just bootstrap the Python install in the Docker container. The fifth line is required to update pip. This step is crucial: without it, pip will try to fetch the source distribution of pysam rather than a precompiled wheel. (I think it's missing from the README because it used to be unnecessary; I will add this step explicitly forthwith.)

If I may extrapolate from this, I would guess the issue is that the pip available on your system is about as old as the one that ships with Ubuntu 20.04. This is therefore not an issue with medaka (or pysam) packaging. It's akin to using a decrepit version of conda or mamba.
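The pip check described above can be sketched as a small shell helper. This is an illustration only, not part of medaka or pip: the function names `pip_version_from`, `version_at_least`, and `check_pip` are hypothetical, and the `20.3` threshold is an assumption chosen for the example (the thread does not state which pip version is actually needed to pick up the pysam wheel).

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from medaka or pip.

# Assumed threshold for illustration only; the thread does not give a number.
MIN_PIP=20.3

# Extract the version from `pip --version` output, which looks like
# "pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)".
pip_version_from() {
    printf '%s\n' "$1" | cut -d' ' -f2
}

# Succeed when version $1 is at least version $2.
# Only the major and minor components are compared.
version_at_least() {
    have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
    want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Report whether the pip in the active environment meets the assumed threshold.
check_pip() {
    current=$(pip_version_from "$(python -m pip --version)")
    if version_at_least "$current" "$MIN_PIP"; then
        echo "pip $current looks recent enough"
    else
        echo "pip $current is older than $MIN_PIP; run: python -m pip install -U pip"
    fi
}
```

After activating the virtual environment, calling `check_pip` shows which pip is really in use there. A related option is `pip install --only-binary=pysam medaka`, which asks pip to refuse the pysam source distribution, so an outdated pip should fail with an explicit error rather than start compiling.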

gwl2 commented 1 year ago

Admittedly my mistake: I lost track of the pip installs in different environments, and the OS pip was indeed not updated. Thanks a lot!

incoherentian commented 1 year ago

I don't know how to identify who facilitated it, but last week's update (1.8.0) is already on Bioconda. Given that, I hope no one minds me closing this.

Thanks to everyone who helped, and in particular to whoever wrangled medaka onto bioconda!
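For readers who arrive here wanting to pin that release, a minimal sketch of creating an environment with medaka 1.8.0 from bioconda might look like the following. The helper name `create_pinned_env` and the `DRY_RUN` switch are hypothetical conveniences for this example, and whether the pinned version resolves on a given platform is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: create a new environment with medaka pinned to one version.
# Usage: create_pinned_env ENV_NAME MEDAKA_VERSION
# Set DRY_RUN=1 to print the command instead of running it.
create_pinned_env() {
    env_name=$1
    version=$2
    # conda-forge is listed before bioconda, as the bioconda documentation recommends.
    set -- mamba create --yes --name "$env_name" \
        -c conda-forge -c bioconda "medaka=$version"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 create_pinned_env medaka 1.8.0` prints the `mamba create` command without touching the system; dropping `DRY_RUN=1` runs it.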