virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence and one or more regions of interest. Please "watch" this repository for improvements in runtime, accuracy and annotations for GRCh38 human reference genome coming up soon.
Other
135 stars 43 forks source link

AmpliconArchitect - Licensing & Bioconda #120

Open DSchreyer opened 2 years ago

DSchreyer commented 2 years ago

Dear AmpliconArchitect Team, under which license is AmpliconArchitect available? I would like to put your software into bioconda to make it available for a broader range of users. However, I would like to know what you're policy about this is and if you mind that I put it into bioconda.

I think it will be a great way to distribute AmpliconArchitect and your related software packages, such as PrepareAA or AmpliconClassifier.

Let me know what you think.

Best, Daniel

erling-d-andersen commented 2 years ago

Mosek makes a pip package available which an off-line discussion reveals could in theory solve the issue. However, it is not that useful because there is no pip package for version 8. Version 8 is legacy version going out support soon.

The true solution is

https://github.com/virajbdeshpande/AmpliconArchitect/issues/103

because Mosek v9 is available as pip package.

So upgrading AmpliconArchitect to support the latest Mosek version is the best solution most likely.

dlaehnemann commented 1 year ago

I would like to second @DSchreyer here and ask for a clear license to be specified for AmpliconArchitect. Without it, automated distribution via bioconda cannot be done.

Another point to consider in this context would be to try and replace Mosek with a tool under an open source license, that can then also be included via conda. With a tool that requires a specific license, this cannot be done either. So even with a clear license for AmpliconArchitect, one would still have to manually obtain a Mosek license.

dlaehnemann commented 1 year ago

BTW, there's a great website for choosing a license, that also gives good reasons why you should choose a license (and a main one is for others to be able to safely use and build on your code).

jluebeck commented 1 year ago

@dlaehnemann, if you had not already seen, please see the bioconda install for AmpliconSuite-pipeline, which includes AmpliconArchitect.

https://github.com/AmpliconSuite/AmpliconSuite-pipeline#option-b-install-with-conda-or-mamba

conda install -c bioconda -c mosek ampliconsuite mosek
wget https://raw.githubusercontent.com/AmpliconSuite/AmpliconSuite-pipeline/bioconda/install.sh
source install.sh --finalize_only  # this will confirm the data repo path and mosek license directory

Then users obtain and place their Mosek license into $HOME/mosek/, download whatever reference genome they need and can run the tool.

Mosek is not an open-source utility and while the package is distributed with Bioconda it then requires the Mosek license to operate (we do not control their licensing policies), however AmpliconSuite-pipeline is released under the BSD 2-clause license.

I agree that moving to a fully open source solver with no requirement to obtain a license manually would be preferred. For containerized deployment of AmpliconSuite, the license is mounted into the container from the host system. In my experience, it is mostly a one-time-only annoyance that happens during installation of the tool (single machine or cluster)

Thanks, Jens

dlaehnemann commented 1 year ago

Thanks for the quick and thorough answer. I have some follow-up questions:

From what I can see, the permissive BSD-2 license you use for AmpliconSuite explicitly excludes sub-modules like AmpliconArchitect: https://github.com/AmpliconSuite/AmpliconSuite-pipeline/blob/7fd4b5bc9914aa43c02e3083987e65dd1ba688cc/LICENSE#L1C136-L1C136

For AmpliconArchitect, I can only find something like a custom University of California license at the top of individual code files: https://github.com/virajbdeshpande/AmpliconArchitect/blob/40da8520a953810ad43e5a6fdf4aba7449d7f5e0/src/AmpliconArchitect.py#L4-L18

So a quick useful change would be to also include that license as a LICENSE file in the main folder of the repository. That way it is much more visible, I previously just missed that there even was a license. Also, then we could directly link to the LICENSE file from the bioconda meta.yaml's license: field. If you have an indication to what license_family: this license belongs, that would tick another box for distributing AmpliconArchitect itself via bioconda.

Regarding the installation process of AmpliconSuite, two suggestions:

  1. You could consider using a post-link.sh to automate the install.sh --finalize-only step of the installation for users: https://bioconda.github.io/contributor/guidelines.html#link-and-unlink-scripts-pre-and-post-install-hooks
  2. This post-link.sh could also print a message to the user, asking them to install mosek into the same environment, so you could then leave it out of the original command.

Or if mosek needs to be installed for install.sh --finalize-only to work properly, you could print a message asking the user to install mosek and then run install.sh --finalize-only. Otherwise, users that haven't noticed these details in the AmpliconSuite README might end up wondering why the package doesn't work for them...

Feel free to ping me on bioconda for reviews of these suggestions.

jluebeck commented 1 year ago

Thanks David*, these pointers are great. I have added a dedicated license file with the UCSD license pending in the open PR for @virajbdeshpande to merge so that it is easy to see the license even without going into the .py files.

Generally, we do not recommend automated distribution AmpliconArchictect on its own, since the preparation of inputs and downstream interpretation of outputs is challenging. The updated README in the pending PR will describe the best practices in detail and show users how to get AmpliconSuite-pipeline in place. AmpliconSuite-pipeline provides standardized solutions to upstream and downstream issues, incorporating our suggested best practices and giving easily interpretable outputs. When removed from AmpliconSuite-pipeline, we find that users make many, many mistakes in using AA. Experienced users of AmpliconSuite-pipeline can bypass all those steps as they see fit. AmpliconArchitect alone, on the other hand, provides none of these instructions which are key to improving reproducibility between users/labs.

I will take a look at updating the meta.yaml for AmpliconSuite-pipeline, or at least making it clear which modules use which license. Your suggestion about the post-link.sh file to replace install.sh --finalize-only is also great and I will see if this is workable for my next update of the conda recipe. I will reach out via bioconda for additional questions or review!

Cheers, Jens

dlaehnemann commented 1 year ago

Many thanks for the follow-up, and I can totally see what you are aiming at with the AmpliconSuite pipeline. Have you ever thought about implementing AmpliconSuite with a workflow manager, as this would automate software installation and make it more modular, and thus more scalable and portable across different compute environments? Also this might make it easier to read and maintain in the mid-term. As a heavy snakemake user, (and as you are a python coder) that's what I would recommend: https://snakemake.readthedocs.io/en/latest/

Also feel free to ping me with questions about snakemake. And if you're instead interested in Nextflow as a workflow manager, I think that is where the original poster @DSchreyer is coming from with his original bioconda request.

DSchreyer commented 1 year ago

Hi @dlaehnemann, yes I was initially interested in how to implement it into Nextflow and we did write an nf-core pipeline incorporating the AmpliconSuite modules. This is the link to it: https://github.com/nf-core/circdna . It is not perfect, but it does work quite nicely.