nf-core / circdna

Pipeline for the identification of extrachromosomal circular DNA (ecDNA) from Circle-seq, WGS, and ATAC-seq data that were generated from cancer and other eukaryotic cells.
https://nf-co.re/circdna
MIT License
25 stars 14 forks source link

AmpliconArchitect should use AmpliconSuite-pipeline as mode of entry #52

Closed jluebeck closed 5 months ago

jluebeck commented 1 year ago

Description of feature

Thank you Daniel and the rest of the nf-circdna team for putting together this nextflow pipeline!

From my understanding of the nf-circdna workflow file, https://github.com/nf-core/circdna/blob/master/workflows/circdna.nf#L395, the following steps are taken

  1. Run CNVKit (mode with matched normal not currently supported?)
  2. Feed .cns file into collect_seeds.py, which is ported from the PrepareAA.py script.
  3. Run amplified_intervals.py on the output bed from step 2.
  4. Run AmpliconArchitect.py on the output bed from step 3.

However, one important function from PrepareAA is missing in this workflow. PrepareAA.py calls a function between steps 2 and 3 from above called cnv_prefilter (line 828 of PrepareAA). This applies an additional set of filters needed before using amplified_intervals.py. This isn't reflected in the current workflow and for best reproducibility of AmpliconSuite-pipeline on nextflow, it should be included either by adding it in a custom way, as other parts of PrepareAA.py have been added, or by just using the PrepareAA.py wrapper script to take the .cns file, convert and filter it (prefilter + amplified_intervals.py).

It may be simplest to instead of breaking up the PrepareAA.py code into pieces, place AmpliconSuite-pipeline (PrepareAA) in the workflow, give the CNVKit .cns file to PrepareAA.py - let it make the seeds and then you can give the AA_CNV_SEEDS.bed file to AmpliconArchitect as is currently done.

I am happy to provide additional guidance on this if needed.

Thanks again, Jens

DSchreyer commented 1 year ago

HI Jens, yes I agree this should be added. By including PrepareAA, also future updates can be easily incorporated. This should take much time,

You are correct, cnvkit currently only runs in tumour-only mode, but I will try to include normals as well in the next update. I am not sure how much work this will be, but I will keep this issue open until it is fixed.

Cheers, Daniel

jluebeck commented 1 year ago

Fantastic, that sounds great. Thank you Daniel! Please let me know if I can assist in any way.

Cheers, Jens

DSchreyer commented 1 year ago

Hi Jens, yes I might need assistance because I have trouble getting PrepareAA into the pipeline due to mosek, again. I believe using mosek9 is still not an option for either PrepareAA or AmpliconArchitect.

To run the pipeline with singularity or docker, a multi-package-container needs to be created containing all conda packages necessary to run PrepareAA. However, the only combination of package versions i was able to find which can use Mosek8 is based on python2. Python2 is now not supported anymore and the maintainers of the multi-package-container repo is concerned due to security issues coming from mosek and they are, so far, not willing to create the container.

For the previous versions a container exists, but the newer version requires intervaltree, so a new container is needed.

Questions:

Let me know if you have answers for these. If mosek9 is fine to use, I am happy to update PrepareAA, but leave AmpliconArchitect with mosek8. At least until new modules will be added and a new container is needed O.O

Cheers, Daniel

DSchreyer commented 1 year ago

To note: This is the current pull request containing the conversation with the multi-package-container maintainer: https://github.com/BioContainers/multi-package-containers/pull/2588

Many tries failed using python3 and mosek8. Python3 and mosek9 seems to work and it also works for PrepareAA. Is it possible to substitute mosek9 with mosek8 in PrepareAA? This might circumvent the security issue.

jluebeck commented 1 year ago

Hi Daniel,

Thanks for working on this update. Yes, let us definitely resolve this.

We now have a singularity container available here for AmpliconSuite-pipeline (PrepareAA). https://github.com/jluebeck/AmpliconSuite-pipeline#option-b-singularity--docker-images. This singularity container uses python3 for everything.

You can do everything in the pipeline with python3, so it is certainly possible (and probably recommended) to drop support for python2 and mosek8.

Happy to take a look at any error logs you are receiving to troubleshoot this update.

Thanks and please let me know what other information I can provide. Jens

jluebeck commented 1 year ago

I think I didn't understand the BioContainers dev's question about the source of the Mosek package being a security issue. Here is the Conda page for the Mosek package: https://anaconda.org/MOSEK/mosek

DSchreyer commented 1 year ago

Hi Jens, thanks for the quick reply! Currently the AA version is "1.3_r1" which I believe is the most updated on the deshpande github.

That's good news that mosek9 is already implemented in PrepareAA and will be implemented in AA soon. This might fix some issues, because it is currently on its own conda channel "mosek" instead of conda-forge or bioconda, which is usually used in the multi-package containers.

We will see if we can get the multi-package container now through with mosek9 and python3.

I keep you updated on this in this issue!

Best, Daniel

DSchreyer commented 5 months ago

I will close this issue here as it is in progress and will be included in v.1.1.0