replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Update Medaka to support R10.4.1 models #247

Closed hoelzer closed 1 year ago

hoelzer commented 1 year ago

Currently we use the container nanozoo-artic-1.3.0-dev--2c5b6a9, which has Medaka v1.5.0 installed. Unfortunately, it does not include the new models for the R10.4.1 flow cells that people are starting to use, e.g.:

r1041_e82_400bps_sup_g615

Can we easily update Medaka as part of the ARTIC workflow? Maybe it's enough to download the most recent models by rebuilding the container (I wrote a script for that, which is already part of the containers repository). However, the last time I tried to update Medaka as part of the ARTIC workflow, I failed. Maybe you can find a way to update the container, @replikation?
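
A quick way to check whether a container already ships a given model is to search the model list that Medaka prints. Below is a minimal sketch, assuming the list comes from `medaka tools list_models` and that names are separated by commas, colons, or whitespace; `has_model` is a hypothetical helper, not part of Medaka or poreCov.

```shell
# has_model MODEL
# Succeeds if MODEL appears as a complete name in the model list read from stdin.
# Assumption: names are separated by commas, colons, spaces, or tabs.
#
# Intended use (image name is a placeholder):
#   docker run --rm <image> medaka tools list_models | has_model r1041_e82_400bps_sup_g615
has_model() {
  # Put every name on its own line, then require an exact, fixed-string line match.
  tr ',: \t' '\n\n\n\n' | grep -qxF -- "$1"
}
```

Matching whole names avoids false positives from prefixes, for example a shorter name that only partially matches a longer model name.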

replikation commented 1 year ago

A new container for Medaka is already up, but we are currently testing how the new flow cell and new kit perform with ARTIC.

hoelzer commented 1 year ago

Ah, nice and good to know. Is there already a branch that has the new container? We have had fresh R10.4.1 runs since the end of last week and would like to process them at the start of next week. We can also run them with an old model, because I would like to see the difference and impact of the models.

replikation commented 1 year ago

@DataSpott ?

hoelzer commented 1 year ago

Yeah @DataSpott, any news? :) In case there is already a new container in some branch, we could also test it here with recently produced R10.4.1 data.

DataSpott commented 1 year ago

Sorry, but I'm still working with the old R9 flow cells and therefore the old Medaka model. I have not yet dived into the next generation ;)

hoelzer commented 1 year ago

OK, I think people are already switching to R10.4.1, also because R9 will be discontinued. What we would need to do first is update Medaka within the ARTIC container (https://github.com/nanozoo/bx_artic), @replikation. Maybe it's simple, or possibly it crashes ;)

Once we have updated Medaka in that container, it should be relatively simple to test it with some runs and different models. What do you think?

hoelzer commented 1 year ago

And maybe we should also switch to a stable release of the ARTIC pipeline https://github.com/artic-network/fieldbioinformatics/releases/tag/v1.2.3 instead of the 1.3.0-dev branch? But I remember that you had a reason to use the dev branch...

hoelzer commented 1 year ago

Ahh, and in v1.2.3 the conda environment defines:

  - medaka >=1.6.1

so this might already solve the problem.
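
The `medaka >=1.6.1` constraint can also be checked against whatever version a container reports. A minimal sketch, assuming a `sort` that supports version ordering via `-V` (GNU coreutils does); `version_ge` is a hypothetical helper name.

```shell
# version_ge HAVE WANT
# Succeeds if version HAVE is greater than or equal to version WANT.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
  # After version-sorting both values, the smaller one comes first;
  # HAVE satisfies the constraint exactly when that smallest value is WANT.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

With this, `version_ge 1.5.0 1.6.1` fails, matching the Medaka v1.5.0 in the current container, while `version_ge 1.7.2 1.6.1` succeeds.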

hoelzer commented 1 year ago

Just to keep you posted: I generated a new container for the ARTIC pipeline using v1.2.3 of their pipeline, which installs Medaka v1.6.1. Installing an even newer Medaka version (1.7.2) does not work because it requires tensorflow 2.7.x, a dependency conflict I was not able to resolve.

But Medaka 1.6.1 at least has the new r1041 models.

docker pull nanozoo/artic:v1.2.3--5d4390f

I will now do some further testing: add the new container to a branch in poreCov and see if it runs through. There must have been a reason you used the 1.3.0-dev branch of the ARTIC pipeline, but maybe that is also solved in their v1.2.3 release.

hoelzer commented 1 year ago

Maybe it's also possible to have a v1.3.0-dev container with Medaka 1.6.1. I will test that as well.

hoelzer commented 1 year ago

Okay, I can run the pipeline with nanozoo/artic:v1.2.3--5d4390f, but then the Medaka step fails because v1.2.3 does not have the --min-depth parameter we use here:

https://github.com/replikation/poreCov/blob/master/workflows/process/artic.nf#L21

The 1.3.0-dev branch of ARTIC has that.
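
Because the failure comes from an option that one ARTIC version lacks, it can help to test the help text before wiring a container into the workflow. A minimal sketch, assuming the option is spelled `--min-depth` as in the `artic minion --help` output shown later in this thread; `supports_flag` is a hypothetical helper.

```shell
# supports_flag FLAG
# Succeeds if FLAG occurs as a complete option name in the help text read from stdin.
#
# Intended use (image name is a placeholder):
#   docker run --rm <image> artic minion --help | supports_flag --min-depth
supports_flag() {
  # Require a non-name character (or line start/end) on both sides of FLAG,
  # so that "--min" does not match "--min-depth".
  grep -qE -- "(^|[^[:alnum:]_-])$1([^[:alnum:]_-]|\$)"
}
```

The check only confirms that the option is advertised in the help text; it does not confirm that the step runs successfully with it.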

Next I will try still using 1.3.0-dev but installing the environment from v1.2.3 with Medaka v1.6.1.

This attempt failed in the RUN cd fieldbioinformatics && python setup.py install step of the container build.

hoelzer commented 1 year ago

Christian also hinted that /opt/conda/bin is missing from the container's PATH. I will also try using, as a template, the container I once successfully built with ARTIC v1.3.0-dev and Medaka 1.7.2.

Template is nanozoo/artic:1.3.0-dev--9bca1ff

Alright, this was a pain in the butt. But now I have

docker pull nanozoo/artic:1.3.0-dev--a15e2ee

which has ARTIC pipeline 1.3.0-dev (because we use the --min-depth parameter) and Medaka 1.7.2 with all currently available models.

I think it looks fine; I will try the whole pipeline tomorrow.

❯ docker run --rm nanozoo/artic:1.3.0-dev--a15e2ee artic minion --help | grep min_depth
                    [--max-haplotypes max_haplotypes] [--min-depth min_depth]
  --min-depth min_depth
❯ docker run --rm nanozoo/artic:1.3.0-dev--a15e2ee medaka --version
medaka 1.7.2
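
For scripted checks across several container tags, the version number can be pulled out of that output. A minimal sketch, assuming `medaka --version` prints a single line of the form `medaka X.Y.Z` as above; `medaka_version` is a hypothetical helper.

```shell
# medaka_version
# Reads `medaka --version` output from stdin and prints only the version number.
#
# Intended use:
#   docker run --rm nanozoo/artic:1.3.0-dev--a15e2ee medaka --version | medaka_version
medaka_version() {
  # Take the second field of the first line that starts with the word "medaka".
  awk '$1 == "medaka" && NF >= 2 { print $2; exit }'
}
```

If the output has a different shape, the function prints nothing, which a calling script can treat as "version unknown".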