mycobactopia-org / MTBseq-nf

MTBSeq made simple and easy using Nextflow and nf-core standard.
https://doi.org/10.5281/zenodo.5498063
MIT License
8 stars 1 forks source link

[DEV] Adding a gatk_jar downloader #58

Closed Mxrcon closed 1 year ago

Mxrcon commented 2 years ago

Hey :wave:, May I help on this issue? #57 I'm adding a utility module able to download the Gatk_jar using the wget command and them use tar to uncompress it. The output is caught by a glob pattern, Not sure about this addition, maybe we can discuss another way of parsing it! I've also added the alpine container usage on the docker profile, it has tar and wget. (i've double-checked on alpine package list.

This only add the module as utility, we can discuss if the module should be added on our workflows.

This Pull Request closes #57

Hope this small PR helps! Feel free to add your thoughts, I'm available to help with this task.

Kindly, Davi

abhi18av commented 2 years ago

Thanks for the initiative Davi! 💚 👍

I'm also working on addressing the installation issue via conda and perhaps we can merge this PR as soon as the problem with gatk-register (and possible upgrade) is resolved in #59 ?

Mxrcon commented 2 years ago

😃 sure! this problem is so odd, I'm interested on how they'll solve it.

Mxrcon commented 2 years ago

Update: I added wget and tar to conda env, as this will prevent some incompatibilities

abhi18av commented 2 years ago

Davi, let's put this PR on hold till the point our analysis for generated reports is completed and then we can merge and test this PR and make the v1.0.0 release 🤓

Mxrcon commented 2 years ago

@abhi18av Sure! I completely agreed.

Mxrcon commented 2 years ago

Davi thanks for this PR and the initiative on fixing these usability issues.

I have been testing the mtbseq-1.0.4 release via (i) conda red_circle (ii) docker red_circle and basically if gatk/gatk3 is not the issue then perl Basic::Statistics module is throwing dependency issues!

Even after using conda!!!

Therefore I'd like to be extra sure andd confirm whether you've tested this on a baseline linux machine via conda as well?

@abhi18av I tried to install mtbseq 1.0.4 via mamba, and tried to register using gatk3-register, then tried to run MTBseq, this was the output:

Can't locate Statistics/Basic.pm in @INC (you may need to install the Statistics::Basic module) (@INC contains: /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/site_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/site_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/5.32/core_perl /home/davi_marcon/miniconda3/envs/mtbseq-104/lib/perl5/core_perl .) at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq-104/bin/MTBseq line 10.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq-104/bin/MTBseq line 10.
abhi18av commented 2 years ago

Okay thanks for the confirmation Davi, is it working fine with mtbseq-1.0.3 ?

Mxrcon commented 2 years ago

@abhi18av, yes! using the version 1.0.3 it works with no problems when using gatk3-register

abhi18av commented 2 years ago

Okay, then please elaborate step-by-step how you tested with mtbseq-nf:v1.0.3 version.

I just want to confirm whether the default conda setup for that version works with gatk-register (since this is what is available in corresponding biocontainer)

Happy to connect tonight if this is not clear.

Mxrcon commented 2 years ago

@abhi18av, Here's how i setup the testings, You can change the mtbseq=={version} for matching any one interesting

version 1.0.3:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.3
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

Output:

<INFO>  [2022-04-13 17:38:22]   Found perl module: MCE
<INFO>  [2022-04-13 17:38:22]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 17:38:22]   Found bwa in your PATH!
<INFO>  [2022-04-13 17:38:22]   Found samtools in your PATH!
<ERROR> [2022-04-13 17:38:22]   gatk is not installed or not in your PATH!

version 1.0.4:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.4
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

output:

Can't locate Statistics/Basic.pm in @INC (you may need to install the Statistics::Basic module) (@INC contains: /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/site_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/site_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/vendor_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/5.32/core_perl /home/davi_marcon/miniconda3/envs/mtbseq_test/lib/perl5/core_perl .) at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBtools.pm line 9.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/share/mtbseq-1.0.4-2/lib/TBbwa.pm line 8.
Compilation failed in require at /home/davi_marcon/miniconda3/envs/mtbseq_test/bin/MTBseq line 10.
BEGIN failed--compilation aborted at /home/davi_marcon/miniconda3/envs/mtbseq_test/bin/MTBseq line 10.

Very odd, both of them return errors :sob:

version 1.0.3 + gatk:

mamba create -n mtbseq_test -c bioconda mtbseq==1.0.4 gatk==3.8
mamba activate mtbseq_test
gatk3-register GenomeAnalysisTK.jar
MTBseq --check

output:

<INFO>  [2022-04-13 17:55:59]   Found perl module: MCE
<INFO>  [2022-04-13 17:55:59]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 17:55:59]   Found bwa in your PATH!
<INFO>  [2022-04-13 17:55:59]   Found samtools in your PATH!
<ERROR> [2022-04-13 17:55:59]   gatk is not installed or not in your PATH! 

LATEST WORKING SOLUTION: version 1.0.3 + gatk:

mamba create -n mtbseq_test -c bioconda mtbseq=1.0.3=pl526_1 gatk=3.8=py36_0
mamba activate mtbseq_test
gatk-register GenomeAnalysisTK.jar
MTBseq --check

output:

<INFO>  [2022-04-13 18:05:27]   Found perl module: MCE
<INFO>  [2022-04-13 18:05:27]   Found perl module: Statistics::Basic
<INFO>  [2022-04-13 18:05:27]   Found bwa in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found samtools in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found gatk in your PATH!
<INFO>  [2022-04-13 18:05:27]   Found picard in your PATH!
abhi18av commented 2 years ago

This sounds good Davi, it seems that we've both been able to test the conda based setup independently for mtbseq-1.0.3 but NOT for 1.0.4 (see https://github.com/mtb-bioinformatics/mtbseq-nf/pull/58#pullrequestreview-934815468 , https://github.com/mtb-bioinformatics/mtbseq-nf/issues/59 and https://github.com/mtb-bioinformatics/mtbseq-nf/pull/58#issuecomment-1098466540 )

Adding here the final working solution ✅

mamba create -n mtbseq_test -c bioconda mtbseq=1.0.3=pl526_1 gatk=3.8=py36_0
mamba activate mtbseq_test
gatk-register GenomeAnalysisTK.jar
MTBseq --check
abhi18av commented 2 years ago

This PR will be addressed after the QC workflow has been confirmed to be working.

Mxrcon commented 2 years ago

On my last review around our overall progress on this task, I'd say that this PR is very close to #74, and the overall design of the custom docker container and the conda_env will take care of this problem.

Before I start to tweak those changes, I'd like to hear your thoughts @abhi18av :

What do you think about this:

  1. If were using docker based profile, well keep using the docker image resulting from the build.sh which should have the gatk.jar inside it as well provide a download instruction into the dockerfile.

  2. The other problem would be the gatkjar register when using conda, for this, I'd like to suggest another change: Add a wget instruction and a gatkregister on setup_conda_envs.sh script.

This way, well have both build.sh for docker and setup_conda_envs.sh for conda, setting up the gatkjar on mtbseq

On both changes, we'll need to update the installation instructions so the user will need to pick one of the following:

  1. run bash container/build.sh
  2. run bash conda_envs/setup_conda_envs.sh

Perhaps, we should publish the resulting image so we can use container directive easily.

abhi18av commented 2 years ago

@Mxrcon , sure - it makes sense to me 👍 . Feel free to take these forward in #74 directly to avoid merge conflicts and accidental code mess-ups.

abhi18av commented 1 year ago

Closing this PR due to the decision to improve the GATK jar using #74