sanger-tol / blobtoolkit

Nextflow pipeline for BlobToolKit, part of the Sanger ToL production suite
https://pipelines.tol.sanger.ac.uk/blobtoolkit
MIT License

diamond_blastx subworkflow #30

Closed: alxndrdiaz closed this 1 year ago

alxndrdiaz commented 1 year ago

This PR is part of the Google Summer of Code 2022 project "Conversion of the BlobToolKit pipeline to Nextflow". GitHub repository: https://github.com/sanger-tol/blobtoolkit

alxndrdiaz commented 1 year ago

This PR implements the diamond_blastx subworkflow, which comes next after the busco subworkflow.

  1. Moved all parameters out of the subworkflow scripts.
  2. Fixed incorrect variable definitions that are known to cause bugs.
  3. Added parameters required for testing (test.config).
muffato commented 1 year ago

Can you remove the files that will be added through #28 (e.g. the busco_diamond_blastp sub-workflow, the goat module, etc.)? That would make it clear what is specific to this sub-workflow / PR.

alxndrdiaz commented 1 year ago

Commits associated with the requested changes:

8140830 - alxndrdiaz, 2 hours ago : removed unrelated params
97f0275 - alxndrdiaz, 2 hours ago : input for UNCHUNK_BLASTX is now a tuple [meta, blast]
3e5a7b0 - alxndrdiaz, 2 hours ago : output is now tuple [meta, proteomes]
eaadae9 - alxndrdiaz, 2 hours ago : DIAMOND_BLASTX input is now tuple from CHUNK_FASTA_BUSCO
a4c02ce - alxndrdiaz, 2 hours ago : output is now tuple [meta, chunks]
c5b31a8 - alxndrdiaz, 2 hours ago : input is now tuple [meta, raw_proteomes]
a518519 - alxndrdiaz, 2 hours ago : CHUNK_FASTA_BUSCO input is now a tuple [meta, fasta]
7f01fc7 - alxndrdiaz, 2 hours ago : updated input to tuple [meta, fasta]
72a20d7 - alxndrdiaz, 2 hours ago : removed DIAMOND_BLASTX parameters, declared in params
bc963d8 - alxndrdiaz, 3 hours ago : added container version
0f337bd - alxndrdiaz, 3 hours ago : updated versions
c33b23e - alxndrdiaz, 3 hours ago : moved module params to modules.config
35b862e - alxndrdiaz, 3 hours ago : removed unrelated files from busco_subworkflow
priyanka-surana commented 1 year ago

I don't understand why the base branch is busco_subworkflow. This will be added to dev eventually, once the busco subworkflow is merged.

muffato commented 1 year ago

@priyanka-surana: this was my suggestion, because this diamondblastx branch was started off the busco branch, so GitHub would only show the difference. But now that this branch has commits that remove the busco files, I don't think it's necessary any more. I'll switch the base back to dev.

muffato commented 1 year ago

A bit cleaner, but it still shows all the automate-io files being removed. @alxndrdiaz: commits that remove files are not great, because git takes them literally and will remove those files when the branch is merged. Could you find a way of dropping those commits in a rebase?

muffato commented 1 year ago

@alxndrdiaz: now that #37 is merged, this sub-workflow is next :) Since the branch goes quite far back, maybe make some fresh commits from dev?

alxndrdiaz commented 1 year ago

Agreed, it would be better to create a new branch from dev.

priyanka-surana commented 1 year ago

This is an optional subworkflow. It would be better to focus on the required subworkflows first. We can circle back to this at a later point.

priyanka-surana commented 1 year ago

I would recommend the blobtools subworkflow, with the run_blobtools_create and add_summary_to_metadata modules. It might be possible to start with the add_summary_to_metadata module, since run_blobtools_create requires the window_stats output, which @zb32 is still working on.