cimendes closed this pull request 5 months ago.
@cimendes yes, if you can rename `mem` to `memory` and `len` to `length`, that'd be great.
Done! Thank you!
TODO: merge main in!
Since this is a new tool added to TheiaMeta, will you add it to the workflow diagram and documentation? I think that may be the last thing needed before merging the PR. Please let me know when you're ready to merge and I can hit the button.
yes!! I was waiting on a semi-approval to get that going :) Will update now
@kapsakcj docs have been updated!
This PR closes #321
🗑️ This dev branch should be deleted after merging with main.
:brain: Aim, Context and Functionality
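Binning is the next logical step in metagenomic analysis after assembly and genomic characterization. It allows us to (ideally) group the assembled contigs into bins, each corresponding to a single member of the community.
Two processes are needed for binning:
1. Mapping the clean reads back to the assembly to produce a coverage report (BAM files).
2. Binning the assembly contigs with SemiBin2, using the assembly and the resulting BAM files.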
Downstream characterization is not yet done in this PR.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behaviour of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No
Running this workflow on different occasions could produce different results, e.g. due to the use of a live database, "latest" docker image, or stochastic data processing : Yes (binning is a stochastic process, so some variation between runs is expected)
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
An additional step has been introduced in the TheiaMeta workflow. Currently, this is a terminal step. After assembly, the clean reads are mapped back to the assembly to create a coverage report. The resulting BAM files and the assembly file are then binned with SemiBin2, which may produce multiple bin FASTA files.
A check was added to the SemiBin task to skip binning when fewer than two contigs exceed the minimum length threshold, because SemiBin2 fails on such inputs.
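The skip condition can be illustrated with a small Python sketch. This is not the task's actual implementation (the WDL task is not shown here); the function names `read_fasta`, `count_long_contigs` and `should_run_binning` are hypothetical, and treating the threshold as inclusive (length >= `min_length`) is an assumption.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_long_contigs(handle: TextIO, min_length: int) -> int:
    """Count contigs whose length is at least min_length (assumed inclusive)."""
    return sum(1 for _, seq in read_fasta(handle) if len(seq) >= min_length)


def should_run_binning(handle: TextIO, min_length: int) -> bool:
    """Hypothetical guard: bin only if two or more contigs pass the length filter."""
    return count_long_contigs(handle, min_length) >= 2


if __name__ == "__main__":
    assembly = io.StringIO(">c1\nACGTACGT\n>c2\nACG\n>c3\nACGTACGTAC\n")
    # c1 (8 bp) and c3 (10 bp) pass a hypothetical 5 bp threshold, so binning runs.
    print(should_run_binning(assembly, min_length=5))
```

In the real task, the same count could just as well be computed in the task's shell command before deciding whether to call SemiBin2; the Python version only makes the decision rule explicit.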
Docker/software or software versions changed: N/A
Databases or database versions changed: N/A
Data processing/commands changed: N/A
File processing changed: N/A
Compute resources changed: N/A
➡️ Inputs
New optional inputs:
⬅️ Outputs
New outputs:
:test_tube: Testing
Test Dataset
Locally:
On Terra:
Commandline Testing with MiniWDL or Cromwell (optional)
The SemiBin task was tested locally and completed successfully.
Terra Testing
https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/25c1eab1-4e9a-4390-b792-fc3e61daf519
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)