Closed: cimendes closed this PR 4 months ago.
Tested on an extremely large ONT input FASTQ (11.5 GB). The sample failed because too few compute resources were allocated to the nanoq task. Compute resource (runtime) parameters are not exposed for any of the read_qc_trim tasks. See issue #470.
This PR partially closes #301
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR adds a simple fix for kmc overestimating genome length on ONT data, which tends to happen when the FASTQ files are extremely large (over 2 GB).
To address this, a simple check has been added so that the estimated genome length cannot exceed 10M bases (10,000,000 bp), as directed in #301.
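The check described above amounts to clamping the kmc estimate. Below is a minimal Python sketch. It assumes the estimate is clamped to the 10M-base limit rather than replaced by some other default, because this PR does not say which. `cap_genome_length` and `MAX_GENOME_LENGTH` are hypothetical names, not identifiers from the task's actual command block.

```python
# Hypothetical sketch: the 10M-base limit comes from #301; the names are illustrative.
MAX_GENOME_LENGTH = 10_000_000  # 10M bases


def cap_genome_length(estimated_length: int, max_length: int = MAX_GENOME_LENGTH) -> int:
    """Return the kmc genome length estimate, clamped so it never exceeds max_length.

    Assumes clamping (not substitution of a default) is the intended behavior.
    """
    if estimated_length < 0:
        raise ValueError("genome length estimate must be non-negative")
    return min(estimated_length, max_length)


if __name__ == "__main__":
    # An inflated estimate, as seen with very large ONT FASTQs, is clamped to the cap.
    print(cap_genome_length(48_213_907))  # 10000000
    # A plausible bacterial-sized estimate passes through unchanged.
    print(cap_genome_length(4_641_652))   # 4641652
```

Clamping keeps downstream tasks supplied with a usable number. The trade-off is that any genuine genome larger than 10M bases would be silently underestimated.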
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version: Yes, the kmc genome length estimate is now capped at a maximum of 10M bases
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: None
Databases or database versions changed: None
Data processing/commands changed: A small check has been added so that the estimated genome length output by kmc cannot exceed 10M bases
File processing changed: None
Compute resources changed: None
➡️ Inputs
No inputs have been added
⬅️ Outputs
No outputs have been adjusted
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)