🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
In SRA-Lite format, reject reads have a fixed quality score of 3, which is represented by the '$' character. Our previous attempt to auto-detect this format by checking the quality-encoding range did not take this into account.
This fix also reports SRA-Lite when only the ? or $ characters are detected in the first line of quality-encoding characters.
Note: If the read quality contains only ? and $ characters, it will be reported as SRA-Lite. Given that one encodes Q30 and the other Q3, it is extremely unlikely that this would occur naturally.
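The check described above can be sketched in Python as follows. This is a minimal illustration, not the task's actual implementation: it assumes Phred+33 encoding and reuses the ^[?$]+$ regex quoted in the testing notes, while the names phred33 and looks_like_sra_lite are hypothetical.

```python
import re

# Regex quoted in this PR's testing notes: a line made up solely of
# '?' and/or '$' characters (inside a character class, '$' is literal).
SRA_LITE_ONLY = re.compile(r"^[?$]+$")


def phred33(char: str) -> int:
    """Decode one quality character, assuming Phred+33 encoding."""
    return ord(char) - 33


def looks_like_sra_lite(quality_line: str) -> bool:
    """Return True when the quality line holds only '?' (Q30) and/or '$' (Q3).

    Hypothetical helper: in the workflow this decision is made on the first
    line of quality-encoding characters of the FASTQ file.
    """
    return SRA_LITE_ONLY.match(quality_line.rstrip("\n")) is not None


if __name__ == "__main__":
    for line in ("????????", "$$$$$$$$", "??$$??$$", "IIIIFF??"):
        scores = sorted({phred33(c) for c in line})
        print(line, scores, looks_like_sra_lite(line))
```

A line that mixes in any other quality character, such as the last example, is not reported as SRA-Lite by this check.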
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : No
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: N/A
Databases or database versions changed: N/A
Data processing/commands changed: N/A
File processing changed: N/A
Compute resources changed: N/A
➡️ Inputs
Nothing changed
⬅️ Outputs
Nothing changed
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Sample with low-quality reads that fail the SRA-Lite quality threshold. Forced SRA-Lite download with "--sra-lite --provider sra --only-provider" input options.
Samples with Q30, Q3 and normal quality encoding
To "force" the SRA-Lite format download, use "--sra-lite --provider sra --only-provider" as fastq_dl_opts input argument on SRA_Fetch.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
[x] The workflow/task has been tested locally and results, including file contents, are as anticipated
[x] The workflow/task has been tested on Terra and results, including file contents, are as anticipated
[x] The CI/CD has been adjusted and tests are passing (to be completed by Theiagen developer)
This PR closes #480
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
The ^[?$]+$ regex was tested individually.
Terra Testing
Dataset with known SRA-Lite samples (Q30): https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/f786bc54-6779-41ba-945a-1baddb64c11e
Dataset with known SRA-Lite samples (Q3): https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Mendes_Sandbox/job_history/34def1d1-23b6-4fa4-aaa9-53c1efa6c6e3
Forced SRA-Lite download with "--sra-lite --provider sra --only-provider" input options.
Suggested Scenarios for Reviewer to Test
Samples with Q30, Q3 and normal quality encoding
To "force" the SRA-Lite format download, use "--sra-lite --provider sra --only-provider" as fastq_dl_opts input argument on SRA_Fetch.
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)