ncbi / sra-human-scrubber

An SRA tool that takes as input a local fastq file from a clinical infection sample, identifies and removes any significant human reads, and outputs the edited (cleaned) fastq file, which can safely be used for SRA submission.

Some lines left without quality scores #3

Closed kevinlibuit closed 3 years ago

kevinlibuit commented 3 years ago

I assume this is the repo for /opt/scrubber/scripts/scrub.sh in the us.gcr.io/ncbi-research-sra-dataload/test-scrub:latest image?

If so, I've been running into an issue in which the output of the scrub.sh command (i.e. {sample}.fastq.clean) contains some blank quality lines, which makes it incompatible with downstream analysis.

E.g.

@sample_01.171888 171888 length=129
GTTGTAGCTTGTCACACCGTTTCTATAGATTAGCTAATGAGTGTGCTCAAGTATTGAGTGAAATGGTCATGTGTGGCGGTTCACTATATGTTAAAACAGGTGGAACCTCATCAGGAGATGCCACAACTG
+sample_01.171888 171888 length=129
AABAAFFFFFFFCGGGGGGGEGHHFHHHHHHHHHDHHHHHFGHHHFHHHGHFGHHFHHHHGHGGHHHHFHHHHHHEHGGAE1BF5555FG5GEFF4FE2G@FF1@3?23@@@F3@3?FCF3GHHFH/3?
@sample_01.171889 171889 length=160
CATAGATGCCTTCAAACTCAACATTAAATTGTTGGGTGTTGGTGGAAAACCTTGTATCAAAGTAGCCACTGTACAGTCTAAAATGTCAGATGTAAAGTGCACATCAGTAGTCTTACTCTCAGTTTTGCAACAACTCAGAGTAGAATCATCATCTAAATTG
+sample_01.171889 171889 length=160

@sample_01.171890 171890 length=200
ATCTTCAGTTCATCACCAATTATAGGATATTCAATAGTCCAGTCAACACGCTTAACAAAGCACTCGTGGACAGCTAGACACCTAGTCATGATTGCATCACAACTAGCTACATGTGCATTACCATGGACTTGACAATACAGATCATGGTTGCTTTGTAGGTTACCTGTAAAACCCCATTGTTGAACATCAATCATAAACGG
+sample_01.171890 171890 length=200
ABCCCFFFFFFFGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGGHHHIHHGHHHHGHHGGHHGHHHHHHHHGHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFGGHHHHHHHHHHGHHHHHHHHHHHGGGGHHHHHHHHHHHHHFHHHHHHHH?D

*edit: formatting
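
A minimal sketch, not part of the scrubber itself, of how records like the one above (sequence present, quality line blank) could be flagged before downstream analysis. It assumes strict four-line FASTQ records (header, sequence, `+` line, quality) with no wrapped lines; `find_bad_quality_records` and `BadRecord` are hypothetical names chosen for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, NamedTuple


class BadRecord(NamedTuple):
    """Hypothetical container describing one malformed record."""
    record_number: int  # 1-based position of the record in the file
    header: str         # the '@' header line, without the newline
    seq_len: int        # length of the sequence line
    qual_len: int       # length of the quality line (0 if blank)


def find_bad_quality_records(lines: Iterable[str]) -> Iterator[BadRecord]:
    """Yield records whose quality line is blank or differs in length
    from the sequence line.

    Assumption: every record occupies exactly four lines.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    # Group the stream into chunks of four lines; a truncated final
    # record is padded with empty strings.
    records = zip_longest(*[stripped] * 4, fillvalue="")
    for number, (header, seq, plus, qual) in enumerate(records, start=1):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {number} is not a four-line FASTQ record")
        if len(qual) != len(seq):
            yield BadRecord(number, header, len(seq), len(qual))
```

Passing an open text-mode handle for a {sample}.fastq.clean file to `find_bad_quality_records` would yield one `BadRecord` per affected read; a `qual_len` of 0 corresponds to the blank quality lines described in this issue.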

kevinlibuit commented 3 years ago

The issue was identified and addressed in the latest Docker release: ncbi/sra-human-scrubber:1.0.2021-04-19