jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

--output-format sam inserts extra header lines every batch-size output lines ?! #101

Closed: plijnzaad closed this issue 4 years ago

plijnzaad commented 4 years ago

If I run atropos 1.1.27 as follows:

atropos -a $adaptor  --batch-size 5 \
  --single-input input.sam \
  --output-format sam  \
  > out.sam

the output looks like this:

@HD     VN:1.6  SO:unsorted
NB500901:56:HM7TNBGX7:1:11101:10000:15401       0       *       0       0       *       *       0       0       CCTTCCTGAAGAAATTGGTACTCTGGAGAACCTAGAAGAACTGTATTTGA
NB500901:56:HM7TNBGX7:1:11101:10000:16810       0       *       0       0       *       *       0       0       TTGCTGTATTTATTAATTTTCTTAAAGTGAAATCTGAAAAAAAAAAAAAA
NB500901:56:HM7TNBGX7:1:11101:10000:18193       0       *       0       0       *       *       0       0       TGCTGGAAAGTTGAAGAGATCTAGAGTAGCCTGCTCGCATGAGGCTTCCA
NB500901:56:HM7TNBGX7:1:11101:10000:6828        0       *       0       0       *       *       0       0       GTCTCTGTCCAGAAAACGTAATGAGGATGAAGATTCACCAAATAAGCTAT
NB500901:56:HM7TNBGX7:1:11101:10001:19952       0       *       0       0       *       *       0       0       GTCAGAGAAGCCTAAATAAATTAGGTCAGTCATGGAGGCCATGGAATGAG
@HD     VN:1.6  SO:unsorted
NB500901:56:HM7TNBGX7:1:11101:10003:17840       0       *       0       0       *       *       0       0       CCGTTCTATTAAAAACAAGATAAAAAAGCTGCCAAGATTTTTCGCGAGTC
NB500901:56:HM7TNBGX7:1:11101:10003:5984        0       *       0       0       *       *       0       0       TCCCCCTACACTTATCATCTTCACAATTCTAATTCTACTGACTATCCTAG
NB500901:56:HM7TNBGX7:1:11101:10004:13337       0       *       0       0       *       *       0       0       CGGCAAAAGAGGATGTAGCCTCTGGGAAAAAACAAACATGTTACAGTGTT
NB500901:56:HM7TNBGX7:1:11101:10004:20339       0       *       0       0       *       *       0       0       TGCTCTGGTGGCTGGAATTGACCGCTACCCCCGCAAAGTGACAGCTGCCA
NB500901:56:HM7TNBGX7:1:11101:10005:15076       0       *       0       0       *       *       0       0       CCTGCTCCGTCTTGTTAACTTGTCATATCGCGCACGTAGTAGCCTAGAGC
@HD     VN:1.6  SO:unsorted
NB500901:56:HM7TNBGX7:1:11101:10006:19599       0       *       0       0       *       *       0       0       CCTCCAGTCCTCCCCATCATTGGTTTTTTTTTTTTTTTATCAACTGTACC
NB500901:56:HM7TNBGX7:1:11101:10007:14213       0       *       0       0       *       *       0       0       CTGCCTAGCTGGATTGCAGAGTTAAGTTTATGATTATGAAATAAAAACTA
NB500901:56:HM7TNBGX7:1:11101:10007:6693        0       *       0       0       *       *       0       0       TTACACAGAATTATCAATCAAGCTCCCCGAGGAGCGGACTTGTAAGGACC
NB500901:56:HM7TNBGX7:1:11101:10008:5314        0       *       0       0       *       *       0       0       CCTCCATGCTTGTGAACTGCACAACTTGAGCCTGACTGTACATCTCTTGG
NB500901:56:HM7TNBGX7:1:11101:10009:17918       0       *       0       0       *       *       0       0       CCTCACTATTGATTTGTCCCAGAATTTTCTGGCCTTTCATGGCAATGAAA
@HD     VN:1.6  SO:unsorted
NB500901:56:HM7TNBGX7:1:11101:10009:19553       0       *       0       0       *       *       0       0       CCGCCGGCGTCCCTTTCTCCATAAAATTCTTCTTAGTAGCTATCACCTTC
....

In other words, there's an extra @HD line every batch-size lines. This does not occur with FASTQ output, of course, since FASTQ has no header. I'll work around it with grep -v '^@HD' for now.
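A workaround that drops only the repeated header lines, while keeping the first @HD line, can be sketched with awk. This is a minimal sketch under stated assumptions, not part of atropos: the file names example.sam and fixed.sam are hypothetical, and the printf line only builds a tiny stand-in for the buggy output (one @HD line per batch) so the snippet runs on its own.

```shell
# Hypothetical stand-in for the buggy output: an @HD line before each batch of reads.
printf '@HD\tVN:1.6\tSO:unsorted\nr1\t0\t*\t0\t0\t*\t*\t0\t0\tACGT\n@HD\tVN:1.6\tSO:unsorted\nr2\t0\t*\t0\t0\t*\t*\t0\t0\tTTGA\n' > example.sam

# Print every non-@HD line unchanged; print an @HD line only the first time one is seen.
awk '!/^@HD/ || !seen++' example.sam > fixed.sam
```

To apply it to real output, replace example.sam with the SAM file atropos wrote. Unlike a plain grep -v filter, this keeps the single legitimate header line at the top of the file.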

plijnzaad commented 4 years ago

PS: if I run with --debug, the program produces far too much output (and not in SAM format). If I run with --log-level TRACE, nothing unusual shows up, but I include the log file for completeness' sake: out.log

jdidion commented 4 years ago

Weird - I'll look into it. Thanks for reporting!

jdidion commented 4 years ago

Just clarifying that this was actually with atropos-2.0.0-alpha5, not 1.1.27.

jdidion commented 4 years ago

Fixed in develop - will be in the alpha6 release.
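After upgrading, one quick way to confirm that SAM output carries a single header is to count its @HD lines. This is a minimal sketch, not an atropos feature: the function name check_single_hd and the file single.sam are hypothetical, and the printf line only creates a tiny example file.

```shell
# Hypothetical helper: succeed only if the given SAM file has exactly one @HD line.
check_single_hd() {
  # grep -c prints 0 but exits non-zero when nothing matches, so guard with || true.
  n=$(grep -c '^@HD' "$1" || true)
  if [ "$n" -eq 1 ]; then
    echo "ok: 1 @HD line in $1"
  else
    echo "unexpected: $n @HD lines in $1" >&2
    return 1
  fi
}

# Tiny hypothetical file with a single header line followed by one read.
printf '@HD\tVN:1.6\tSO:unsorted\nr1\t0\t*\t0\t0\t*\t*\t0\t0\tACGT\n' > single.sam
check_single_hd single.sam
```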