Closed plijnzaad closed 4 years ago
A workaround would be as follows:
# extract selected tags (could be fewer or more) from the bamfile:
tagsfile=${bamfile/.bam/.tags}
samtools view $bamfile \
| awk -F "\t" -v OFS="\t" '{print $12,$13,$14,$15,$16,$17,$18 }' > $tagsfile
# run atropos with bam in- and output, adding back the tags from $tagsfile
atropos trim --adapter $adapter $bamfile \
--input-format bam --output-format sam OTHERARGS \
| grep -v '^@' \
| paste -d "\t" - $tagsfile \
| awk -F"\t" -v OFS="\t" '$10 != "" && $10 != "*" ' \
| samtools view -b - > ${bamfile/.bam/-trimmed.bam}
The grep -v line is to address issue #101; the second awk line is to get rid of reads that have been trimmed to length 0.
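To see what those two filtering steps do in isolation, here is a minimal sketch on made-up records. It needs neither atropos nor samtools; the names `toy_sam` and `filter_trimmed` and the three reads are hypothetical and only serve the illustration.

```shell
# Three made-up unaligned SAM records preceded by one header line.
# r1 still has sequence, r2 has "*" as sequence, r3 has an empty sequence.
toy_sam() {
    printf '@HD\tVN:1.6\tSO:unsorted\n'
    printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC\n'
    printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\tCB:Z:AAAG\n'
    printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\t\t\tCB:Z:AAAT\n'
}

# Same two steps as in the workaround: drop header lines, then drop
# records whose sequence column (field 10) is empty or "*".
filter_trimmed() {
    grep -v '^@' | awk -F "\t" -v OFS="\t" '$10 != "" && $10 != "*"'
}

# Only the r1 record is printed.
toy_sam | filter_trimmed
```

Note that paste pairs lines purely by position, so the workaround only adds the right tags back if the trimmer emits exactly one record per input read, in the same order as the input.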
Fixed in develop. Will be released in alpha6.
There are also new --remove-sam-tags and --keep-sam-tag options for filtering out all/specific SAM tags.
When running atropos on (unaligned) SAM input that contains SAM tags and writing SAM output, all SAM tags disappear! This is very unfortunate, because really the only reason to use unaligned SAM instead of FASTQ is precisely the ability to keep extra per-read information (which is impossible with FASTQ).
The specific use case here is single-cell RNA sequencing, where we use the CR:Z, CY:Z, CB:Z, UR:Z and UY:Z tags to store cell-of-origin and UMI information.
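For checking whether such tags survive a tool, it can help to look them up by name rather than by column position, since optional fields start at column 12 and may appear in any order. A minimal sketch, assuming tab-separated SAM text on stdin; the helper name `tags_by_name` and the "-" placeholder for a missing tag are choices made for this example only.

```shell
# Print the read name followed by the values of the requested tags.
# Each optional field has the form TAG:TYPE:VALUE, e.g. CB:Z:AAAC.
tags_by_name() {
    awk -F "\t" -v OFS="\t" -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        {
            line = $1
            for (i = 1; i <= n; i++) {
                value = "-"    # placeholder when the tag is absent
                for (f = 12; f <= NF; f++)
                    if (substr($f, 1, 3) == names[i] ":")
                        value = substr($f, 6)
                line = line OFS value
            }
            print line
        }'
}
```

Usage would be along the lines of `samtools view "$bamfile" | tags_by_name CB UR`, run once on the input and once on the trimmed output to compare.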
Can the tags please not be skipped? Many thanks!
Philip