The current implementation of the bam2fasta step does not retain information about whether the read was aligned or not in the read ID for the fasta. This can be found by looking at the aligned/unaligned fastas separately, but I'd like to have that information entirely in the read name and not need to look anywhere else.

What are all the supported flags?
Here are links to all supported flags, along with some of the important ones:
GN: Semicolon-separated list of gene names that are compatible with this alignment. Gene names are specified with the gene_name key in the reference GTF attribute column.
RE: Single character indicating the region type of this alignment (E = exonic, N = intronic, I = intergenic).
NH: Number of reported alignments that contain the query in the current record.
HI: Query hit index.

Aligned, but no gene assigned
GN: not present
RE: present
NH: integer > 0
HI: integer >= 0

Aligned, with assigned gene
GN: present
RE: present
NH: integer > 0
HI: integer >= 0

Unaligned (thus no gene assigned)
GN: not present
RE: not present
NH: integer == 0
HI: integer == 0

Thus, this PR adds at least NH, HI, and RE tags, plus all known tags just in case they're needed for downstream processing.

PR checklist
CHANGELOG.md is updated
docs is updated
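
For reference, the three cases described earlier (aligned without a gene, aligned with a gene, unaligned) can be told apart from the tags alone. Below is a minimal sketch of that logic, not the code in this PR: the function names classify_alignment and format_read_id are hypothetical, the "|" and ":" separators in the read ID are made up for illustration, and treating a missing NH tag as NH == 0 is an assumption.

```python
from typing import Mapping, Union

TagValue = Union[str, int]


def classify_alignment(tags: Mapping[str, TagValue]) -> str:
    """Return 'unaligned', 'aligned_no_gene', or 'aligned_with_gene'.

    NH == 0 means the read is unaligned; otherwise the presence of the
    GN tag decides whether a gene was assigned.
    """
    # Assumption for this sketch: a missing NH tag is treated like NH == 0.
    if int(tags.get("NH", 0)) == 0:
        return "unaligned"
    if "GN" in tags:
        return "aligned_with_gene"
    return "aligned_no_gene"


def format_read_id(read_name: str, tags: Mapping[str, TagValue]) -> str:
    """Append TAG:value pairs to the read name, in sorted tag order.

    The '|' and ':' separators are hypothetical, not bam2fasta's format.
    """
    parts = [read_name] + [f"{tag}:{tags[tag]}" for tag in sorted(tags)]
    return "|".join(parts)


if __name__ == "__main__":
    # Made-up tag sets, one per case.
    examples = {
        "read_a": {"RE": "I", "NH": 1, "HI": 1},
        "read_b": {"GN": "ACTB", "RE": "E", "NH": 1, "HI": 1},
        "read_c": {"NH": 0, "HI": 0},
    }
    for name, tags in examples.items():
        print(format_read_id(name, tags), classify_alignment(tags))
```

If the tags come from pysam, a dict like the ones above could be built with dict(read.get_tags()) on an AlignedSegment. Keeping the tags in the read name means a downstream step can recover the alignment status by parsing the ID, without consulting the separate aligned/unaligned fastas.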